• Partnership for Advanced Computing in Europe (PRACE)

Managing Big Data with R and Hadoop

Learn how to manage and analyse big data using the R programming language and Hadoop programming framework.

13,224 enrolled on this course

  • Duration: 5 weeks
  • Weekly study: 4 hours

This online course will introduce you to various high-performance computing (HPC) tools for big data analysis. These include R – a programming language renowned for its simplicity, elegance and community support – and Hadoop – an open-source, Java-based programming framework for processing large data sets.

You will find out how to use them while avoiding common pitfalls, saving you time and money.
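
As a rough illustration of the map, shuffle and reduce pattern that Hadoop is commonly associated with, here is a minimal word-count sketch. It is written in plain Python rather than R or Hadoop itself, it runs on a single machine, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for this example; they are not part of Hadoop, RHadoop or the course materials.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values collected for one key."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    sample = ["big data with R", "big data with Hadoop"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps would run in parallel on different machines, with the framework handling the grouping in between; the sketch only shows the shape of the computation.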

What topics will you cover?

  • First steps in R and RStudio
  • Working with Apache Hadoop 1 – Fundamentals
  • Working with Apache Hadoop 2 – RHadoop
  • Statistical learning using RHadoop

What will you achieve?

By the end of the course, you will:

  • Understand how the performance of modern supercomputing is achieved
  • Understand the basic functionality of the Bash terminal window
  • Understand the basic functionality of Apache Hadoop for scalable, distributed computing
  • Understand the basic functionality of RHadoop
  • Understand the basic problems of supervised and unsupervised learning
  • Perform basic clustering, regression and classification with RHadoop.

What topics will you cover?

  • Welcome to BIG DATA
  • Working with Hadoop
  • First steps in R and RHadoop
  • Statistical learning with RHadoop: clustering
  • Statistical learning with RHadoop: regression and classification

When would you like to start?

Start straight away and learn at your own pace. If the course hasn’t started yet you’ll see the future date listed below.

  • Available now

What will you achieve?

By the end of the course, you’ll be able to...

  • Explore the basic functionality of Apache Hadoop and RHadoop
  • Experiment with how the performance of modern supercomputing is achieved
  • Experiment with regression and classification using RHadoop
  • Demonstrate basic clustering, regression and classification with RHadoop
  • Investigate the basic functionality of the Bash terminal window

Who is the course for?

This course is designed for people interested in data science, computational statistics and machine learning. It will also be useful for advanced undergraduate students and first-year PhD students in data analysis, statistics or bioinformatics who wish to understand HPC.

Who will you learn with?

I am an active researcher in mathematical optimization, which has many applications in data science and where HPC is an inevitable tool.

Biljana Mileva Boshkoska is an assistant professor in computer science. Her interests include decision support systems, data mining and working with big data.

Leon Kos is a 25+ year veteran of using the Linux desktop on a daily basis to build digital relationships for research, teaching, and getting the job done by programming.

Who developed the course?

Partnership for Advanced Computing in Europe (PRACE)

The Partnership for Advanced Computing in Europe (PRACE) is an international non-profit association with its seat in Brussels.

Learning on FutureLearn

Your learning, your rules

  • Courses are split into weeks, activities, and steps, but you can complete them as quickly or slowly as you like
  • Learn through a mix of bite-sized videos, long- and short-form articles, audio, and practical activities
  • Stay motivated by using the Progress page to keep track of your step completion and assessment scores

Join a global classroom

  • Experience the power of social learning, and get inspired by an international network of learners
  • Share ideas with your peers and course educators on every step of the course
  • Join the conversation by reading, @ing, liking, bookmarking, and replying to comments from others

Map your progress

  • As you work through the course, use notifications and the Progress page to guide your learning
  • Whenever you’re ready, mark each step as complete; you’re in control
  • Complete 90% of course steps and all of the assessments to earn your certificate

You can use the hashtag #FLMassiveData to talk about this course on social media.