• Partnership for Advanced Computing in Europe (PRACE)

Managing Big Data with R and Hadoop

Learn how to manage and analyse big data using the R programming language and Hadoop programming framework.

  • Duration: 5 weeks
  • Weekly study: 4 hours

Why join the course?

This online course will introduce you to various high-performance computing (HPC) facilities for big data analysis. These include R, a programming language renowned for its simplicity, elegance and community support, and Hadoop, an open source, Java-based framework for the distributed storage and processing of large data sets.

You will find out how to use them while avoiding common pitfalls, saving time and money.

What topics will you cover?

  • First steps in R and RStudio
  • Working with Apache Hadoop 1 – Fundamentals
  • Working with Apache Hadoop 2 – RHadoop
  • Statistical learning using RHadoop

What will you achieve?

By the end of the course, you will:

  • Understand how the performance of modern supercomputing is achieved
  • Understand the basic functionality of the Bash terminal window
  • Understand the basic functionality of Apache Hadoop for scalable, distributed computing
  • Understand the basic functionality of RHadoop
  • Understand the basic problems of supervised and unsupervised learning
  • Perform basic clustering, regression and classification with RHadoop.

What topics will you cover?

  • Welcome to BIG DATA
  • Working with Hadoop
  • First steps in R and RHadoop
  • Statistical learning with RHadoop: clustering
  • Statistical learning with RHadoop: regression and classification

When would you like to start?

Most FutureLearn courses run multiple times. Every run of a course has a set start date but you can join it and work through it after it starts.

  • Available now

What will you achieve?

By the end of the course, you’ll be able to...

  • Explore the basic functionality of Apache Hadoop and RHadoop
  • Experiment with how the performance of modern supercomputing is achieved
  • Experiment with regression and classification using RHadoop
  • Demonstrate basic clustering, regression and classification with RHadoop
  • Investigate the basic functionality of the Bash terminal window

Who is the course for?

This course is designed for people interested in data science, computational statistics and machine learning. It will also be useful for advanced undergraduate students and first-year PhD students in data analysis, statistics or bioinformatics who wish to understand HPC.

Who will you learn with?

I am an active researcher in mathematical optimization, which has many applications in data science and where HPC is an indispensable tool.

Biljana Mileva Boshkoska is an assistant professor in computer science. Her interests include decision support systems, data mining and working with big data.

Leon Kos is a 25+ year veteran of using the Linux desktop on a daily basis to build digital relationships for research, teaching, and getting the job done by programming.

Who developed the course?

Partnership for Advanced Computing in Europe (PRACE)

The Partnership for Advanced Computing in Europe (PRACE) is an international non-profit association with its seat in Brussels.