Online course in Tech & Coding

Managing Big Data with R and Hadoop

Learn how to manage and analyse big data using the R programming language and Hadoop programming framework.

  • Duration 5 weeks
  • Weekly study 4 hours

Why join the course?

This online course will introduce you to various high performance computing (HPC) facilities for big data analysis. These include R, a programming language renowned for its simplicity, elegance and community support, and Hadoop, an open source, Java-based programming framework for processing large data sets.

You will find out how to use them while avoiding common pitfalls, saving you time and money.

What topics will you cover?

  • First steps in R and RStudio
  • Working with Apache Hadoop 1 – Fundamentals
  • Working with Apache Hadoop 2 – RHadoop
  • Statistical learning using RHadoop

What will you achieve?

By the end of the course, you will:

  • Understand how the performance of modern supercomputing is achieved
  • Understand the basic functionality of the Bash terminal window
  • Understand the basic functionality of Apache Hadoop for scalable, distributed computing
  • Understand the basic functionality of RHadoop
  • Understand the basic problems of supervised and unsupervised learning
  • Perform basic clustering, regression and classification with RHadoop.

What topics will you cover?

  • Welcome to BIG DATA
  • Working with Hadoop
  • First steps in R and RHadoop
  • Statistical learning with RHadoop: clustering
  • Statistical learning with RHadoop: regression and classification

When would you like to start?

  • Date to be announced

What will you achieve?

By the end of the course, you'll be able to...

  • Explore the basic functionality of Apache Hadoop and RHadoop
  • Experiment with how the performance of modern supercomputing is achieved
  • Experiment with regression and classification using RHadoop
  • Demonstrate basic clustering, regression and classification with RHadoop
  • Investigate the basic functionality of the Bash terminal window

Who is the course for?

This course is designed for people interested in data science, computational statistics and machine learning. It will also be useful for advanced undergraduate students and first year PhD students in data analysis, statistics or bioinformatics, who wish to understand HPC.

Who will you learn with?

Janez Povh

I am an active researcher in mathematical optimization, which has many applications in data science and where HPC is an inevitable tool.

Biljana Mileva Boshkoska

Biljana Mileva Boshkoska is an assistant professor in computer science. Her interests include decision support systems, data mining and working with big data.

Leon Kos

Leon Kos is a veteran of 25+ years of using the Linux desktop on a daily basis to build digital relationships for research, teaching, and getting the job done by programming.

Who developed the course?

The Partnership for Advanced Computing in Europe (PRACE) is an international non-profit association with its seat in Brussels.