Online course

Managing Big Data with R and Hadoop

Learn how to analyse big data using the R programming language and Hadoop programming framework with this advanced course.

Understand how to use R and Hadoop to manage big data

This course will give you access to a virtual environment with installations of Hadoop, R and RStudio to get hands-on experience with big data management. Several unique examples from statistical learning, together with the related R code for map-reduce operations, will be available for testing and learning.

Those with basic knowledge of statistical learning and R will better understand the underlying methods and how to run them in parallel using map-reduce functions and Hadoop data storage. At the end of the course you will get access to RHadoop on a supercomputer at the University of Ljubljana.

Nearly every historical period may be said to have had sources of data that were considered big for that time. Books, documents, drawings, maps and paintings are examples of such data. Yet it is only today that we have to deal with really big data. Luckily, more and more data is digital, although it is expressed in different formats. Large-scale scientific instruments, social network platforms, cloud solutions and digital cultural heritage are only a few examples of sources of huge amounts of text, photo, video and audio material that are considered big data.

But the questions related to data have not changed much: how to store and maintain it, how to understand it, and how to learn from it for an improved response in the future. These issues necessarily involve the use of high-performance computers. Distributed storage and parallel computing need to be considered to avoid loss of data and to make computations efficient.

Join us and cope with big data using R and RHadoop.

What topics will you cover?

  • Welcome to BIG DATA
  • Working with Hadoop
  • First steps in R and RHadoop
  • Statistical learning with RHadoop: clustering
  • Statistical learning with RHadoop: regression and classification

When would you like to start?

  • Available now
    This course started 7 January 2019

What will you achieve?

By the end of the course, you'll be able to...

  • Explore the basic functionality of Apache Hadoop and RHadoop
  • Experiment with how to achieve the performance of modern supercomputing
  • Experiment with regression, clustering and classification using RHadoop
  • Investigate the basic functionality of the Bash terminal window
  • Apply knowledge of statistical learning to instances of data provided by the educators
  • Demonstrate how to do big data management with RHadoop on a real supercomputer provided by the University of Ljubljana

Who is the course for?

This course is for people with basic experience of Linux, Bash and R who can download and run a virtual machine. You might be interested in data science, computational statistics and machine learning, and have some basic experience with them.

It will also be useful for advanced undergraduate students and first-year PhD students in data analysis, statistics or bioinformatics who wish to understand how to manage big data with Hadoop using the R programming language.

What software or tools do you need?

All software needed to actively participate in the course is provided within the virtual machine that you need to download and run on your local machine. No extra software is needed.

You will need a modest local machine with 15 GB of free disk space and 2 GB of free RAM. You can get access to RStudio for big data on a real HPC cluster after completing two weeks of exercises.

Who will you learn with?

Janez Povh

I am an active researcher in mathematical optimization, which has many applications in data science and where HPC is an indispensable tool.

Biljana Mileva Boshkoska

Biljana Mileva Boshkoska is an assistant professor in computer science. Her interests include decision support systems, data mining and working with big data.

Leon Kos

Leon Kos is a 25+ year veteran of using the Linux desktop on a daily basis to build digital relationships for research, teaching, and getting the job done by programming.

Who developed the course?

The Partnership for Advanced Computing in Europe (PRACE) is an international non-profit association with its seat in Brussels.

Join this course

Start this course for free, upgrade for extra benefits, or buy Unlimited to access this course and hundreds of other short courses for a year.

Free
$0

Join free and you will get:

  • Access to this course for 7 weeks

Upgrade
$99

Upgrade this course and you will get:

  • Access to this course for as long as it’s on FutureLearn
  • Access to this course’s tests as well as a print and digital Certificate of Achievement once you’re eligible

Unlimited (New!)
$199 for one year (reduced from $269)

Buy Unlimited and you will get:

  • Access to this course, and hundreds of other FutureLearn short courses and tests for a year
  • A printable digital Certificate of Achievement on all short courses once you’re eligible
  • The freedom to keep the content of any of the courses you've gained a digital Certificate of Achievement on
  • The flexibility to complete your choice of short courses in your own time within the year
Introductory offer. Available until 11 May 2019. T&Cs apply.