Online course

Managing Big Data with R and Hadoop

Learn how to manage and analyse big data using the R programming language and Hadoop programming framework.

What’s the difference between a free course and an upgraded course?

Free:

  • Access to the course for its duration + 14 days, regardless of when you join (this includes access to articles, videos, peer review steps, quizzes)
  • No access to course tests
  • No certificate

Upgraded:

  • Unlimited access to the course, for as long as it exists on FutureLearn (this includes access to articles, videos, peer review steps, quizzes)
  • Access to course tests
  • A Certificate of Achievement when you complete the course


Why join the course?

This online course will introduce you to high performance computing (HPC) tools for big data analysis. These include:

  • R – a programming language renowned for its simplicity, elegance and community support, enriched with packages for working with Hadoop. The RStudio IDE will be used to prepare and run R scripts;
  • Hadoop – an open source, Java-based programming framework for processing large data sets.

A basic knowledge of Bash and AWK helps in understanding Hadoop, so we also introduce them briefly.

Through a variety of materials, including hands-on exercises, you will learn how to use these tools while avoiding common pitfalls, saving you time and money.

What topics will you cover?

  • First steps in R and RStudio
  • Working with Apache Hadoop 1 – Fundamentals
  • Working with Apache Hadoop 2 – RHadoop
  • Statistical learning using RHadoop

What will you achieve?

By the end of the course, you will:

  • understand how the performance of modern supercomputing is achieved;
  • be able to perform basic tasks in the Bash terminal window;
  • be able to use AWK for basic text processing tasks;
  • understand the basic functionality of Apache Hadoop for scalable, distributed computing;
  • be able to perform data operations of medium difficulty using R and RHadoop;
  • understand the basic problems of supervised and unsupervised learning;
  • be able to apply clustering, regression and classification methods using RHadoop.

Nearly every historical period may be said to have had sources of data that were considered big for that time. Books, documents, drawings, maps and paintings are examples of such data. Yet it is only today that we have to deal with really big data. Luckily, more and more data is digital, but it is expressed in different formats. Large-scale scientific instruments, social network platforms, cloud solutions and digital cultural heritage are only a few examples of sources of huge amounts of text, photo, video and audio material that is considered big data.

But the questions related to data have not changed much: how to store and maintain it, how to understand it, and how to learn from it in order to respond better in the future. These issues necessarily involve the use of high performance computers. Distributed storage and parallel computing need to be considered to avoid loss of data and to make computations efficient.

Join us and learn to cope with big data using R and RHadoop.

What topics will you cover?

  • Welcome to BIG DATA
  • Working with Hadoop
  • First steps in R and RHadoop
  • Statistical learning with RHadoop: clustering
  • Statistical learning with RHadoop: regression and classification

When would you like to start?

  • Available now
  • Date to be announced

What will you achieve?

By the end of the course, you'll be able to...

  • Explore the basic functionality of Apache Hadoop and RHadoop
  • Experiment with how the performance of modern supercomputing is achieved
  • Experiment with regression and classification in RHadoop
  • Demonstrate basic clustering, regression and classification with RHadoop
  • Investigate the basic functionality of the Bash terminal window

Who is the course for?

This course is designed for people interested in data science, computational statistics and machine learning. It will also be useful for advanced undergraduate students and first year PhD students in data analysis, statistics or bioinformatics, who wish to understand HPC.

We expect learners to have basic experience with Linux, Bash and R, and to be able to download and run a virtual machine.

What software or tools do you need?

All the software needed to participate actively in the course is provided in a virtual machine, which you download and run on your local machine; no extra software is needed. You will need a modest local machine with 15 GB of free disk space and 2 GB of RAM.

Who will you learn with?

Janez Povh

I am an active researcher in mathematical optimization, which has many applications in data science and where HPC is an indispensable tool.

Biljana Mileva Boshkoska

Biljana Mileva Boshkoska is an assistant professor in computer science. Her interests include decision support systems, data mining and working with big data.

Leon Kos

Leon Kos has used the Linux desktop daily for more than 25 years to build digital relationships for research, teaching, and getting the job done by programming.

Who developed the course?

The Partnership for Advanced Computing in Europe (PRACE) is an international non-profit association with its seat in Brussels.

Get extra benefits by upgrading this course. For $99 you'll get:

Unlimited access

Upgrading will mean you get unlimited access to the course.

  • Take the course at your own pace
  • Refer to the material at any point in future

If you’re taking a course for free you have access to the course for its duration + 14 days, regardless of when you join. If you upgrade the course you have access for as long as the course exists on FutureLearn.

Access to tests

When you upgrade you’ll have access to any tests during the course.

  • Validate your learning
  • Ensure you have mastered the material
  • Qualify for a certificate

To receive a Certificate of Achievement you need to take any tests and score over 70%. You don’t get access to tests if you choose to take a course for free.

Certificate of Achievement

Upgrading means you’ll receive a Certificate of Achievement when you complete the course.

  • Prove your success when applying for jobs or courses
  • Celebrate your hard work
  • Display on your LinkedIn or CV
  • Includes free shipping

To receive a Certificate of Achievement you need to mark 90% of the steps on the course as complete, and score over 70% on any course tests.
