Managing Big Data with R and Hadoop
Duration: 5 weeks
Weekly study: 4 hours
Why join the course?
This online course will introduce you to various high-performance computing (HPC) facilities for big data analysis. These include R – a programming language renowned for its simplicity, elegance and community support – and Hadoop – an open-source, Java-based programming framework for large data sets.
You will find out how to use them while avoiding common pitfalls, saving you time and money.
What topics will you cover?
- First steps in R and RStudio
- Working with Apache Hadoop 1 – Fundamentals
- Working with Apache Hadoop 2 – RHadoop
- Statistical learning using RHadoop
What will you achieve?
By the end of the course, you will:
- Understand how the performance of modern supercomputing is achieved
- Understand the basic functionality of the Bash terminal window
- Understand the basic functionality of Apache Hadoop for scalable, distributed computing
- Understand the basic functionality of RHadoop
- Understand the basic problems of supervised and unsupervised learning
- Perform basic clustering, regression and classification with RHadoop
What topics will you cover?
- Welcome to BIG DATA
- Working with Hadoop
- First steps in R and RHadoop
- Statistical learning with RHadoop: clustering
- Statistical learning with RHadoop: regression and classification
When would you like to start?
Most FutureLearn courses run multiple times. Every run of a course has a set start date, but you can join it and work through it after it starts.
Who is the course for?
This course is designed for people interested in data science, computational statistics and machine learning. It will also be useful for advanced undergraduate students and first year PhD students in data analysis, statistics or bioinformatics, who wish to understand HPC.
You can use the hashtag #FLMassiveData to talk about this course on social media.