This online course introduces you to several high-performance computing (HPC) tools for big data analysis. These include:
- R – a programming language renowned for its simplicity, elegance and community support, enriched with packages for working with Hadoop. R scripts will be prepared and run in the RStudio IDE;
- Hadoop – an open-source, Java-based framework for the distributed storage and processing of large data sets.
A basic knowledge of Bash and AWK is needed to understand Hadoop well, so we also introduce them briefly.
Through a range of materials, including hands-on exercises, you will learn how to use these tools while avoiding common pitfalls, saving you time and money.
What topics will you cover?
- First steps in R and RStudio
- Working with Apache Hadoop 1 – Fundamentals
- Working with Apache Hadoop 2 – RHadoop
- Statistical learning using RHadoop
What will you achieve?
By the end of the course, you will:
- understand how the performance of modern supercomputers is achieved;
- be able to perform basic operations in a Bash terminal window;
- be able to use AWK for basic text processing tasks;
- understand the basic functionality of Apache Hadoop for scalable, distributed computing;
- be able to perform data operations of medium difficulty using R and RHadoop;
- understand the basic problems of supervised and unsupervised learning;
- be able to apply clustering, regression and classification methods using RHadoop.