[0:10] Welcome to the RHadoop MOOC. I am Leon Kos from the University of Ljubljana. This week we will introduce you to big data concepts and what they are all about. Since you can't learn to play the piano just by listening, we will need to get our hands on a keyboard and learn by repeating some commands in a virtual Linux machine that we have prepared for the course. That is how we will come to understand the principles. In the first week we will just prepare for the challenge. This means we'll set up a VirtualBox machine with Linux, where everything is already configured correctly for you.
[0:51] A VirtualBox machine is a lightweight way for all of us to have a common working environment, no matter which operating system you have. By lightweight I mean that you do not need a powerful laptop or desktop to run the virtual machine. For those who are not familiar with the Linux desktop and console, we will prepare articles, videos and some quizzes where you can check how well you are prepared for the big questions. We therefore don't assume previous knowledge of Linux console commands, the R language or Hadoop. Some console commands are self-explanatory and might be used as a reference in the following weeks.
[1:39] To give you a taste of the upcoming challenges, we will show some example commands to check that everything is working correctly. These will be simple commands without much explanation of the principles, as those will be explained in the following weeks. We will therefore first show some of the target tools of the trade inside the environment, where you may ask yourselves what big data is all about. You will be challenged to project those simple examples onto bigger questions. In the following weeks we will introduce and explain the "divide and conquer" approach to big data problems. You will see many similarities between this approach and related fields such as High Performance Computing.
[2:28] We may then tackle the upcoming challenges with the tools introduced in later weeks. For asking demanding questions, the R language is a popular choice. When we have "large questions" we will need to orchestrate more than one machine. For that, RHadoop will show its power inside RStudio. Week 1 will therefore show you the exciting way of distributed computing on data, from which you will be able to answer not-so-simple questions in the following weeks. Such knowledge is an attractive investment nowadays and is easy to sell because of the large number of possible applications. Why is this so? Because the digital world has changed lately, and now everybody wants to act globally on data to get competitive answers.
[3:19] And data is no longer small, as large quantities of cheap storage are available. So stay with us and discuss the challenges.
Welcome to Week 1
This video will introduce you to the topics we’ll be covering in the following weeks, the software on the virtual machine that we’ll be using, and how we’ll progress on FutureLearn. The transcripts for all videos are available at the bottom of each video.
Key areas of the course are:
- Running Hadoop inside a Linux-based virtual machine;
- Using Hadoop and scripting;
- Using RStudio with RHadoop Map/Reduce principles;
- Statistical learning.
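
As a first taste of the Map/Reduce principle named in the key areas above, here is a minimal sketch of a word count in plain Python. It is not RHadoop code and does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample text are made up for illustration. It only shows the idea of splitting data into chunks, processing each chunk independently, and then combining the partial results.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(chunks):
    # Each chunk could be mapped on a different machine; here we do it in a loop.
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(pairs))

# Hypothetical sample data, already "divided" into two chunks.
chunks = ["big data is big", "data is everywhere"]
counts = word_count(chunks)
print(counts)
```

In a real cluster the map calls would run in parallel on separate machines and the grouping step would be handled by the framework; the loop here only stands in for that.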
© PRACE and University of Ljubljana