Starting DFS, YARN and RStudio

Starting Hadoop, RStudio and RHadoop

Before going further, we need to check that everything works in our virtual machine and become familiar with the environment we will be working in.

Open the terminal by clicking the black icon at the bottom left and type the following commands, which start the HDFS and YARN daemons and then list our HDFS home directory:

start-dfs.sh
start-yarn.sh
hadoop fs -ls
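
To confirm that the two start scripts really brought the daemons up, one option is to inspect the output of jps, the JDK tool that lists running Java processes. The sketch below is not part of the course material. It assumes a single-node setup in which all five daemons run on the same machine, and missing_daemons is a hypothetical helper name chosen for illustration.

```shell
#!/bin/sh
# Read `jps` output on standard input and print the name of every
# expected Hadoop daemon that does not appear in it.
# Assumption: single-node setup, so all five daemons run locally.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; compare the second field exactly so that
        # NameNode is not mistaken for SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Usage in the terminal (requires a JDK that provides jps):
#   jps | missing_daemons
```

If every daemon is running, the pipeline prints nothing; otherwise it lists the daemons that are missing, which points to the start script that needs a second look.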

Then type

rstudio &

to open the RStudio GUI. We should open a new R script file and save it to the local folder or any other folder of our choice. Next we should set up the environment variables by copying the following lines into the script file and executing them:

Sys.setenv(HADOOP_OPTS="-Djava.library.path=/usr/local/hadoop/lib/native")
Sys.setenv(HADOOP_HOME="/usr/local/hadoop")
Sys.setenv(HADOOP_CMD="/usr/local/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.5.jar")
Sys.setenv(JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64") 
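
These variables only help if the paths they point to exist on the machine. As a hedged sanity check, the terminal sketch below reports any path that is absent. It is an illustration rather than part of the course material; report_missing_paths is a hypothetical helper name, and the paths in the usage comment are simply those from the lines above.

```shell
#!/bin/sh
# Print every argument that does not exist on disk.
# Returns 0 if all paths exist and 1 if at least one is missing.
report_missing_paths() {
    missing=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            printf 'missing: %s\n' "$p"
            missing=1
        fi
    done
    return "$missing"
}

# Usage with the paths set through Sys.setenv() (course VM layout assumed):
#   report_missing_paths \
#       /usr/local/hadoop/lib/native \
#       /usr/local/hadoop \
#       /usr/local/hadoop/bin/hadoop \
#       /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.5.jar \
#       /usr/lib/jvm/java-8-openjdk-amd64
```

A "missing:" line for the streaming jar usually means the installed Hadoop version differs from the one in the file name, in which case the path in HADOOP_STREAMING has to be adjusted to match.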

We load RHadoop by loading the libraries rhdfs and rmr2 and executing hdfs.init():

library(rhdfs)
library(rmr2)
hdfs.init()

While the above commands run, we may see log output and some warning messages, but there should be no filesystem errors. There might also be some glyph rendering errors, which we can simply ignore.

Finally, try to quit RStudio and stop the Hadoop servers by running the counterpart commands stop-yarn.sh and stop-dfs.sh in the terminal.
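
The shutdown can be wrapped in a small function that stops the services in the reverse order of startup. This is only a sketch: stop_hadoop is a hypothetical name, and it assumes that stop-yarn.sh and stop-dfs.sh are on the PATH, as the commands above imply.

```shell
#!/bin/sh
# Stop YARN first, then HDFS (the reverse of the startup order).
# Both scripts are always attempted; the function returns 1 if either fails.
stop_hadoop() {
    status=0
    stop-yarn.sh || status=1   # stops the YARN daemons
    stop-dfs.sh  || status=1   # stops the HDFS daemons
    return "$status"
}

# Usage in the terminal:
#   stop_hadoop
```

Running both scripts even when the first one fails avoids leaving HDFS daemons behind after a problem with YARN.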

This article is from the free online course:

Managing Big Data with R and Hadoop

Partnership for Advanced Computing in Europe (PRACE)