We will download Linux Mint with the RHadoop virtual machine, provided as a file on our server. The single large file is also split into several files that may be easier to download. However, they have to be combined into a single Open Virtual Appliance file, which is then imported into VirtualBox. Of course, VirtualBox needs to be installed before you can import our RHadoop virtual machine. Please use the virtualbox.org web site to download the latest binaries for your host operating system. We do not need the Extension Pack for our Linux Mint machine. VirtualBox machines run on a 64-bit host using Virtualization Technology (VT-x) and Virtualization Technology for Directed I/O (VT-d). Usually these settings are disabled at the BIOS level.

To enable VT-x and VT-d you have to change the corresponding settings in the BIOS. So please check these settings if you have trouble installing and starting the RHadoop virtual machine.
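
On a Linux host, one way to see whether the processor reports hardware virtualization support before going into the BIOS is to look for the `vmx` (Intel VT-x) or `svm` (AMD-V) flag in `/proc/cpuinfo`. The following is a minimal sketch under that assumption; the function name `virtualization_flags` is made up for illustration. A reported flag only indicates what the CPU is capable of, so the BIOS setting may still need to be enabled.

```python
from pathlib import Path

# CPU flags that indicate hardware virtualization support:
# "vmx" is Intel VT-x, "svm" is the AMD equivalent (AMD-V).
VIRT_FLAGS = {"vmx", "svm"}


def virtualization_flags(cpuinfo_text):
    """Return the virtualization flags found in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Keep only the flags we care about from this processor entry.
            found |= VIRT_FLAGS & set(value.split())
    return found


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # exists on Linux hosts only
    if cpuinfo.exists():
        flags = virtualization_flags(cpuinfo.read_text())
        print("Virtualization flags:", ", ".join(sorted(flags)) or "none reported")
    else:
        print("No /proc/cpuinfo on this host; check the BIOS setup screen instead.")
```

On Windows or macOS hosts this file does not exist, so the script simply points back to the BIOS check described above.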

Installation of a Linux Virtual Machine

We will be working with a custom-installed Hadoop on the Linux Mint operating system, which is based on the Ubuntu Linux platform, which in turn is based on the Debian Linux distribution. This article provides an illustrated, step-by-step guide to installing Oracle VM VirtualBox (virtualization software) and starting a pre-prepared OS image with Hadoop.

  1. The first thing you need to do is download the provided Linux Mint image with Hadoop. You will find it by following the link. If you have problems downloading the large file, please use the files split with 7-Zip in a subdirectory there.
  2. Once you have clicked on the link, left-click on the file named mint-hadoop.ova. Either it will start downloading into your Downloads folder or a pop-up window will appear. In the latter case, save it to your hard drive by making sure the option “Save File” is selected and then pressing “OK”.
  3. The .ova extension stands for “Open Virtual Appliance”. The file is an already prepared virtual-machine image, which now just needs to be loaded into VirtualBox. So our next step is to install this program (we are assuming you do not have it installed yet, but in case you do, you can skip all the way down to the last two steps).
  4. Now it is time to download the appropriate Oracle VM VirtualBox package. The easiest way to do this is to click on the provided link, select the platform your system runs on, and download the package.
  5. Next, install VirtualBox from the package you downloaded. Note that you may be prompted during the installation to grant permission to install the VirtualBox package.
  6. Well done! Now you only need to find the mint-hadoop.ova file that you previously downloaded and double-click on it. This will automatically open the image in VirtualBox. Once you do that, you should see something like the figure below. [Figure: the “Import Virtual Appliance” window]
  7. Before you press the “Import” button, you can customise the settings in the window entitled “Import Virtual Appliance”. The only settings we recommend changing are the number of available CPU cores, labelled “CPU”, and the amount of available RAM. We recommend at least 4 GB of RAM and more than one CPU core dedicated to the VM. After you do this, press “Import”. Processing the import will take a while. After the process is done you should see a result similar to the figure below. [Figure: VirtualBox after the import has finished]
  8. Now press the “Start” button. Once the OS is loaded, select the user “hduser” and enter the password “hadoop”. Now continue to the Desktop and we are ready to go.
  9. You may see the following “software rendering” notice when you log in. [Figure: the “software rendering” notice] You may simply ignore the message, as we will not be using a graphics-intensive interface. Optionally, you may try to enable 3D acceleration under Machine -> Settings -> Display -> Screen -> Acceleration.
  10. If you have problems setting up the machine on a 64-bit host, check that Virtualization Technology (VT-x) and Virtualization Technology for Directed I/O (VT-d) are enabled in the BIOS.
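
The import and the CPU and RAM settings from the steps above can also be applied from the command line with `VBoxManage`, which is installed together with VirtualBox. The sketch below only builds the command. It assumes `VBoxManage` is on the PATH and that mint-hadoop.ova is in the current directory; the helper name `import_command` and its defaults (2 cores and 4096 MB, following the recommendation above) are illustrative choices, not part of the course material.

```python
import shlex


def import_command(ova_path, cpus=2, memory_mb=4096, dry_run=False):
    """Build a VBoxManage command that imports an appliance with custom CPU and RAM."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    if memory_mb < 1:
        raise ValueError("memory_mb must be positive")
    cmd = [
        "VBoxManage", "import", ova_path,
        "--vsys", "0",               # assumption: the appliance holds a single VM
        "--cpus", str(cpus),         # number of CPU cores given to the VM
        "--memory", str(memory_mb),  # RAM in megabytes
    ]
    if dry_run:
        cmd.append("--dry-run")      # only show what would be imported
    return cmd


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(shlex.join(import_command("mint-hadoop.ova")))
```

Printing the command first, or adding `dry_run=True`, lets you review what would be imported. Passing the returned list to `subprocess.run(cmd, check=True)` would perform the import itself.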

Please join the discussion that follows and post any problems you encounter during the installation and initial tests, so that we can help you solve them.

This video is from the free online course:

Managing Big Data with R and Hadoop

Partnership for Advanced Computing in Europe (PRACE)