This content is taken from the University of Waikato's online course, Advanced Data Mining with Weka. Join the course to learn more.

Skip to 0 minutes and 11 seconds Hello! My name is Eibe Frank. I’m with the Department of Computer Science at the University of Waikato, the home of Weka, and it is my job to tell you a bit about how to use the statistical computing environment R from Weka. So let’s get started. Because R is implemented in a different programming language than Weka, which is implemented in Java, getting things set up so that Weka can use R is a little bit tricky, but we will go through the steps in this first video. The following assumes that you’re using 64-bit Windows, 64-bit R, and 64-bit Java. You can also do the same if you use 32-bit versions of everything.

Skip to 0 minutes and 49 seconds Furthermore, we’ll assume that you have administrator access on your computer, and we assume that you have a direct connection to the Internet. All right. The first thing we need to do is download R and install it. Download the current version. To save some time, I’ve already downloaded R, and we can just install it from here. OK. We accept this. We run the installer. We want English as the language, and we just accept the license, which is the same as the one used for Weka. Accept the default install location. Now, because I want to use 64-bit R, I unselect 32-bit files here, and then I just go with the standard setup.

Skip to 1 minute and 36 seconds I also want to create a Start menu folder, and we accept the defaults here, as well. OK, finished! Now we have installed R. The first thing we should do is install a particular package in R that is necessary for R to be able to communicate with Java, the programming environment that Weka is implemented in. We start R from the shortcut, and we get the R console, where we can enter text commands. This is the standard way to interact with R, because R is really a programming language. We type in install.packages("rJava"). We want to install this in the personal library, and we want to create this library.
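The installation step above can be made repeatable with a small check, so that rJava is only downloaded when it is missing. This is a minimal sketch: the helper names `is_installed` and `ensure_installed` are hypothetical, and the CRAN mirror URL is an assumption that stands in for the mirror picked interactively in the video.

```r
# Hypothetical helper: TRUE if the named package is present in any R library.
is_installed <- function(pkg) {
  pkg %in% rownames(installed.packages())
}

# Hypothetical helper: install the package only when it is missing.
# The mirror URL is an assumption; any CRAN mirror should work.
ensure_installed <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!is_installed(pkg)) {
    install.packages(pkg, repos = repos)
  }
  invisible(is_installed(pkg))
}

# Report whether rJava is already available, without installing anything.
cat("rJava installed:", is_installed("rJava"), "\n")
```

Calling `ensure_installed("rJava")` at the R console would then perform the same installation as the command typed in the video, skipping the download if the package is already there.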

Skip to 2 minutes and 37 seconds Because I’m in New Zealand, I want to download from a New Zealand computer, a New Zealand server, so I click on New Zealand here. OK, rJava has been installed successfully. We just close this, and now what we need to do next is set up some environment variables. We search for “variables” using the Windows search functionality, and then we click the item “Edit environment variables for your account”. There are already some environment variables there, but we need to add some new ones. We click on New to enter a new variable. This new variable is called R_HOME, and the variable value is the location of the R distribution.

Skip to 3 minutes and 31 seconds To find this, we right-click on the R shortcut and we go to Properties, and now we have the location of the R distribution here. It’s the path to the directory containing the R binaries. We select everything up to the “bin” folder. Then we paste it here. That’s the R_HOME variable. The next variable we need to insert is the R_LIBS_USER variable, which determines the location of the user libraries that R installs. Now, we’ve already installed one user library, namely the rJava library, so we just need to find it and put the location of this library here. Let’s just use the Windows search functionality again to search for rJava. It’s a file folder. Now we just go up one level.

Skip to 4 minutes and 33 seconds This is the folder containing all the user libraries for R, so we right-click on this text field and we select “Copy address as text” to copy this path. Then we go back to our form to enter the variable value for our user variable. We right-click and we paste it in. We’re almost done now. The last thing we need to do is modify the PATH environment variable to include the directory containing the R executable. We select this PATH environment variable and click on the Edit button, and at the end we add a semicolon, and then we use the location of the R executable. In this case, we actually use this bit of the path for the R executable.
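As an alternative to reading the paths off the shortcut and the Windows search results, R itself can report them. The sketch below is an illustration rather than part of the video: the function name `suggest_r_env` and the list element names are hypothetical, and the printed paths may use a different slash style or short form than the ones shown in Windows Explorer, so compare them with what you see there.

```r
# Hypothetical helper: suggest values for the three variables set in the video.
suggest_r_env <- function() {
  # Folder of the installed rJava package, if any (empty when not installed).
  rjava_dir <- find.package("rJava", quiet = TRUE)
  list(
    # Location of the R distribution (everything up to the "bin" folder).
    R_HOME = R.home(),
    # One level up from the rJava folder, as in the video; NA if rJava is missing.
    R_LIBS_USER = if (length(rjava_dir) > 0) dirname(rjava_dir[1]) else NA_character_,
    # Directory containing the R executable, to be appended to PATH.
    PATH_entry = R.home("bin")
  )
}

vals <- suggest_r_env()
for (name in names(vals)) {
  cat(name, "=", vals[[name]], "\n")
}
```

The values still have to be entered by hand in the environment variables dialog; the script only prints candidates.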

Skip to 5 minutes and 38 seconds This should be it. We just go OK here. Now what we need to do is install the R plugin package for Weka, which is Weka’s interface for R. We start Weka, and we go to the Package Manager. It just refreshes the package cache at the start. Once it’s done that and popped up the window, we can select the R plugin. RPlugin is here. We choose the install button. OK. Right. There’s quite a bit of information here in this window, install information for the R plugin. This is about setting the environment variables that we just set before, so we just click OK here.

Skip to 6 minutes and 40 seconds Now it takes a little while for the R plugin to be downloaded and installed, but it doesn’t take too long. It actually also installs an additional R library, the JavaGD library for R, which makes it possible to output R plots in Java. OK, now it’s finished. You just need to restart Weka. We close this, close this. Start Weka again. Now when we start the Explorer, we can load in some data. In this case, just go to the Program Files folder, and then the Weka folder, and there’s a data folder. We load in the iris data.

Skip to 7 minutes and 42 seconds Now we can go to the R console, which is a new tab here that comes as part of the R plugin package, which provides us with a console for R implemented in Java. This console allows us to address the data that we’ve loaded in the Preprocess panel using the name “rdata”. We can go plot(rdata), and this will give us a plot of the iris data generated by R.
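To try the same command in a plain R session, where no `rdata` object exists, R's built-in `iris` data frame can stand in for the data loaded in Weka. This is only a sketch under that assumption: the column names in R's `iris` differ from those in Weka's iris file, and the colouring step assumes the last column holds the class as a factor.

```r
# Outside Weka there is no rdata; the built-in iris data frame stands in for it.
rdata <- iris

# Inspect the structure: four numeric attributes plus the class.
str(rdata)

# Calling plot() on a data frame draws a scatterplot matrix of all columns.
plot(rdata)

# Assumption: the last column is the class, stored as a factor.
# Colour each point by its class to make the three species easier to tell apart.
class_col <- as.integer(rdata[[ncol(rdata)]])
plot(rdata, col = class_col)
```

Inside Weka's R console the first line is unnecessary, because `rdata` already refers to the data in the Preprocess panel.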

Setting up R with Weka

R is a powerful statistical programming system that contains data mining tools for classification, regression, and plotting data, some of them very advanced. Eibe Frank shows how to access these from Weka. To set this up you must install R and then install Weka’s RPlugin package, as this video demonstrates. Detailed instructions are given in the accompanying download (these slides do not appear in the video itself).

The setup used to be rather complicated, but current versions of Weka’s RPlugin package simplify it considerably. Thus you can skip parts of this video, as indicated on the screen. If you are already an R user, first delete the environment variables R_HOME and R_LIBS_USER, and remove any R directory you added to PATH, because the RPlugin package now sets them for you.
