Today, we’re going to look a bit more at how to use R from Weka. More specifically, we’ll look at how to use the MLR library from Weka. MLR stands for Machine Learning in R. This library includes many of the learning algorithms that are available in the R environment all nicely bundled up in one package. As we’ll see, it’s quite easy to use MLR from Weka. There is a particular classifier that can be used to do this. OK, let’s have a look at how this classifier works. I have loaded the diabetes data into the Explorer so that we can process it using MLR learning algorithms. One way to use MLR is to just use the R console.
We saw last time that we can, for example, plot the data in the R console by referring to it as rdata. This will plot the data that we have loaded into the Preprocess panel. We can also use the MLR learning algorithms from this console by typing in commands. However, that is a little inconvenient. Instead, we can use the MLR classifier by selecting it under the Classify panel. We click the Choose button to choose the MLR classifier. As you have seen, this has taken a while, because Weka actually needs to download and install the MLR package in R the first time we want to use it.
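As a sketch, the console commands meant here look something like this (rdata is the name Weka gives the data from the Preprocess panel; the console only exists inside Weka, so this will not run standalone):

```r
dim(rdata)      # number of instances and attributes
summary(rdata)  # per-attribute summary statistics
plot(rdata)     # scatter-plot matrix of the attributes
```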
However, once this has happened, we don’t need to install the package again, so this will be much faster in the future. OK, now you can see here that we have an MLR package in the classifiers package. There’s an MLR classifier there, so let’s select it. The MLR classifier wraps the MLR R library for building and making predictions using various R classifiers and regression methods. Right. Just like with any other Weka classifier, we have the text box up here which contains the configuration information for the MLR classifier. Let’s just run it with default settings. You press the Start button, and, by default, the MLR classifier runs the rpart learning algorithm in R.
This builds a classification tree from the data using the CART decision tree learning method. You can see that it gets 75% accuracy in the cross-validation on the diabetes data. We get all the other performance statistics that we are used to, as well. Really, we treat the learning algorithm in R just like any other Weka learning algorithm. For this to happen, behind the scenes the MLR classifier actually has to transfer the data into the R environment, build the classifier in the R environment, and then also feed the test data to the classifier in the R environment and get the predictions back. But it all happens transparently.
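What runs behind the scenes is essentially an rpart call like the following; a minimal sketch using R's built-in iris data standing in for the diabetes set (rpart ships with R as a recommended package):

```r
library(rpart)  # CART-style classification trees

# Build a classification tree, as the MLR classifier does by default
fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)                                   # the tree in textual form
preds <- predict(fit, iris, type = "class")  # class predictions
mean(preds == iris$Species)                  # resubstitution accuracy
```

Note that the accuracy computed here is on the training data; Weka's 75% figure above comes from cross-validation.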
Further up, we can see the tree that has been generated from this data in textual form. We also get some information on the learning algorithm that was used and the package it originally comes from. We used rpart, which is a classification algorithm, so in MLR it’s called classif.rpart. This learning algorithm comes from the “rpart” package, which is a separate package for R. The MLR package for R just bundles algorithms from a lot of other packages that are available in R in one convenient interface, which we can easily access through the MLR classifier. The name is given here and also some properties of this algorithm. It can deal with two classes. It can deal with multiple classes.
It can deal with missing values, numeric variables – numeric attributes, in other words – factors, which are nominal attributes. It could also, potentially, deal with ordinal attributes. It can produce probability estimates, and it can deal with instance weights. This is the rpart learning algorithm from R, but there are many other learning algorithms that are available in the MLR package, and most of them are available through the MLR classifier. We can choose the algorithm we want to use by using the RLearner property. By default, we can see here that rpart is chosen, but there are many other algorithms that we can choose from. There are many classification algorithms, and there are also many regression algorithms.
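The same catalogue of learners can also be queried from R itself; a sketch assuming the mlr package is installed:

```r
library(mlr)

# List classification learners that can handle missing values and factors
lrns <- listLearners("classif", properties = c("missings", "factors"))
head(lrns$class)  # learner names such as "classif.rpart"
```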
Let’s run one other classification algorithm in MLR. Let’s run random ferns. This is available as classif.rferns. Living in New Zealand, I am quite fond of ferns, and it’s intriguing to see that there is also a learning algorithm that generates random ferns. Now, you can see that when I’ve clicked this, nothing happens for a while because Weka actually has to download and install the rferns package. That has happened now, and we can use this classifier. The fern is a variant of a decision tree where all the tests at one level of the decision tree are exactly the same, so they all test the same attribute and they perform the same split of the data.
A fern is a restricted form of a decision tree. Just like the random forest classifier generates an ensemble of regular decision trees, the random ferns classifier generates an ensemble of ferns. OK, let’s try this. Right. OK, so this classifier is slightly less accurate than the rpart classifier, but there may be other datasets where it outperforms rpart, because it is an ensemble classifier. You’ve seen that it runs quite quickly. It has actually generated an ensemble of 1000 ferns, and the depth was restricted to 5. So maybe we should try to decrease the depth to reduce the chance of overfitting. We can also specify parameters for the learning algorithm here in the learnerParams field of the MLR classifier.
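For comparison, the same ensemble can be built in R directly; a sketch assuming the rFerns package is installed, again using R's built-in iris data in place of diabetes:

```r
library(rFerns)  # random ferns learner for R

# ferns = 1000 and depth = 5 mirror the defaults the Weka wrapper used
model <- rFerns(Species ~ ., data = iris, ferns = 1000, depth = 5)
print(model)  # summary of the fern ensemble
```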
To find out some information about the parameters that we can use, we actually need to go on the web. It’s best to go first to the list of learning algorithms that are available in the MLR library. To do that, we just search for “MLR integrated learners” and pick the release version of MLR; there is also a development version. The first link here is the link we want. You have the integrated learners here. This has a list of all the learning algorithms that are in the MLR package, and most of them are available through the MLR classifier in Weka. We want to look for rFerns, so I search for “rFerns” on this page. There’s a link here.
This will take us to the appropriate documentation page. It has a list of all the topics that are in the manual for the rFerns package for R. rFerns is the actual learning method, so let’s click on this. Here we have some information on the usage of the method. We have arguments that can be used in R. The x and y arguments are just the data; we can ignore those, because they are filled in by the MLR classifier in Weka. The formula argument can also be ignored, and so can data. But here we can see some relevant parameters that we might want to change.
We can change the depth of the ferns, for example, and we can change the number of ferns. Let’s change the depth. Let’s try to reduce it in our experiment. What we do is we type depth = 2, if we want to reduce the depth to 2. Let’s re-run the experiment. We start it again. Right. We can also specify multiple parameters. We can change the number of ferns that we want to generate: using the ferns argument, we can say how many ferns we want to include in our ensemble. To specify multiple arguments, we just separate them by a comma. So ferns = 100 will generate 100 ferns instead of 1000. This runs even more quickly now.
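For reference, the learnerParams field takes a comma-separated list of R argument settings, which the wrapper passes through to the underlying learner. The combined setting from this experiment would read:

```
depth = 2, ferns = 100
```

This corresponds to calling the rFerns function in R with depth = 2 and ferns = 100.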
The accuracy has actually gone up slightly. This is most likely due to chance. We’ve seen now how we can use the MLR classifier from Weka, and you can also run the MLR classifier from the other user interfaces in Weka. You can run it from the Weka Experimenter. You can run it from the command line, and you can also run it from the Knowledge Flow.