Skip to 0 minutes and 11 seconds We’ll be using JFreeChart for some of the plotting, because Weka’s plotting is a little bit complicated and it’s much, much nicer doing JFreeChart plots. If you haven’t done so already, please install the jfreechartOffscreenRenderer package, which I already mentioned earlier, and, if you’re looking for more Javadoc on the JFreeChart library, you can find it on the jfree.org website. The classes that we’ll be touching on for JFreeChart will be datasets that JFreeChart needs for plotting; ChartFactory for creating plots; and ChartPanel, which is actually used for embedding plots in the GUI. And, finally, some Weka classes for displaying trees and graphs. First of all I’m going to start up Weka and open the Jython console.
Skip to 0 minutes and 57 seconds For the first script, we’ll want to plot the classifier errors obtained from a linear regression model on a dataset. But not just actual versus predicted: we also want to take into account how bad the error is. So first thing, we’re going to import a whole bunch of classes
Skip to 1 minute and 27 seconds again: Evaluation for evaluating our classifier; we’re going to use LinearRegression as a simple classifier for doing the regression; DataSource for the usual loading of the dataset. DefaultXYZDataset is a JFreeChart dataset, which allows you to store 3 dimensions for each data point. We’re basically using the z as the error. ChartFactory class for generating the plot; ChartPanel for embedding it; and the XYBubbleRenderer basically plots a bubble at the x, y position using the z value as the size of the bubble. OK. So we’re loading our data. In this case, it’s the bodyfat UCI dataset, which has a numeric class. Then we are configuring our LinearRegression classifier, turning off some bits we don’t need. It also makes it a bit faster.
Skip to 2 minutes and 35 seconds Once again we are cross-validating our classifier with 10-fold cross-validation. And after the cross-validation is done, we need to collect the predictions and compute the error. So what we’re going to do here is quite simple. We’re going to start with three empty lists, the actual, the predicted and the error. We’re going to loop through all the predictions, which we can retrieve by the predictions method; store the actual and predicted; and calculate the error, which is basically actual minus predicted, and the absolute value of that. Having done that, we can then create our dataset, which is a DefaultXYZDataset.
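The collection step just described can be sketched in plain Python. This is only an illustration under stated assumptions: the `predictions` list of (actual, predicted) pairs is made-up stand-in data rather than the output of Weka's Evaluation class, and `series_data` is a hypothetical name for the three parallel rows (x, y, z) that a DefaultXYZDataset series is built from.

```python
# Made-up (actual, predicted) pairs standing in for the predictions
# that the cross-validated evaluation would provide.
predictions = [(12.3, 11.8), (25.4, 30.1), (7.1, 7.0), (19.0, 14.5)]

# Start with three empty lists, as in the script described above.
actual, predicted, error = [], [], []

for act, pred in predictions:
    actual.append(act)
    predicted.append(pred)
    # The error is the absolute value of actual minus predicted.
    error.append(abs(act - pred))

# Three parallel rows: x = actual, y = predicted, z = error.
series_data = [actual, predicted, error]
```

In the Jython script the three lists would be handed to the dataset's series together, so that each point carries its own error as the third dimension.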
Skip to 3 minutes and 29 seconds We are adding a series to this dataset, which we simply give a name, like LinearRegression on the name of the dataset, along with the actual, predicted, and error. Then we’re using our ChartFactory to create a plot, in this case a scatter plot, with a title and with “Actual” and “Predicted” as the axis titles. As a renderer, since we want to plot more than just a little dot at that location x and y, we use a specific renderer, XYBubbleRenderer. Then we are simply embedding the whole thing in the frame and displaying that. Let’s run that. Here we go.
Skip to 4 minutes and 36 seconds As we can see, some of the outliers are quite large, and the ones that are closest to the diagonal, the optimal case, are the smallest ones. We can even zoom in if we want to, and it adjusts accordingly. The next script handles ROC curves for classification, because the area under the curve and the shape of the curves for the various class labels tell you an important story about how well your classifier’s doing.
Skip to 5 minutes and 19 seconds In this case, once again: new tab; and we’re going to import a whole lot of classes again. In this case we’re evaluating NaiveBayes, and we’re using the ThresholdCurve class from Weka, which allows us to calculate the ROC curve data, among other things. Since we’re only plotting x, y in this case, we don’t need an XYZDataset; just an XY one will do. Once again ChartFactory and so on, which we’ve already seen in the other one. Now, once again, we’ll load a dataset. In this case, we’re loading the balance-scale UCI dataset, which has a nominal class.
Skip to 6 minutes and 3 seconds Setting the class attribute to the last one again; instantiating our NaiveBayes classifier (no options to be set); and cross-validating that once again with 10-fold cross-validation to obtain the statistics. We’re creating our dataset again, and since we want to plot the ROC curves for all the class labels we’re going to have to loop through all the labels. So what we’re going to do here is we’re going to have a variable which is going to range from 0 to the number of values that the class attribute has, minus 1.
Skip to 6 minutes and 57 seconds In each case, we’re going to create the threshold curve data, so we instantiate a ThresholdCurve object, and then use the predictions of the Evaluation class and the current index of the label that we’re interested in, and create curve data from that. We can simply extract those columns of data from the curve dataset that was generated, and put that into a list. We’re looking at the False Positive Rate versus the True Positive Rate that we want to plot. Then, since we already have a dataset, we’re adding a plot series to it and, to make it a bit more interesting, we’re also calculating the area under the ROC curve for each of the class labels and using that as the label for the plot.
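To make the per-label loop concrete, here is a minimal pure-Python sketch of the same idea. It does not use Weka's ThresholdCurve. Instead, the hypothetical helpers `roc_points` and `area_under` compute the False Positive Rate, True Positive Rate and area by hand, and the class distributions in `dists` are invented for illustration.

```python
def roc_points(scores, is_positive):
    """Sweep a threshold from high to low and return (fpr, tpr) lists."""
    pairs = sorted(zip(scores, is_positive), key=lambda p: -p[0])
    n_pos = sum(1 for flag in is_positive if flag)
    n_neg = len(is_positive) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Handle tied scores together so they yield a single curve point.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / float(n_neg))
        tpr.append(tp / float(n_pos))
    return fpr, tpr


def area_under(fpr, tpr):
    """Trapezoidal area under the curve given by the points."""
    area = 0.0
    for k in range(1, len(fpr)):
        area += (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
    return area


# Invented data: class labels, true class index per instance, and the
# predicted class distribution per instance.
labels = ["L", "B", "R"]
actual_idx = [0, 2, 1, 0, 2, 1]
dists = [[0.8, 0.1, 0.1], [0.1, 0.2, 0.7], [0.4, 0.3, 0.3],
         [0.6, 0.3, 0.1], [0.2, 0.1, 0.7], [0.3, 0.3, 0.4]]

series = {}        # plot series name -> (fpr, tpr)
auc_by_label = {}  # class label -> area under its ROC curve
for idx in range(len(labels)):  # 0 .. number of class values minus 1
    scores = [d[idx] for d in dists]
    flags = [a == idx for a in actual_idx]
    fpr, tpr = roc_points(scores, flags)
    auc_by_label[labels[idx]] = area_under(fpr, tpr)
    name = "%s (AUC: %.3f)" % (labels[idx], auc_by_label[labels[idx]])
    series[name] = (fpr, tpr)
```

Each entry in `series` corresponds to one line in the final plot, with the area under the curve embedded in the series name so that it shows up in the legend.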
Skip to 8 minutes and 3 seconds Now we’re creating an XY line plot, because we’re connecting the dots rather than just dotting them around like it was with the bubble plot earlier. Put the titles for the axes down, False Positive Rate and True Positive Rate. Then, once again, put that in a frame and display it. Let’s run that, and we have our three class labels L, B, and R. As you can see, the blue line is the worst one; and if you look it up, it also only has an ROC area of 0.719, whereas the other ones have almost 1. As you can see, they go straight up and really nestle quite nicely in the corner here, and then plateau out at pretty much 1 there.
Skip to 8 minutes and 53 seconds So that looks pretty sweet. This was using JFreeChart to plot some graphs. However, we can also plot some data using simple Weka classes. In this case, we want to plot a tree that got generated by J48. Once again we import stuff, and for visualization we’re going to use the TreeVisualizer. First of all, once again we have to load some data in, in this case the iris dataset. We’re going to build an unpruned J48 tree, build it on the dataset, and then we’re creating a TreeVisualizer using the graph that the built classifier returns.
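The graph that a built J48 classifier returns is a text description of the tree in GraphViz dot format. As a rough illustration of what such a description looks like, the following pure-Python sketch turns a small nested structure into dot text. The `to_dot` helper and the example tree are hypothetical and only approximate the shape of the output, not Weka's exact formatting.

```python
def to_dot(tree):
    """Render a nested (attribute, [(edge_label, child), ...]) tree as dot text.

    A leaf is a plain string holding the class label.
    """
    lines = ["digraph Tree {"]
    counter = [0]  # mutable counter for generating node ids

    def visit(node):
        node_id = "N%d" % counter[0]
        counter[0] += 1
        if isinstance(node, str):
            # Leaves are drawn as boxes.
            lines.append('%s [label="%s" shape=box]' % (node_id, node))
        else:
            attribute, branches = node
            lines.append('%s [label="%s"]' % (node_id, attribute))
            for edge_label, child in branches:
                child_id = visit(child)
                lines.append('%s->%s [label="%s"]'
                             % (node_id, child_id, edge_label))
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-level tree, loosely modelled on the iris data.
example_tree = ("petalwidth", [
    ("<= 0.6", "Iris-setosa"),
    ("> 0.6", ("petalwidth", [
        ("<= 1.7", "Iris-versicolor"),
        ("> 1.7", "Iris-virginica"),
    ])),
])

dot_text = to_dot(example_tree)
```

A string of this kind is what gets handed to the visualizer, which then lays out the nodes and edges on screen.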
Skip to 9 minutes and 53 seconds Then we’re embedding the whole thing in a frame, visualizing that, and once the frame has been displayed we can also fit the tree then to the size that’s on the screen. Running that. We have our nice little tree of the iris dataset. Now, trees aren’t the only thing that Weka can plot. The BayesNet classifier allows you to plot network graphs and this is what we’re going to do now. In this case, we’re going to use the BayesNet classifier and the GraphVisualizer from Weka to plot a graph that this classifier generates. Once again, load the iris dataset, and we’re going to configure our BayesNet classifier.
Skip to 10 minutes and 51 seconds To make the graph a little bit more interesting I’m using two parents rather than just one. I am building the classifier, and then I’m initializing the GraphVisualizer using the graph that the classifier returned. In this
Skip to 11 minutes and 19 seconds case, it’s in the BIF or Bayesian Network Interchange Format (http://www.cs.cmu.edu/~fgcozman/Research/InterchangeFormat/). Once again, we embed the whole thing in a frame, display that, and just like with a TreeVisualizer, we also want to make sure that the layout is all right. Let’s run this, and we have our little network graph. If we click on the various nodes, we can then see the probability tables. We can inspect it further.
Peter shows how to create visualizations from Weka’s Jython console using the open source library JFreeChart. First he plots the errors made by LinearRegression on a dataset, indicating the magnitude of each error by the size of a bubble. Next he displays multiple ROC curves, one for each class in the dataset. Then he draws a tree generated by J48, and finally a graph generated by the BayesNet classifier.
© University of Waikato, New Zealand. CC Creative Commons Attribution 4.0 International License.