This content is taken from the University of Waikato's online course, More Data Mining with Weka.

Skip to 0 minutes and 11 seconds Hello again! We’re going to look at the Knowledge Flow Interface. The Knowledge Flow Interface is an alternative to the Explorer, and it lets you lay out filters, classifiers, and evaluators interactively on a 2D canvas. There are various other components like data sources, and visualization components, and so on. We have different kinds of connections between the components, and a feature of the Knowledge Flow Interface is that it can work incrementally on potentially infinite data streams. Let’s go ahead and set up a configuration in the Knowledge Flow Interface. I’ll just start it up here. I’m going to load an ARFF file with a DataSource called an ARFF Loader.

Skip to 1 minute and 0 seconds I’m going to configure that – this is a right-click, Configure – to use the iris dataset, which is here. Then I’m going to need a Class Assigner to assign the class. That’s here – Class Assigner. I can make a connection, and I’m going to make a Dataset connection to the Class Assigner. Then I’m going to get a Cross-Validation Fold Maker, because we’re going to evaluate this with cross-validation. I’m going to connect up the dataset to the CrossValidationFoldMaker. Then I’m going to get a classifier. I’ll use good old J48. Here are all of the classifiers. J48 is up here with the tree classifiers at the end. Let me put that there.

Skip to 1 minute and 55 seconds I’m going to connect both the Training Set and the Test Set from the CrossValidationFoldMaker to J48. I’m going to get a Classifier Performance Evaluator in the Evaluation tab. I’m going to connect the classifier – that is, the batch classifier produced by J48 – to this, and I’m going to connect the output to a Text Viewer. Here’s a Text Viewer; the text output is what I’m going to connect to it. Then I’m going to start it all up. I’m going to run it. With my right-click here, I’m going to Start Loading. Let’s have a look at this Text Viewer; right-click to show the results. Here we go. These are the results that we’ve got. Well, we’ve seen these results before many times, of course.
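The flow just described (fold maker, classifier, performance evaluator) can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Weka code: the function and class names are hypothetical, a 1-nearest-neighbour classifier stands in for J48, the folds are not stratified, and the data is a made-up toy set rather than iris.

```python
import random

def cross_validation_folds(instances, k, seed=1):
    """Yield (train, test) pairs, one per fold, like a fold-maker component."""
    shuffled = instances[:]
    random.Random(seed).shuffle(shuffled)
    for fold in range(k):
        test = shuffled[fold::k]
        train = [x for i, x in enumerate(shuffled) if i % k != fold]
        yield train, test

class NearestNeighbour:
    """Stand-in batch classifier (1-NN); used here instead of J48."""
    def fit(self, train):
        self.train = train
        return self

    def predict(self, features):
        def squared_distance(row):
            return sum((a - b) ** 2 for a, b in zip(row[0], features))
        return min(self.train, key=squared_distance)[1]

def evaluate(instances, k=5):
    """Pool the predictions from every fold into one accuracy figure."""
    correct = total = 0
    for train, test in cross_validation_folds(instances, k):
        model = NearestNeighbour().fit(train)
        for features, label in test:
            correct += model.predict(features) == label
            total += 1
    return correct / total

# Made-up toy data: (features, class label) pairs in two well-separated groups.
toy = [((1.0 + 0.1 * i, 1.0), "a") for i in range(10)] + \
      [((5.0 + 0.1 * i, 5.0), "b") for i in range(10)]
accuracy = evaluate(toy)
print(f"pooled accuracy over 5 folds: {accuracy:.2f}")
```

Each fold trains a fresh model on the training set and tests it on the held-out test set, which is why both connections go from the fold maker to the classifier.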

Skip to 2 minutes and 54 seconds There are a lot of different things back on my slide here. This is what I’ve done.

Skip to 3 minutes and 1 second Here’s the configuration I set up. Next, I’m going to add a Model Performance Chart. Let’s find that. That would be under Visualization. Here’s our Model Performance Chart. I’m going to connect the VisualizableError to this. Then I’m going to have a look at the output. Let me just run this again (Start Loading). Now I’m going to look at the output (Show Chart). Here – well, you’ve seen this kind of chart before – I could plot, for example, the predicted class against the actual class. There are a lot of different things you could do.

Skip to 3 minutes and 54 seconds Back on the slide here: let’s work with stream data. I’m going to use an ARFF Loader in stream mode, which loads not a whole dataset but a single instance at a time. We’re going to use an updateable classifier, an incremental evaluator, and look at a Strip Chart. Let’s clear all of this over here. Select “Data Source”. Let’s get that ARFF Loader going, and configure it to use the iris data.

Skip to 4 minutes and 26 seconds Then I’m going to take that to a Class Assigner, which is in Evaluation.

Skip to 4 minutes and 34 seconds This time I’m going to make an instance connection: I’m just going to send a single instance along here. And I’m not going to make cross validation folds; I’m going to take that straight to an updateable classifier. There’s an updateable version of NaiveBayes. Some classifiers are updateable and some aren’t.

Skip to 4 minutes and 53 seconds NaiveBayes Updateable, let’s use that one. I’m going to connect that instance here to the updateable NaiveBayes classifier. Then I’m going to use an Incremental Classifier Evaluator.
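What makes a classifier "updateable" is that it can absorb one instance at a time without revisiting earlier data. The sketch below shows one way to do that for a Gaussian naive Bayes model, keeping a running mean and variance per class with Welford's method. It is an illustrative assumption, not Weka's NaiveBayesUpdateable implementation; the class name, the variance smoothing constant, and the toy stream are all made up.

```python
import math
from collections import defaultdict

class RunningGaussianNB:
    """Toy updateable classifier: per-class running mean and variance."""
    def __init__(self):
        self.count = defaultdict(int)  # class -> number of instances seen
        self.mean = {}                 # class -> running mean per feature
        self.m2 = {}                   # class -> summed squared deviations

    def update(self, features, label):
        """Absorb a single instance (Welford's online update)."""
        if label not in self.mean:
            self.mean[label] = [0.0] * len(features)
            self.m2[label] = [0.0] * len(features)
        self.count[label] += 1
        n = self.count[label]
        for j, x in enumerate(features):
            delta = x - self.mean[label][j]
            self.mean[label][j] += delta / n
            self.m2[label][j] += delta * (x - self.mean[label][j])

    def predict(self, features):
        """Return the class with the highest log posterior (None if untrained)."""
        total = sum(self.count.values())
        best, best_score = None, -math.inf
        for label, n in self.count.items():
            score = math.log(n / total)
            for j, x in enumerate(features):
                var = self.m2[label][j] / n + 1e-3  # smoothed variance (assumed constant)
                diff = x - self.mean[label][j]
                score -= 0.5 * (math.log(2 * math.pi * var) + diff * diff / var)
            if score > best_score:
                best, best_score = label, score
        return best

# Made-up stream of (features, label) pairs, fed one instance at a time.
nb = RunningGaussianNB()
for features, label in [((1.0, 2.0), "a"), ((1.2, 1.8), "a"),
                        ((4.0, 5.0), "b"), ((4.2, 5.1), "b")]:
    nb.update(features, label)
print(nb.predict((1.1, 1.9)), nb.predict((4.1, 5.0)))
```

Only the per-class counts, means, and deviation sums are stored, so memory use does not grow with the number of instances seen.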

Skip to 5 minutes and 12 seconds It’s an incremental classifier connection that I’m going to connect up to this. Now I’m going to take the output from that and put it on a Strip Chart. Here’s a Strip Chart.

Skip to 5 minutes and 31 seconds Take the output here to the chart I picked and put it there. Okay. Let’s show the Strip Chart, which is blank at the moment. Then with my ARFF Loader, I will Start Loading. You can see a little bit of output here. I’m going to use a larger dataset. I could configure this, of course, but the simplest thing is to use a larger dataset. Let me use the segment-challenge dataset and start loading again. Now we get this kind of output. This shows you how the class probabilities change for one class and for the other class as we go through. These are effectively learning curves in this situation. We’ve looked at the Knowledge Flow Interface.
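A rough sketch of what an incremental evaluator feeding a strip chart might compute: each instance is used first to test the current model and then to update it, and a statistic over the most recent instances is recorded at every step. The test-then-train ordering, the window size, and the trivial majority-class model are assumptions made for illustration, and the statistic here is windowed accuracy rather than whatever the Strip Chart in the video plots.

```python
from collections import Counter, deque

class MajorityClass:
    """Trivial updateable stand-in: predicts the most frequent class so far."""
    def __init__(self):
        self.counts = Counter()

    def predict(self, features):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def update(self, features, label):
        self.counts[label] += 1

def prequential_accuracy(stream, model, window=3):
    """Return accuracy over the last `window` instances, one value per step."""
    recent = deque(maxlen=window)
    series = []
    for features, label in stream:
        recent.append(model.predict(features) == label)  # test first
        model.update(features, label)                     # then train
        series.append(sum(recent) / len(recent))
    return series

# Made-up stream; any iterator works, so the data never has to be held in full.
stream = [((0,), "a"), ((0,), "a"), ((0,), "b"), ((0,), "a"), ((0,), "a")]
series = prequential_accuracy(iter(stream), MajorityClass())
print(series)
```

Plotting `series` against the instance number gives the kind of moving curve a strip chart shows.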

Skip to 6 minutes and 19 seconds The panels are broadly similar to the Explorer’s with some exceptions. Evaluation is a separate panel, for example. The facilities are broadly similar, as well, with just a couple of notable exceptions. We can deal incrementally with potentially infinite datasets. That’s what we just did – the configuration we just set up loaded from the file incrementally, so the dataset was never held in memory all at once. The Explorer, by contrast, loads everything into memory. Also, you can look inside cross-validation at the models for individual folds. Some people really like graphical interfaces like this, and it’s really good to know about the Knowledge Flow Interface.
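The difference between loading incrementally and loading everything into memory can be illustrated with a generator that yields one instance per data line. This is a deliberately minimal reader written for this example, not Weka's ARFF Loader: it assumes dense, comma-separated rows with no quoted values, and the sample file content is made up.

```python
import io

def stream_arff(lines):
    """Yield one instance (a list of strings) at a time from ARFF text lines."""
    in_data = False
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if not in_data:
            # Header lines (@relation, @attribute) are skipped until @data.
            in_data = line.lower().startswith("@data")
            continue
        yield [value.strip() for value in line.split(",")]

# A tiny made-up file; in practice `lines` would be an open file object.
sample = io.StringIO("""% tiny illustrative file
@relation demo
@attribute length numeric
@attribute class {a,b}
@data
1.5,a
2.5,b
3.5,a
""")
instances = stream_arff(sample)
first = next(instances)      # only this one row has been read so far
remaining = list(instances)  # a batch-style load would do this up front
print(first, len(remaining))
```

Because the generator reads one line at a time, a downstream updateable classifier can consume a file of any length without the whole file being in memory.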

The Knowledge Flow interface

The Knowledge Flow interface is an alternative to the Explorer. You lay out filters, classifiers, evaluators, and visualizers interactively on a 2D canvas and connect them together with different kinds of connector. Data and classification models flow through the diagram!
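The idea of components joined by typed connections can be sketched as a small listener pattern. Everything below is hypothetical and only loosely modelled on the description above: the `Component` class, its methods, and the connection name are illustrative, not part of Weka's API.

```python
class Component:
    """Hypothetical canvas component: transforms a payload and passes it on."""
    def __init__(self, name, transform=lambda payload: payload):
        self.name = name
        self.transform = transform
        self.listeners = {}  # connection type -> list of downstream components

    def connect(self, connection, target):
        """Register `target` to receive this component's output."""
        self.listeners.setdefault(connection, []).append(target)

    def receive(self, payload):
        """Process the payload, then forward the result along every connection."""
        result = self.transform(payload)
        for targets in self.listeners.values():
            for target in targets:
                target.receive(result)

received = []

def collect(rows):
    """Sink behaviour: remember whatever arrives, like a viewer component."""
    received.extend(rows)
    return rows

def assign_class(rows):
    """Treat the last value of each row as the class label."""
    return [(row[:-1], row[-1]) for row in rows]

loader = Component("loader")
assigner = Component("class assigner", assign_class)
viewer = Component("viewer", collect)
loader.connect("dataSet", assigner)
assigner.connect("dataSet", viewer)

# Made-up rows pushed into the head of the flow.
loader.receive([(5.1, 3.5, "x"), (6.0, 2.9, "y")])
print(received)
```

Data enters at the first component and each result is handed to whatever is connected downstream, which mirrors how a layout on the canvas determines the order of processing.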

Note: the version of the Knowledge Flow interface shown here is slightly older than the current version. However, the features are the same, just re-arranged slightly. You now run the Knowledge Flow by clicking the “play” icon at the top left corner, and the interface components are shown down the left-hand side rather than at the top. You can also double click on components instead of right-clicking.
