Advanced Data Mining with Weka Archives

Course summary

We’ve covered a lot of ground in this course. Congratulations on getting this far, and double congratulations if you’ve managed to do all the Quizzes! I encourage you to keep …

A challenge, and some Groovy

We begin by looking at a real-world challenge: the IDRC (International Diffuse Reflectance Conference) Shootout challenge. The training data – called “calibration data” – and test data are linked to …

Invoking Weka from Python

So far, we’ve been using Python from within Weka. However, in this lesson we work the other way round and invoke Weka from within Python. This allows you to take …

Visualization

Peter shows how to create visualizations from Weka’s Jython console using the open-source library JFreeChart. First he plots the errors made by LinearRegression on a dataset, indicating the size …

Building models

Peter demonstrates three Python scripts for Weka that apply the J48 classifier to the anneal dataset. The first builds a classifier and outputs the model, the second evaluates a classifier …

Invoking Python from Weka

Peter Reutemann introduces scripting, and then demonstrates a Weka package that opens an editor in which you can write and execute Python scripts. Finally he writes a script for loading …

Mike Mayo shows that with appropriate features, Weka can be used to classify images. The imageFilters package processes image files to extract features, and implements 10 different feature sets. You …

Miscellaneous Distributed Weka capabilities

There are other useful KnowledgeFlow templates for Distributed Weka. One computes a correlation matrix for input to Principal Component Analysis; another runs a parallel version of the k-means clustering algorithm. …

Map tasks and Reduce tasks

Map tasks produce models and a Reduce task aggregates them. Reduce strategies differ for Naive Bayes and other model types. We saw in the last lesson that Naive Bayes and …
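
As a rough illustration of the map/reduce idea above, here is a minimal pure-Python sketch. It assumes that a Naive Bayes-style model can be summarized by simple counts, so that partial models from separate partitions can be merged by adding those counts. This is not Distributed Weka's implementation: the function names map_task and reduce_task and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter
from functools import reduce


def map_task(rows):
    """Count class labels and (attribute index, value, label) triples in one partition."""
    class_counts = Counter()
    feature_counts = Counter()
    for *features, label in rows:
        class_counts[label] += 1
        for index, value in enumerate(features):
            feature_counts[(index, value, label)] += 1
    return class_counts, feature_counts


def reduce_task(partials):
    """Sum the partial counts from every map task into one set of counts."""
    return reduce(
        lambda a, b: (a[0] + b[0], a[1] + b[1]),
        partials,
        (Counter(), Counter()),
    )


# Hypothetical toy data: each row is (outlook, windy, play).
partition_1 = [("sunny", "no", "yes"), ("rainy", "yes", "no")]
partition_2 = [("sunny", "yes", "no"), ("overcast", "no", "yes")]

# Each partition is counted independently, then the counts are merged.
merged = reduce_task([map_task(partition_1), map_task(partition_2)])

# Counting the whole dataset in one pass gives the same statistics.
single_pass = map_task(partition_1 + partition_2)
print(merged == single_pass)
```

Because counts are additive, the merged result matches what a single pass over all the data would produce. Models that cannot be merged this way would need a different Reduce strategy, such as combining the separate models into an ensemble.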

Using Naive Bayes and JRip

There are many options when configuring a Distributed Weka job. The ArffHeaderSparkJob’s configuration panel has two tabs, Spark configuration, whose options relate to how the cluster is configured, including how …

Installing with Apache Spark

Having installed Distributed Weka, you can interact with it in the KnowledgeFlow environment. New components such as ArffHeaderSparkJob, WekaClassifierSparkJob, and WekaClassifierEvaluationSparkJob become available. In addition, example knowledge flows are provided …