Using R to plot data

This video demonstrates an R package called ggplot2 that provides extensive plotting capabilities, which can be accessed from Weka. Detailed instructions are given in the accompanying download (these slides do …

Setting up R with Weka

R is a powerful statistical programming system that contains data mining tools for classification, regression, and plotting data, some of them very advanced. Eibe Frank shows how to access these …

LibSVM and LibLINEAR

Ian Witten demonstrates LibLINEAR, which contains fast algorithms for linear classification; and LibSVM, which produces non-linear SVMs. Both implement support vector machines – which are already available in Weka as …

Signal peptide prediction

Tony Smith introduces signal peptide prediction, an application of data mining to a problem in bioinformatics. A sequence of amino acids that makes up a protein begins with an initial …

Classifying tweets

Twitter is a vast, continuous, prolific, real-time data stream. Sentiment analysis is the task of classifying tweets as positive or negative according to the feelings they express. Emoticons constitute …

MOA classifiers and streams

Change is everywhere, and it is a distinguishing feature of data stream mining. Bernhard Pfahringer explains that one way of dealing with change is to use an adaptive windowing method …

The MOA interface

We download MOA and run it. Incremental data stream mining calls for different evaluation methods from batch operation. One possibility is to interleave training and testing by periodically holding out …

Weka’s MOA package

MOA is open source software that is specifically designed for mining data streams. It can handle evolving data streams – ones generated by mechanisms that change, or drift, over time. …

Incremental classifiers in Weka

Albert Bifet introduces data stream mining. It requires incremental operation rather than the batch mode used so far. Weka includes many different incremental methods. Updating decision trees presents an interesting …

Analyzing infrared data from soil samples

Some feel that data miners focus too much on new methods and tiny improvements in accuracy, instead of on applications that will make a real difference in practice. Geoff Holmes …

Lag creation and overlay data

There are many parameters and options for deriving time-dependent attributes, such as which attribute holds the timestamp and what the periodicity of the data is. Periodicity affects the lagged variables …

Looking at forecasts

Weka’s time series forecasting package includes options for visualizing predictions for any number of steps ahead, along with performance on the training data. As well as visualizing future predictions, …

Using the time series forecasting package

Dealing manually with time series is a pain, as we learned in the last lesson. Weka’s time series forecasting package automatically produces lagged variables, plus many others – perhaps too …

What will you learn?

This video welcomes you to the course, which – unlike earlier courses in this series – is given by the entire data mining team at the University of Waikato in …

Using the MOA interface

Download the latest version of MOA and run it. It’s a Java program, like Weka. If you can run Weka, you can run MOA! Note: The interface differs very slightly …