How do I evaluate a classifier’s performance?

Ian Witten introduces this week's Big Question

This week is all about evaluation.

Last week you downloaded Weka and looked around the Explorer and a few datasets. You used the J48 classifier. You used a filter to remove attributes and instances. You visualized some data and some classification errors. Along the way you encountered a few datasets: the weather data (both nominal and numeric versions), the glass data, and the iris data.

As you have seen, data mining involves building a classifier for a dataset. (Classification is the most common problem, though not the only one.) Given an example or “instance”, a classifier’s job is to predict its class. But how good is it? Predicting the class of the instances that were used to train the classifier is pretty trivial: you could just store them in a database. But we want to be able to predict the class of new instances, ones that haven’t come up before.

The aim is to estimate the classifier’s performance on new, unseen instances. Testing using the training data is flawed because it “predicts” data that was used to build the classifier in the first place. We need to go beyond what we see in the training data and predict outcomes – class values – for data that has never been seen before. The only way to know whether a classifier has any value is to test it on fresh data.

But where does the fresh data come from? It’s a conundrum, and that is what we explore this week. The basic idea is to split the data into two parts: the training data and the test data. The training data is used to build the model (the rules, if you like) that says how instances should be classified. After this is done, the test data is used to evaluate the model (rules). To do this, the model built during the training phase is applied to each test instance, and the result is the predicted class for that instance. The system compares this to the actual class value recorded for the instance and calculates what percentage of the predictions are correct.
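If you prefer to script Weka rather than drive the Explorer, the same train-then-test procedure can be run from Weka’s Java API. The sketch below is one way to do it, assuming you have already split your data into two ARFF files (the names train.arff and test.arff are placeholders) and that the class is the last attribute, as in the course datasets. In the Explorer, the equivalent is the “Supplied test set” option.

```java
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainThenTest {
    public static void main(String[] args) throws Exception {
        // Load the training and test sets (file names are placeholders).
        Instances train = new DataSource("train.arff").getDataSet();
        Instances test  = new DataSource("test.arff").getDataSet();

        // Assume the class attribute is the last one, as in the course datasets.
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        // Build the model (the "rules") from the training data only.
        J48 tree = new J48();
        tree.buildClassifier(train);

        // Apply the model to each test instance and compare the predicted
        // class with the actual class, then report the percentage correct.
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, test);
        System.out.printf("Correctly classified: %.1f%%%n", eval.pctCorrect());
    }
}
```

The important point is that the test instances play no part in building the tree; they are only used afterwards, to check its predictions.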

In the first activity you’re going to experience what it’s like to actually be a classifier yourself, by constructing a decision tree interactively. In subsequent activities we’ll look at evaluation, including training and testing, baseline accuracy, and cross-validation.
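As a small preview of cross-validation, which we look at later in the week: instead of a single train/test split, the data can be divided into ten folds, with each fold tested once using a model built on the other nine. A minimal sketch using Weka’s Java API is shown below; the file name iris.arff is just an example, and the Explorer’s default “Cross-validation” test option does the same thing.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidate {
    public static void main(String[] args) throws Exception {
        // Any of the course datasets will do; the file name is a placeholder.
        Instances data = new DataSource("iris.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // 10-fold cross-validation: every instance is tested exactly once,
        // by a model trained on the remaining nine folds.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```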

At the end of the week you will know how to evaluate the performance of a classifier on new, unseen instances. And you will understand how easy it is to fool yourself into thinking that your system is doing better than it really is.
