
Repeated training and testing

One way of evaluating a classifier is to use percentage splits, repeatedly training and testing with different random splits, as Ian Witten demonstrates.
In this lesson, we’re going to look a little bit more at training and testing. In fact, we’re going to repeatedly train and test using a percentage split. In the last lesson, we saw that if you simply repeat the training and testing, you get the same result each time, because Weka initializes the random number generator before each run to make sure you get the same result if you repeat the experiment tomorrow. But there is a way of overriding that, so we’ll use independent random numbers on each occasion to produce different percentage splits of the dataset into a training set and a test set. I’m going to open the “segment-challenge” data again.
That’s what we used before. Notice there are 1500 instances here; that’s quite a lot. I’m going to go to Classify and choose J48, our standard method. I’m going to use a percentage split, and because we’ve got 1500 instances, I’m going to choose 90% for training and just 10% for testing. I reckon that 10% – that’s 150 instances – for testing is going to give us a reasonable estimate, and we might as well train on as many instances as we can to get the most accurate classifier. I’m going to run this, and the accuracy figure I get – the same as in the last lesson – is 96.6667%. As we’ll see, that’s a misleadingly high accuracy.
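For readers who prefer scripting to the Explorer, here is a minimal sketch of the same evaluation using Weka’s Java API. It assumes segment-challenge.arff is in the working directory; the 90/10 split and the J48 classifier mirror the settings above.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PercentageSplit {
    public static void main(String[] args) throws Exception {
        // Load the dataset (assumed to be in the working directory);
        // the last attribute is the class
        Instances data = DataSource.read("segment-challenge.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Shuffle with a fixed seed, then split 90% train / 10% test
        int seed = 1;
        data.randomize(new Random(seed));
        int trainSize = (int) Math.round(data.numInstances() * 0.9);
        Instances train = new Instances(data, 0, trainSize);
        Instances test = new Instances(data, trainSize, data.numInstances() - trainSize);

        // Build J48 on the training set and evaluate on the held-out 10%
        J48 tree = new J48();
        tree.buildClassifier(train);
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, test);
        System.out.printf("Seed %d: %.4f%% correct%n", seed, eval.pctCorrect());
    }
}
```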
I’m going to call that 96.7%, or 0.967. Then I’m going to do it again and see how much variation we get in that figure, initializing the random number generator to a different value each time.
If I go to the “More options” menu, I get a number of options which are quite useful: outputting the model, which we’re doing; outputting statistics; outputting different evaluation measures; outputting the confusion matrix, which we’re doing; storing the predictions for visualization; outputting the predictions if we want; doing a cost-sensitive evaluation; and setting the random seed for cross-validation or percentage split. That seed is set by default to 1. I’m going to change it to 2, a different random seed. We could also output the source code for the classifier if we wanted, but I just want to change the random seed. Then I want to run it again.
Before, we got 0.967; this time we get 0.94 – 94%. Quite different, you see.
If I then change the seed to, say, 3, and run it again, I again get 94%. If I change it to 4 and run again, I get 96.7%. Let’s do one more: change it to 5, run it again, and now I get 95.3%. Here’s a table with these figures in it. If we run it 10 times, we get this set of results. Given this set of experimental results, we can calculate the mean and standard deviation. The sample mean is the sum of all of these figures – success rates, I should say – divided by the number of them, 10. That’s 0.949, about 95%. That’s really what we would expect to get.
That’s a better estimate than the 96.7% that we started out with – a more reliable estimate. We can also calculate the sample variance. We take the deviation from the mean – we subtract the mean from each of these numbers – square it, add the squares up, and divide, not by n, but by n – 1. That might surprise you. The reason for n – 1 is that we’ve estimated the mean from this same sample; when the mean is calculated from the sample, you need to divide by n – 1, which gives a slightly larger variance estimate than dividing by n.
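Written out as formulas, with x1, …, xn denoting the n success rates (here n = 10):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad
s = \sqrt{s^2}
```

In this experiment, the sample mean is x̄ ≈ 0.949 and the sample standard deviation is s ≈ 0.018, i.e. 1.8%.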
We take the square root of that, and in this case we get a standard deviation of 1.8%. Now you can see that the real performance of J48 on the segment-challenge dataset is approximately 95% accuracy, plus or minus approximately 2% – anywhere, let’s say, between 93% and 97% accuracy. The figures that Weka puts out for you can be misleading, and you need to be careful how you interpret them, because the true accuracy is certainly not exactly 95.3333%. There’s a lot of variation in all of these figures. Remember, the basic assumption is that the training and test sets are sampled independently from an infinite population, so you should expect slight variation in results – perhaps more than just slight variation.
You can estimate that variation by varying the random-number seed and repeating the experiment, then calculating the mean and standard deviation experimentally – which is what we just did.
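To automate those ten runs, here is a sketch using the same Weka API calls as above: it evaluates J48 once per seed from 1 to 10 and computes the sample mean and standard deviation. The exact figures you get may differ slightly from those quoted, depending on your Weka version.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RepeatedSplits {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("segment-challenge.arff");
        data.setClassIndex(data.numAttributes() - 1);

        int runs = 10;
        double[] acc = new double[runs];
        for (int seed = 1; seed <= runs; seed++) {
            // Work on a fresh copy so each run shuffles the original ordering
            Instances copy = new Instances(data);
            copy.randomize(new Random(seed));
            int trainSize = (int) Math.round(copy.numInstances() * 0.9);
            Instances train = new Instances(copy, 0, trainSize);
            Instances test = new Instances(copy, trainSize, copy.numInstances() - trainSize);

            J48 tree = new J48();
            tree.buildClassifier(train);
            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(tree, test);
            acc[seed - 1] = eval.pctCorrect();
        }

        // Sample mean of the ten success rates
        double mean = 0;
        for (double a : acc) mean += a;
        mean /= runs;

        // Sample variance: divide by n - 1 because the mean
        // was estimated from the same sample
        double var = 0;
        for (double a : acc) var += (a - mean) * (a - mean);
        var /= runs - 1;

        System.out.printf("Mean %.1f%%, standard deviation %.1f%%%n",
                mean, Math.sqrt(var));
    }
}
```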

You can evaluate a classifier by splitting the dataset randomly into training and testing parts; train it on the former and test it on the latter. Of course, different splits produce slightly different results. If you simply re-run Weka, it repeats the same split – but you can force it to make different splits by altering the random number generator’s “seed”. If you evaluate the classifier several times you can average the results – and calculate the standard deviation.

This article is from the free online course Data Mining with Weka.
