2.9

## The University of Waikato

In this lesson, we’re going to look a little more at training and testing. In fact, what we’re going to do is repeatedly train and test using a percentage split. In the last lesson, we saw that if you simply repeat the training and testing, you get the same result each time. That happens because Weka reinitializes the random number generator before each run, so if you do the same experiment again tomorrow, you get the same result and know what’s going on. But there is a way of overriding that, so that different runs use independent random numbers to split the dataset into a training set and a test set. I’m going to open the “segment-challenge” data again.

That’s what we used before. Notice there are 1500 instances here; that’s quite a lot. I’m going to go to Classify and choose J48, our standard method, I guess. I’m going to use a percentage split, and because we’ve got 1500 instances, I’m going to choose 90% for training and just 10% for testing. I reckon that 10%, which is 150 instances, is going to give us a reasonable estimate, and we might as well train on as many instances as we can to get the most accurate classifier. I’m going to run this, and the accuracy figure I get (the same as in the last lesson) is 96.6667%. Now, this accuracy is misleadingly high.
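With a 90% split of 1500 instances, the test set holds 150 instances, so every accuracy figure is a whole number of correct predictions divided by 150. The short Python check below assumes exactly that (1350 training and 150 test instances). It shows that 96.6667% corresponds to 145 correct predictions and that accuracy can only move in steps of one instance, about 0.67 percentage points. So the four decimal places Weka prints say nothing about how precise the estimate really is.

```python
# Assumption: 1500 instances split 90/10, so the test set has 150 instances.
n_instances = 1500
train_percent = 90

train_size = round(n_instances * train_percent / 100)  # 1350
test_size = n_instances - train_size                    # 150

def accuracy_percent(correct, total):
    """Percentage of test instances classified correctly."""
    return 100 * correct / total

print(train_size, test_size)                       # 1350 150
print(f"{accuracy_percent(145, test_size):.4f}%")  # 96.6667%
print(f"{100 / test_size:.4f}")                    # 0.6667 (one instance's worth)
```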

I’m going to call that 96.7%, or 0.967. Then I’m going to do it again, initializing the random number generator with a different seed each time, and see how much variation we get in that figure.

If I go to the “More options” menu, I get a number of quite useful options: outputting the model, which we’re doing; outputting statistics; outputting different evaluation measures; the confusion matrix, which we’re also doing; storing the predictions for visualization; outputting the predictions if we want; cost-sensitive evaluation; and setting the random seed for cross-validation or percentage split. By default, the seed is set to 1. I’m going to change it to 2, a different random seed. We could also output the source code for the classifier if we wanted, but I just want to change the random seed. Then I’ll run it again.
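The role of the seed can be sketched in a few lines of Python. This is not Weka’s code. `percentage_split` is a hypothetical stand-in that shuffles instance indices with a generator seeded by `seed` and keeps the first 90% for training, so its seeds will not reproduce Weka’s actual splits. It illustrates the two behaviours described above: the same seed always reproduces the same split, and a different seed gives a different one.

```python
import random

def percentage_split(n_instances, train_percent, seed):
    """Hypothetical sketch: shuffle indices with a seeded RNG, then cut.

    Not Weka's implementation; it only mimics the idea that the split
    is fully determined by the seed.
    """
    rng = random.Random(seed)  # independent generator for this run
    indices = list(range(n_instances))
    rng.shuffle(indices)
    train_size = round(n_instances * train_percent / 100)
    return indices[:train_size], indices[train_size:]

train_a, test_a = percentage_split(1500, 90, seed=1)
train_b, test_b = percentage_split(1500, 90, seed=1)
train_c, test_c = percentage_split(1500, 90, seed=2)

print(len(train_a), len(test_a))  # 1350 150
print(test_a == test_b)           # True: same seed, same split
print(test_a == test_c)           # different seed: almost certainly False
```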

Before, we got 0.967, and this time we get 0.94, or 94%. Quite different, you see.

If I change the seed again to, say, 3, and run it again, I again get 94%. If I change it to 4 and run it again, I get 96.7%. Let’s do one more: change it to 5, run it again, and now I get 95.3%. Here’s a table with these figures in it. If we run it 10 times, we get this set of results. Given this set of experimental results, we can calculate the mean and standard deviation. The sample mean is the sum of all of these success rates divided by the number of runs, 10. That’s 0.949, about 95%, which is really what we would expect to get.
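The ten-run table isn’t reproduced here, but the sample mean can be illustrated on the five runs narrated above (seeds 1 to 5). Assuming a 150-instance test set, those figures correspond to 145, 141, 141, 145 and 143 correct predictions. The mean of these five runs is not the same as the 0.949 quoted for all ten.

```python
# Success rates for seeds 1-5 as narrated: 0.967, 0.94, 0.94, 0.967, 0.953.
# Assumes a 150-instance test set, so these are 145, 141, 141, 145 and 143 correct.
correct = [145, 141, 141, 145, 143]
accuracies = [c / 150 for c in correct]

# Sample mean: sum of the success rates divided by the number of runs.
mean = sum(accuracies) / len(accuracies)
print(f"{mean:.4f}")  # 0.9533 (five runs only; the lesson's ten-run mean is 0.949)
```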

That’s a better, more reliable estimate than the 96.7% that we started out with. We can also calculate the sample variance. For each result, we subtract the mean to get its deviation from the mean, square that, add up the squares, and divide, not by n, but by n – 1. That might surprise you. The reason for n – 1 is that we calculated the mean from this same sample, so on average the squared deviations from it are smaller than the squared deviations from the true mean would be. When the mean is calculated from the sample, you need to divide by n – 1, which gives a slightly larger variance estimate than dividing by n.
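Written out, with $x_1, \dots, x_n$ the success rates from $n$ runs, the sample mean, the sample variance with its $n - 1$ divisor, and the standard deviation are:

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2,
\qquad
s = \sqrt{s^2}
$$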

We take the square root of that, and in this case we get a standard deviation of 1.8%. So the real performance of J48 on the segment-challenge dataset is approximately 95% accuracy, plus or minus approximately 2%: anywhere, let’s say, between 93% and 97%. The figures that Weka puts out for you are misleading, and you need to be careful how you interpret them, because the result is certainly not exactly 95.3333%. There’s a lot of variation in all of these figures. Remember, the basic assumption is that the training and test sets are sampled independently from an infinite population, so you should expect some variation in results, perhaps more than just a slight variation.
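Python’s `statistics` module shows the effect of the divisor. `statistics.stdev` divides by n – 1, while `statistics.pstdev` divides by n. This sketch again uses only the five narrated runs (145, 141, 141, 145 and 143 correct out of an assumed 150), so its standard deviation differs from the 1.8% computed over the lesson’s ten runs.

```python
import statistics

# Success rates for seeds 1-5 as narrated, assuming 150 test instances.
# Illustration only: the ten-run table in the lesson is not reproduced here.
accuracies = [c / 150 for c in [145, 141, 141, 145, 143]]

mean = statistics.mean(accuracies)
s_sample = statistics.stdev(accuracies)       # divides by n - 1 (mean estimated from the sample)
s_population = statistics.pstdev(accuracies)  # divides by n, so slightly smaller

print(f"mean = {mean:.4f}")                # mean = 0.9533
print(f"stdev (n - 1) = {s_sample:.4f}")   # stdev (n - 1) = 0.0133
print(f"pstdev (n) = {s_population:.4f}")  # pstdev (n) = 0.0119
print(f"range = {mean - s_sample:.3f} to {mean + s_sample:.3f}")  # range = 0.940 to 0.967
```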

You can estimate the variation in results by changing the random-number seed and repeating the experiment. You can then calculate the mean and standard deviation experimentally, which is what we just did.
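The whole procedure (change the seed, re-split, re-evaluate, then summarise) can be sketched as a loop. Training a real J48 tree is out of scope, so this sketch uses a deliberately crude stand-in. A hypothetical list, `simulated_correct`, fixes for each of 1500 instances whether the “classifier” gets it right, with about 95% of them right. Only the test-set sampling varies between runs, whereas in Weka the retrained classifier changes too. The seeds 1 to 10 mirror the lesson, but the accuracies printed are simulated, not Weka’s.

```python
import random
import statistics

N_INSTANCES = 1500
TRAIN_PERCENT = 90

# Hypothetical stand-in for a trained classifier: for each instance, a fixed
# flag saying whether it would be classified correctly (about 95% are).
# A real experiment would retrain J48 on each training set instead.
flag_rng = random.Random(0)
simulated_correct = [flag_rng.random() < 0.95 for _ in range(N_INSTANCES)]

def evaluate_once(seed):
    """One percentage-split run: shuffle with `seed`, score the test part."""
    rng = random.Random(seed)
    indices = list(range(N_INSTANCES))
    rng.shuffle(indices)
    train_size = round(N_INSTANCES * TRAIN_PERCENT / 100)
    test = indices[train_size:]
    return sum(simulated_correct[i] for i in test) / len(test)

# Repeat with seeds 1..10, as in the lesson, then summarise.
results = [evaluate_once(seed) for seed in range(1, 11)]
mean = statistics.mean(results)
sd = statistics.stdev(results)  # n - 1 divisor

for seed, acc in zip(range(1, 11), results):
    print(f"seed {seed:2d}: {acc:.3f}")
print(f"mean {mean:.3f}, standard deviation {sd:.3f}")
```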

# Repeated training and testing

You can evaluate a classifier by splitting the dataset randomly into training and testing parts, training it on the former and testing it on the latter. Of course, different splits produce slightly different results. If you simply re-run Weka, it repeats the same split, but you can force it to make different splits by altering the random number generator’s “seed”. If you evaluate the classifier several times, with a different seed each time, you can average the results and calculate the standard deviation.