Hi! Here we’re going to look at training and testing in a little bit more detail. Here’s the situation. We’ve got a machine learning algorithm, and we feed into it training data, and it produces a classifier – the basic machine learning situation. For that classifier, we can test it with some independent test data. We can put that into the classifier and get some evaluation results. And, separately, we can deploy the classifier in some real situation to make predictions on fresh data coming from the environment. It’s really important in classification to remember that your evaluation results are only reliable if the test data is different from the training data.
That’s what we’re going to look at in this lesson. What if you only have one dataset? If you just have one dataset, you should divide it into two parts. Maybe use some of it for training and some of it for testing. Perhaps two-thirds of it for training and one-third of it for testing. It’s really important that the training data is different from the test data. Both training and test sets are produced by independent sampling from an infinite population. That’s the basic scenario here, but they’re different independent samples. It’s not the same data. If it is the same data, then your evaluation results are misleading. They don’t reflect what you should actually expect on new data when you deploy your classifier.
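As a rough illustration of the split just described, here is a minimal Python sketch. It is not anything Weka itself runs: the function name `holdout_split`, the seed value, and the list of 1500 integers standing in for instances are all made up for the example.

```python
import random

def holdout_split(instances, train_fraction=2/3):
    """Shuffle a copy of the data, then cut it into two disjoint parts."""
    rng = random.Random(42)        # arbitrary fixed seed (an assumption of this sketch)
    shuffled = list(instances)     # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(1500))           # stand-in for 1500 instances
train, test = holdout_split(data)
print(len(train), len(test))       # 1000 500
```

Because the two parts are slices of one shuffled list, no instance can land in both, which is the property that matters here.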
Here we’re going to look at the “segment” dataset, which we used in the last lesson. I’m going to open “segment-challenge”.
First of all, I’m going to use the J48 tree learner. I’m going to use a supplied test set, and I will set it to the appropriate file, segment-test.arff. I’m going to open that. Now we’ve got a test set, and let’s see how it does. In the last lesson, on the same data with the user classifier, I think I got 79% accuracy. J48 does much better; it gets 96% accuracy on the same test set. Suppose I were to evaluate it on the training set? I can do that by specifying under Test options “Use training set”. Now it will train it again and evaluate it on the training set, which is not what you’re supposed to do, because you get misleading results. Here it’s saying the accuracy is 99% on the training set. That is not representative of what we would get using this on independent data.
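To see why a training-set figure is optimistic, here is a small self-contained Python sketch. It is not J48 and it does not use the segment data: it swaps in a 1-nearest-neighbour classifier on synthetic, deliberately noisy data, and every name in it (`make_data`, `predict_1nn`, `accuracy`) is hypothetical.

```python
import random

def make_data(n, rng, noise=0.2):
    """Points x in [0, 1); the true class is x > 0.5, but some labels are flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = x > 0.5
        if rng.random() < noise:   # label noise that no classifier can predict
            label = not label
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def accuracy(train, evaluation):
    """Fraction of evaluation points whose label the classifier gets right."""
    correct = sum(predict_1nn(train, x) == label for x, label in evaluation)
    return correct / len(evaluation)

rng = random.Random(1)
train = make_data(200, rng)
test = make_data(200, rng)         # a fresh sample from the same population

print("on training set:", accuracy(train, train))
print("on test set:    ", accuracy(train, test))
```

On the training set each point is its own nearest neighbour, so the classifier simply recalls the stored label, noise included, and scores perfectly. The fresh sample gives a lower and more honest figure.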
If we had just one dataset, if we didn’t have a test set, we could do a percentage split. Here’s a percentage split. This is going to be 66% training data and 34% test data. It’s going to make a random split of the dataset. If I run that, I get 95%. That’s just about the same as what we got when we had an independent test set, just slightly worse. If I were to run it again, if we had a different split, we’d expect a slightly different result. But actually, I get exactly the same result, 95.098%. That’s because Weka reinitializes the random number generator before each run. The reason is to make sure that you can get repeatable results.
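The following Python sketch mimics the behaviour just seen, under the assumption that the split is driven by a generator re-created from a fixed seed before each run. It illustrates the idea only; it is not Weka's code, and `percentage_split` and its parameters are invented names.

```python
import random

def percentage_split(instances, train_percent=66, seed=1):
    """Randomly split the data, re-creating the generator from `seed` on every call."""
    rng = random.Random(seed)      # same seed each run, so the same shuffle each run
    shuffled = list(instances)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))
first = percentage_split(data)
second = percentage_split(data)          # same seed: identical split
other = percentage_split(data, seed=2)   # different seed: a different split

print(first == second)   # True
print(first == other)    # False
```

With an identical split, a deterministic learner sees the same training data and is scored on the same test data, so the accuracy comes out the same to the last digit.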
If it didn’t do that, then the results that you got would not be repeatable. However, if you wanted to have a look at the differences that you might get on different runs, then there is a way of resetting the random number generator between runs. We’re going to look at that in the next lesson. That’s it for this lesson. The basic assumption of machine learning is that the training and test sets are independently sampled from an infinite population, the same population. If you have just one dataset, you should hold part of it out for testing, maybe about a third, as we just did, or perhaps 10%.
We would expect a slight variation in results each time if we hold out a different set, but Weka produces the same results each time by design, because it reinitializes the random number generator before each run. We ran J48 on the segment-challenge dataset. Bye for now!
Training and testing
How can you evaluate how well a classifier does? Training set performance is misleading. It’s like asking a child to memorize 1+1=2, 1+2=3 and then testing them on exactly the same questions, whereas you really want them to be able to answer questions like 2+3=?. We want to generalize from the training data to get a more widely applicable classifier. To evaluate how well this has been done, it must be tested on an independent test set. If you only have one dataset, set aside part of it for testing and use the rest for training.
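The memorization analogy can be sketched in a few lines of Python. Both "models" below are invented for illustration: `memorizer` only recalls the questions it was trained on, while `adder` has learned the underlying rule.

```python
# Training data: the sums the child was drilled on.
training = {(1, 1): 2, (1, 2): 3}

def memorizer(question):
    """Answers only by recalling a memorized question; returns None otherwise."""
    return training.get(question)

def adder(question):
    """Has learned the underlying rule, so it generalizes to unseen questions."""
    a, b = question
    return a + b

def accuracy(model, questions):
    """Fraction of questions the model answers correctly."""
    return sum(model(q) == answer for q, answer in questions.items()) / len(questions)

test = {(2, 3): 5}   # an independent question, not in the training data

print(accuracy(memorizer, training), accuracy(memorizer, test))  # 1.0 0.0
print(accuracy(adder, training), accuracy(adder, test))          # 1.0 1.0
```

Both models look perfect on the training questions; only the independent test question tells them apart.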
© University of Waikato, New Zealand. Creative Commons Attribution 4.0 International License.