Hi! In this lesson, I want to introduce you to the standard way of evaluating the performance of a machine learning algorithm, which is called “cross-validation”. A couple of lessons back, we looked at evaluating on an independent test set, and we also talked about evaluating on the training set (don’t do that). We also talked about evaluating using the “holdout” method by taking the one dataset and holding out a little bit for testing and using the rest for training. There is a fourth option on Weka’s Classify panel, which is called “cross-validation”, and that’s what we’re going to talk about here. Cross-validation is a way of improving upon repeated holdout. We tried using the holdout method with different random-number seeds each time.

That’s called “repeated holdout”. Cross-validation is a systematic way of doing repeated holdout that actually improves upon it by reducing the variance of the estimate. We take a training set and we create a classifier. Then we’re looking to evaluate the performance of that classifier, and there’s a certain amount of variance in that evaluation, because it’s all statistical underneath. We want to keep the variance in the estimate as low as possible. Cross-validation is a way of reducing the variance, and a variant on cross-validation called “stratified cross-validation” reduces it even further. I’m going to explain that in this class. In a previous lesson, we held out 10% for the testing and we repeated that 10 times. That’s the “repeated holdout” method.

We’ve got one dataset, and we divided it independently 10 separate times into a training set and a test set. With cross-validation, we divide it just once, but we divide into, say, 10 pieces. Then we take 9 of the pieces and use them for training and the last piece we use for testing. Then with the same division, we take another 9 pieces and use them for training and the held-out piece for testing. We do the whole thing 10 times, using a different segment for testing each time. In other words, we divide the dataset into 10 pieces, and then we hold out each of these pieces in turn for testing, train on the rest, do the testing and average the 10 results.
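The division just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Weka implements it: the function names `k_fold_indices` and `cross_validation_splits`, the dataset size of 150 and the seed are all made up for the example.

```python
import random

def k_fold_indices(n_instances, k=10, seed=1):
    """Shuffle the instance indices once, then deal them into k folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]

def cross_validation_splits(n_instances, k=10, seed=1):
    """Yield (train, test) index lists, holding out each fold in turn."""
    folds = k_fold_indices(n_instances, k, seed)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test

# Hypothetical dataset of 150 instances, divided once into 10 folds.
splits = list(cross_validation_splits(n_instances=150, k=10))

# Count how often each instance is used for testing and for training.
test_counts = [0] * 150
train_counts = [0] * 150
for train, test in splits:
    for i in test:
        test_counts[i] += 1
    for i in train:
        train_counts[i] += 1
```

With this division, every instance should land in a test set exactly once and in a training set 9 times, which is the property that distinguishes cross-validation from dividing the data independently 10 times.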

That would be 10-fold cross-validation. Divide the dataset into 10 parts (these are called “folds”); hold out each part in turn; and average the results. So each data point in the dataset is used once for testing and 9 times for training. That’s 10-fold cross-validation. “Stratified” cross-validation is a simple variant where, when we do the initial division into 10 parts, we ensure that each fold has got approximately the correct proportion of each of the class values.

Of course, there are many, many, many different ways of dividing a dataset into 10 equal parts: we just make sure we choose a division that has approximately the right representation of class values in each of the folds. That’s stratified cross-validation. It helps reduce the variance in the estimate a little bit more. Then, once we’ve done the cross-validation, what Weka does is run the algorithm an eleventh time on the whole dataset. That will produce a classifier that we might deploy in practice. We use 10-fold cross-validation in order to get an evaluation result and an estimate of the error, and then finally we run the learning algorithm one more time to get an actual classifier to use in practice.
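One simple way to get a stratified division is to group the instances by class and deal each group out across the folds in turn. The sketch below does that; it is an assumed illustration rather than Weka's own procedure, and the function name `stratified_folds` and the 70/30 "yes"/"no" labels are invented for the example.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=1):
    """Assign instance indices to k folds so that each fold roughly
    mirrors the class proportions of the whole dataset."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)  # random order within each class
        for index in members:
            # Deal the instances round-robin, carrying on from where
            # the previous class stopped so fold sizes stay balanced.
            folds[position % k].append(index)
            position += 1
    return folds

# Hypothetical two-class dataset: 70 "yes" and 30 "no" instances.
labels = ["yes"] * 70 + ["no"] * 30
folds = stratified_folds(labels, k=10)
yes_per_fold = [sum(1 for i in fold if labels[i] == "yes") for fold in folds]
```

Here each fold of 10 instances ends up with 7 "yes" and 3 "no", matching the 70%/30% split of the full dataset. When the class counts do not divide evenly by the number of folds, the proportions are only approximately right, which is all that stratification asks for.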

That’s what I wanted to tell you. Cross-validation is better than repeated holdout, and we’ll look at that in the next lesson. Stratified cross-validation is even better. Weka does stratified cross-validation by default. And with 10-fold cross-validation, Weka invokes the learning algorithm 11 times, once for each fold of the cross-validation and then a final time on the entire dataset. A practical rule of thumb is that if you’ve got lots of data you can use a percentage split, and evaluate it just once. Otherwise, if you don’t have too much data, you should use stratified 10-fold cross-validation. How big is lots? Well, this is what everyone asks. How long is a piece of string, you know?
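To make the count of 11 concrete, here is a minimal sketch that evaluates with 10 folds and then trains once more on everything. It is an assumption-laden toy, not Weka's code: the learner `train_majority` simply predicts the most common training label, the folds are taken in a fixed order without shuffling, and the 70/30 labels are invented.

```python
from collections import Counter

calls = 0  # how many times the learning algorithm has been invoked

def train_majority(train_labels):
    """Toy learner: the model always predicts the most common training label."""
    global calls
    calls += 1
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda instance: majority

def cross_validate_then_build(labels, k=10):
    n = len(labels)
    # Fixed, unshuffled folds keep the sketch short.
    folds = [list(range(i, n, k)) for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_labels = [labels[i] for i in range(n) if i not in held_out]
        model = train_majority(train_labels)  # one invocation per fold
        correct += sum(1 for i in fold if model(i) == labels[i])
    # Pooled accuracy over all held-out instances; with equal-sized folds
    # this equals the average of the per-fold accuracies.
    accuracy = correct / n
    final_model = train_majority(labels)  # the extra run on the whole dataset
    return accuracy, final_model

# Hypothetical two-class dataset: 70 "yes" and 30 "no" instances.
labels = ["yes"] * 70 + ["no"] * 30
accuracy, final_model = cross_validate_then_build(labels)
```

After this runs, `calls` is 11: ten invocations produce the evaluation result, and the final one produces the classifier you would actually use.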

It’s hard to say, but it depends on a few things. It depends on the number of classes in your dataset. If you’ve got a two-class dataset, then if you had, say, 100–1000 data points, that would probably be good enough for a pretty reliable evaluation if you did a 90%/10% split into training and test sets. If you had, say, 10,000 data points in a two-class problem, then I think you’d have lots and lots of data, and you wouldn’t need to go to cross-validation. If, on the other hand, you had 100 different classes, then that’s different, right? You would need a larger dataset, because you want a fair representation of each class when you do the evaluation.

It’s really hard to say exactly; it depends on the circumstances. If you’ve got thousands and thousands of data points, you might just do things once with holdout. If you’ve got fewer than a thousand data points, even with a two-class problem, then you might as well do 10-fold cross-validation. It really doesn’t take much longer. Well, it takes 10 times as long, but the times are generally pretty short.

# Cross-validation

© University of Waikato, New Zealand. Creative Commons Attribution 4.0 International License.