Cross-validation results

Cross-validation is better than randomly repeating percentage split evaluations. As Ian Witten shows, it gives a more reliable performance estimate.
Hi! Good to see you again. One of the things I like to do with my time is play music, and that little bit of Mozart you hear at the beginning of these videos, that’s me and three friends playing a clarinet quartet. I play in an orchestra, and last night I was playing some jazz with a little trio. If you want to hear us play, go to Google and type my name, “Ian Witten”, to find my home page. You’ll get me here, and every time you visit this page I’ll play you a tune. If you refresh the page, I’ll play you another tune.
That’s what I do. Anyway, that’s not what we’re here for. We learned about cross-validation in the last lesson. I said that cross-validation was a better way of evaluating your machine learning algorithm, evaluating your classifier, than repeated holdout, repeating the holdout method. Cross-validation does things 10 times. You can use holdout to do things 10 times, but cross-validation is a better way of doing things. Let’s just do a little experiment. I’m going to start up Weka and open the “diabetes” dataset. Here it is, diabetes.arff, and the baseline accuracy, which ZeroR gives me – that’s the default classifier, by the way, rules/ZeroR – if I just run that, well, it will evaluate it using cross-validation.
Actually, for a true baseline, I should just use the training set. That’ll just look at the chances of getting a correct result if we simply guess the most likely class, in this case 65.1%. That’s the baseline accuracy. That’s the first thing you should do with any dataset. Then we’re going to look at J48, which is down here under “trees”. There it is. I’m going to evaluate it with 10-fold cross-validation.
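The baseline figure can be checked by hand. The sketch below mimics what ZeroR does on the training set: it always predicts the most frequent class. The class counts (500 negative, 268 positive) and the label names are assumptions, since the lesson quotes only the resulting 65.1%.

```python
from collections import Counter

def zero_r_accuracy(labels):
    """Return the majority class and the accuracy of always predicting it."""
    counts = Counter(labels)
    majority_class, majority_count = counts.most_common(1)[0]
    return majority_class, majority_count / len(labels)

# Assumed class counts and label names for the diabetes data;
# the lesson itself only states the 65.1% baseline.
labels = ["tested_negative"] * 500 + ["tested_positive"] * 268

majority, accuracy = zero_r_accuracy(labels)
print(f"{majority}: {accuracy:.1%}")
```

Under those assumed counts, 500 of 768 instances belong to the majority class, which rounds to the 65.1% quoted above.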
It takes just a second to do that. I get a result of 73.8%, and we can change the random-number seed like we did before. The default is 1; let’s put a random-number seed of 2. Run it again. I get 75%. Do it again. Change it to, say, 3 (I can choose anything I want, of course), run it again, and I get 75.5%. These are the numbers I get on this slide with 10 different random-number seeds. Those are the same numbers on this slide in the right-hand column, the 10 values I got, 73.8%, 75.0%, 75.5%, and so on.
I can calculate the mean, which for that right-hand column is 74.5%, and the sample standard deviation, which is 0.9%, using just the same formulas that we used before. Before, we used these formulas for the holdout method, where we repeated the holdout 10 times. These are the results you get on this dataset if you repeat holdout using 90% for training and 10% for testing – which is, of course, what we’re doing with 10-fold cross-validation. I would get those results there, and if I average those I get a mean of 74.8%, which is satisfactorily close to 74.5%, but I get a much larger standard deviation: 4.6%, as opposed to 0.9% with cross-validation.
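Here is a small sketch of the two formulas, the mean and the sample standard deviation (dividing by n - 1). Only the first three of the ten accuracies are quoted in the lesson, so the example uses just those three values. It therefore does not reproduce the 74.5% and 0.9% from the slide.

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

# Only the three accuracies quoted in the lesson (seeds 1, 2 and 3);
# the remaining seven values are on the slide and are not reproduced here.
first_three = [73.8, 75.0, 75.5]

mean = statistics.mean(first_three)
std = sample_std(first_three)

# The hand-written formula should agree with the standard library.
assert math.isclose(std, statistics.stdev(first_three))
print(f"mean = {mean:.2f}%, sample std = {std:.2f}%")
```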
Now, you might be asking yourself why use 10-fold cross-validation?
With Weka we can use 20-fold cross-validation, or anything: we just set the number of folds here beside the cross-validation box to whatever we want. So we could use 20-fold cross-validation.
What that would do is divide the dataset into 20 equal parts and repeat 20 times: take one part out, train on the other 95% of the dataset. And then do it a 21st time on the whole dataset. So why 10, why not 20? Well, that’s a good question really, and there’s not a very good answer. We want to use quite a lot of data for training, because in the final analysis we’re going to use the entire dataset for training. If we’re using 10-fold cross-validation, then we’re using 90% of the dataset. Maybe it would be a little better to use 95% of the dataset for training, with 20-fold cross-validation.
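The fold procedure can be sketched in a few lines. This is a plain illustration of k-fold splitting, not Weka’s own implementation: the function name is hypothetical, the dataset size of 768 instances is an assumption, and no stratification is done.

```python
import random

def k_fold_indices(n_instances, k, seed=1):
    """Yield (train, test) index lists for each of the k folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes into the same fold, so fold sizes
    # differ by at most one instance.
    folds = [indices[i::k] for i in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

# 768 instances is an assumed dataset size.
n = 768
for k in (10, 20):
    splits = list(k_fold_indices(n, k))
    train_fraction = len(splits[0][0]) / n
    print(f"{k}-fold: {len(splits)} runs, about {train_fraction:.0%} used for training")
```

The final model, built on the whole dataset, is the extra run that comes after these k evaluation runs.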
On the other hand, we want to make sure that what we evaluate on is a valid statistical sample. So in general, it’s not necessarily a good idea to use a large number of folds with cross-validation. Also, of course, 20-fold cross-validation will take twice as long as 10-fold cross-validation. The upshot is that there isn’t a really good answer to this question, but the standard thing to do is to use 10-fold cross-validation, and that’s why it’s Weka’s default. We’ve shown in this lesson that cross-validation really is better than repeated holdout. Remember, on the last slide we found that we got about the same mean for repeated holdout as for cross-validation, but we got a much smaller variance for cross-validation.
We know that the evaluation of this machine learning method J48 on this dataset, “diabetes”, gives 74.5% accuracy, probably somewhere between 73.5% and 75.5%. That is actually substantially larger than the baseline. So J48 is doing something for us, better than the baseline. Cross-validation reduces the variance of the estimate. Bye for now!

Cross-validation is better than randomly repeating percentage split evaluations. The reason is that each instance occurs exactly once in a test set, and is tested just once. Repeated random splits are liable to produce less reliable results: the average will be about the same but the variance is higher. This is confirmed with an experiment on the diabetes dataset: 10 repeated percentage splits yield a standard deviation of 4.6%, as opposed to 0.9% with 10-fold cross-validation. Why 10-fold? Good question! It seems to be a reasonable compromise.
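The “tested exactly once” property is easy to see by counting how often each instance lands in a test set. The sketch below makes that comparison on instance indices only, with no classifier involved. The dataset size of 768 and the function names are assumptions made for illustration.

```python
import random
from collections import Counter

def holdout_test_counts(n, repeats, test_fraction=0.1, seed=1):
    """Count test-set appearances per instance over repeated random splits."""
    rng = random.Random(seed)
    n_test = round(n * test_fraction)
    counts = Counter()
    for _ in range(repeats):
        counts.update(rng.sample(range(n), n_test))
    return [counts[i] for i in range(n)]

def cross_validation_test_counts(n, k, seed=1):
    """Count test-set appearances per instance over one k-fold run."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    counts = Counter()
    for fold in range(k):
        counts.update(indices[fold::k])
    return [counts[i] for i in range(n)]

n = 768  # assumed dataset size
holdout = holdout_test_counts(n, repeats=10)
cv = cross_validation_test_counts(n, k=10)

print("holdout, never tested:", holdout.count(0))
print("holdout, tested more than once:", sum(c > 1 for c in holdout))
print("cross-validation, tested exactly once:", cv.count(1))
```

With repeated holdout, some instances are never tested while others are tested several times. That uneven coverage is one plausible source of the extra spread in the holdout estimates.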

This article is from the free online course Data Mining with Weka.

Created by
FutureLearn - Learning For Life