
Cross-validation results

Cross-validation is better than randomly repeating percentage split evaluations. As Ian Witten shows, it gives a more reliable performance estimate.

The reason cross-validation does better is that each instance occurs in exactly one test fold, so it is tested exactly once. Repeated random splits are liable to produce less reliable results, because some instances may land in several test sets and others in none: the average will be about the same, but the variance is higher. An experiment on the diabetes dataset confirms this: 10 repeated percentage splits yield a variance of 4.6%, as opposed to 0.9% with 10-fold cross-validation. Why 10-fold? Good question! It seems to be a reasonable compromise: more folds mean more models to train, while fewer folds leave less data for training each one.
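The difference in how the two schemes use the data can be sketched in a few lines of plain Python. This is only an illustration, not Weka's implementation: the function names are hypothetical, the dataset size of 768 instances and the 10% test fraction are assumptions, and no classifier is trained. The sketch just counts how many times each instance ends up in a test set.

```python
import random
from collections import Counter


def k_fold_test_sets(n_instances, k, seed=0):
    """Shuffle the indices once, then deal them into k disjoint test folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def repeated_split_test_sets(n_instances, repeats, test_fraction, seed=0):
    """Draw an independent random test set for each repetition."""
    rng = random.Random(seed)
    test_size = round(n_instances * test_fraction)
    return [rng.sample(range(n_instances), test_size) for _ in range(repeats)]


def times_tested(test_sets, n_instances):
    """Count how often each instance index appears across all test sets."""
    counts = Counter(i for test in test_sets for i in test)
    return [counts[i] for i in range(n_instances)]


n = 768  # assumed number of instances in the diabetes dataset

# 10-fold cross-validation: the folds partition the data.
cv_counts = times_tested(k_fold_test_sets(n, k=10), n)

# 10 repeated random splits with an assumed 10% held out each time.
split_counts = times_tested(
    repeated_split_test_sets(n, repeats=10, test_fraction=0.1), n
)

print("cross-validation, times tested per instance:", sorted(set(cv_counts)))
print("repeated splits, times tested per instance:", sorted(set(split_counts)))
print("repeated splits, instances never tested:", split_counts.count(0))
```

With cross-validation every instance is tested exactly once by construction. With repeated random splits the counts are uneven, which is one way to see why the resulting estimates vary more from run to run.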

This article is from the free online course Data Mining with Weka.

Created by
FutureLearn - Learning For Life
