

“Overfitting” is a problem that plagues all machine learning methods. Ian Witten illustrates it with the OneR classifier and numeric attributes.

Overfitting occurs when a classifier fits the training data too tightly and doesn’t generalize well to independent test data. It can be illustrated using OneR, which has a parameter that tends to make it overfit numeric attributes. For example, on the numeric version of the weather data, and on the diabetes dataset, we get good performance on the training data but poor performance on an independent test set, or with cross-validation. That’s overfitting.
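The effect can be sketched in a few lines of Python. The function below is a simplified OneR-style rule learner for a single numeric attribute, not Weka's exact implementation: it sorts the training examples by attribute value, cuts them into buckets of at least `min_bucket_size` examples, and predicts each bucket's majority class. (The name `min_bucket_size` here is modelled on OneR's bucket-size parameter; the data and helper names are invented for illustration.) When the attribute is pure noise and the bucket size is 1, every training example becomes its own rule, so training accuracy is perfect while test accuracy stays near chance.

```python
import random

def oner_fit(xs, ys, min_bucket_size):
    """Simplified OneR-style discretization of one numeric attribute
    (a sketch, not Weka's algorithm): sort by value, cut the sorted
    examples into buckets of `min_bucket_size`, and let each bucket
    predict its majority class."""
    pairs = sorted(zip(xs, ys))
    rules = []  # (upper_bound, predicted_class), ascending by bound
    for i in range(0, len(pairs), min_bucket_size):
        chunk = pairs[i:i + min_bucket_size]
        classes = [c for _, c in chunk]
        majority = max(set(classes), key=classes.count)
        rules.append((chunk[-1][0], majority))
    return rules

def oner_predict(rules, x):
    for upper, cls in rules:
        if x <= upper:
            return cls
    return rules[-1][1]  # beyond the last boundary: reuse the last rule

def accuracy(rules, xs, ys):
    hits = sum(oner_predict(rules, x) == y for x, y in zip(xs, ys))
    return hits / len(ys)

random.seed(0)
# The attribute carries no real signal: labels are coin flips.
train_x = [random.random() for _ in range(100)]
train_y = [random.choice(["yes", "no"]) for _ in range(100)]
test_x = [random.random() for _ in range(200)]
test_y = [random.choice(["yes", "no"]) for _ in range(200)]

memorizer = oner_fit(train_x, train_y, min_bucket_size=1)
print(accuracy(memorizer, train_x, train_y))  # 1.0 -- one rule per example
print(accuracy(memorizer, test_x, test_y))    # near chance on fresh data
```

Raising the bucket size forces each rule to cover many examples, which stops the memorization: that is exactly the knob the lesson turns to show overfitting appearing and disappearing.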

This article is from the free online course Data Mining with Weka, created by FutureLearn.
