The Attribute Selected Classifier

Experimenting with a dataset to select attributes and applying a classifier to the result risks cheating! Ian Witten explains what to do about it.

Experimenting with a dataset to select attributes and then applying a classifier to the result is cheating if performance is evaluated using cross-validation, because the entire dataset, test folds included, is used to determine the attribute subset. You mustn’t use the test data when selecting attributes! But with cross-validation you don’t really have an opportunity to use the training data only. Enter the AttributeSelectedClassifier, which solves the problem by selecting attributes from the training data alone. (Does that ring a bell? You saw the same idea in Week 2, with the FilteredClassifier.)
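To see why this matters, here is a minimal sketch in plain Python rather than Weka itself. It builds a dataset of pure noise, so no honest method should beat chance, and compares two ways of cross-validating: selecting attributes once on the whole dataset, and selecting them again inside each fold from the training rows only, which is the kind of per-fold selection that Weka's AttributeSelectedClassifier performs. Everything here is an assumption made for illustration: the function names, the mean-gap ranking and the nearest-centroid classifier are hypothetical stand-ins, not Weka's attribute evaluators or classifiers.

```python
import random


def make_noise_data(n_rows, n_attrs, seed):
    """Random attributes with random, balanced class labels: nothing to learn."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_attrs)] for _ in range(n_rows)]
    y = [i % 2 for i in range(n_rows)]
    rng.shuffle(y)
    return X, y


def select_attributes(X, y, k):
    """Rank attributes by the gap between the two class means; keep the top k."""
    def gap(j):
        v0 = [row[j] for row, c in zip(X, y) if c == 0]
        v1 = [row[j] for row, c in zip(X, y) if c == 1]
        return abs(sum(v0) / len(v0) - sum(v1) / len(v1))
    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]


def nearest_centroid_predict(X_train, y_train, row, attrs):
    """Predict the class whose centroid (on the chosen attributes) is closest."""
    best_class, best_dist = None, float("inf")
    for c in (0, 1):
        members = [r for r, label in zip(X_train, y_train) if label == c]
        centroid = [sum(r[j] for r in members) / len(members) for j in attrs]
        dist = sum((row[j] - m) ** 2 for j, m in zip(attrs, centroid))
        if dist < best_dist:
            best_class, best_dist = c, dist
    return best_class


def cross_validate(X, y, k, folds, select_inside_fold):
    """Return cross-validated accuracy under one of the two selection schemes."""
    n = len(X)
    # The "cheating" scheme picks attributes once, using every row,
    # including rows that will later serve as test data.
    all_attrs = None if select_inside_fold else select_attributes(X, y, k)
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(n) if i % folds == f]
        train_idx = [i for i in range(n) if i % folds != f]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # The honest scheme repeats the selection on the training rows only.
        if select_inside_fold:
            attrs = select_attributes(X_train, y_train, k)
        else:
            attrs = all_attrs
        for i in test_idx:
            if nearest_centroid_predict(X_train, y_train, X[i], attrs) == y[i]:
                correct += 1
    return correct / n


X, y = make_noise_data(n_rows=40, n_attrs=1000, seed=1)
cheating = cross_validate(X, y, k=10, folds=10, select_inside_fold=False)
honest = cross_validate(X, y, k=10, folds=10, select_inside_fold=True)
print(f"selection on all data:      {cheating:.2f}")
print(f"selection inside each fold: {honest:.2f}")
```

Because the labels are random, an accuracy well above 50% in the first figure can only come from information leaking out of the test folds during attribute selection. The second figure, where selection is repeated inside each fold, is the trustworthy estimate.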

This article is from the free online course More Data Mining with Weka.
