How about selecting key attributes before applying a classifier?
Removing attributes from a dataset before applying a classifier often results in better performance.
It’s ironic that removing information can improve performance! Isn’t the whole idea of data mining to bring as much data as possible to bear on the problem?
Well, that’s true enough. But individual classification algorithms, though they might try to select the most appropriate attributes to use at any given stage, are not necessarily as good at it as a specially designed attribute selection algorithm applied before the classifier is invoked. For example, “nearest neighbor” or instance-based algorithms (described in Week 3 of Data Mining with Weka) classify test instances on the basis of the nearest training instance. But attributes irrelevant to the decision will distort the distance measure by adding an irrelevant, perhaps random, dimension to the instance space. Decision tree algorithms try to choose the most informative attribute to split on at each node of the tree, but at lower levels there are not many instances to go on, and an irrelevant, noisy attribute may seem, by chance, to be the best choice.
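Here is a minimal sketch of the nearest-neighbor point, using a toy 1-NN classifier on synthetic data (the data and the `pad` helper are illustrative assumptions, not part of Weka): with only the relevant attribute, the nearest neighbor is obvious; once irrelevant random dimensions are padded on, they dominate the Euclidean distance and the prediction becomes unreliable.

```python
import random

def euclidean(a, b):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nn_predict(train, query):
    """Classify query by the class of its nearest training instance."""
    nearest = min(train, key=lambda inst: euclidean(inst[0], query))
    return nearest[1]

random.seed(1)

# Tiny synthetic problem: a single relevant attribute decides the class.
train = [([x], 0) for x in (0.0, 0.1, 0.2)] + [([x], 1) for x in (0.8, 0.9, 1.0)]
query = [0.15]  # clearly class 0 on the relevant attribute

print(nn_predict(train, query))  # -> 0

# Pad every instance with irrelevant random attributes: the distance
# measure is now dominated by noise, so the answer can go either way.
def pad(vec, k):
    return vec + [random.uniform(0, 5) for _ in range(k)]

noisy_train = [(pad(v, 10), c) for v, c in train]
noisy_query = pad(query, 10)
print(nn_predict(noisy_train, noisy_query))  # no longer trustworthy
```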
At the end of this week you will be able to explain – and apply – different approaches to attribute selection. The idea is to select a subset of attributes that will work well for the problem at hand. How can you measure whether they would work well? And finding the best subset seems to require searching through all 2^n subsets of n attributes, which is an onerous task. You will soon be able to describe – and use – alternative search strategies that reduce the computation (but do not necessarily arrive at the very best solution).
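One such strategy can be sketched in a few lines: greedy forward selection, which repeatedly adds the single attribute that most improves a subset-merit score and stops when no addition helps. It evaluates on the order of n² subsets instead of all 2^n. The `merit` function below is a made-up stand-in (not Weka's actual subset evaluator): two attributes are useful, the rest are mildly harmful noise.

```python
def forward_select(attributes, merit):
    """Greedy forward selection: add the attribute that most improves
    merit; stop when no single addition helps. Returns the chosen
    subset and the number of subsets evaluated."""
    selected, best_merit, evals = [], merit(()), 0
    while True:
        best_add = None
        for a in (x for x in attributes if x not in selected):
            evals += 1
            m = merit(tuple(selected + [a]))
            if m > best_merit:
                best_merit, best_add = m, a
        if best_add is None:
            return selected, evals
        selected.append(best_add)

# Hypothetical merit function (an assumption, for illustration only):
# attributes 'a' and 'c' help; every other attribute costs a little.
useful = {'a': 0.6, 'c': 0.3}
def merit(subset):
    return sum(useful.get(x, -0.05) for x in subset)

subset, evals = forward_select(list('abcde'), merit)
print(subset, evals)  # -> ['a', 'c'] 12  (versus 2**5 = 32 exhaustively)
```

Note the trade-off the text mentions: the greedy search is much cheaper, but because it never reconsiders an earlier choice, it is not guaranteed to find the best subset overall.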
A particularly important revelation is that if you apply an attribute selection method and then apply a classification algorithm to the result, the outcome – even when evaluated using cross-validation – is not necessarily an accurate assessment of performance on fresh data. This is because the entire data set is used when choosing the best attribute subset. That’s cheating! Only the training data should be used. (If you’re on the ball, you might recognize a parallel with supervised discretization.) But how on earth can you achieve this with Weka? You’re about to find out …
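The honest protocol can be sketched as follows: attribute selection is repeated inside every cross-validation fold, using only that fold's training data, never the test fold (this is the idea behind Weka's AttributeSelectedClassifier, which wraps selection and classification together). The selector and dataset below are toy stand-ins, not Weka's own: attributes are ranked by how far their means differ between the two classes, and a 1-NN classifier runs on the projected data.

```python
import random

def select_attributes(train):
    """Toy selector (a stand-in for a real evaluator): keep the one
    attribute whose mean differs most between the two classes."""
    n = len(train[0][0])
    def spread(i):
        m0 = [v[i] for v, c in train if c == 0]
        m1 = [v[i] for v, c in train if c == 1]
        return abs(sum(m0) / len(m0) - sum(m1) / len(m1))
    return sorted(range(n), key=spread, reverse=True)[:1]

def project(inst, attrs):
    return [inst[i] for i in attrs]

def nn_predict(train, query):
    """1-NN: class of the nearest training instance."""
    return min(train, key=lambda t: sum((x - y) ** 2
                                        for x, y in zip(t[0], query)))[1]

def cv_accuracy(data, folds=3):
    """Honest cross-validation: selection happens inside each fold,
    on the training portion only -- the test fold never influences it."""
    data = list(data)
    random.shuffle(data)
    correct = 0
    for k in range(folds):
        test = data[k::folds]
        train = [d for i, d in enumerate(data) if i % folds != k]
        attrs = select_attributes(train)  # training data only!
        proj_train = [(project(v, attrs), c) for v, c in train]
        for v, c in test:
            correct += nn_predict(proj_train, project(v, attrs)) == c
    return correct / len(data)

random.seed(7)
# One informative attribute plus one pure-noise attribute.
data = [([i / 11, random.random()], int(i / 11 > 0.5)) for i in range(12)]
print(cv_accuracy(data))
```

Selecting attributes on the full dataset first and cross-validating afterwards would let information from every test fold leak into the selection step, which is exactly the "cheating" described above.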