Scheme-independent selection

Scheme-independent methods that do not depend on a particular classifier can be faster than the wrapper method. Ian Witten explains how they work.
Hello again! Welcome back to New Zealand.
In this lesson, we’re going to look at a new class of attribute selection methods: scheme-independent attribute selection. The Wrapper method that we looked at before is straightforward, simple, and direct – but it’s really slow. So here are a couple of alternatives. We could use a single-attribute evaluator, evaluate the attributes one by one independently, and then rank them and base our attribute selection on that. That allows us to eliminate irrelevant attributes, and we’ll be looking at that in the next lesson. A second alternative is to combine an attribute subset evaluator with a search method. And that allows us to eliminate redundant attributes as well as irrelevant ones, so it’s potentially much more powerful.
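To make the "subset evaluator plus search method" idea concrete, here is a minimal Python sketch of a greedy forward search driven by a pluggable scoring function. It is an illustration only, not Weka's search code. The names `forward_select`, `toy_score` and `SOURCE` are hypothetical, and the toy score is an invented stand-in for a real subset evaluator: attribute 1 is a copy of attribute 0 (redundant) and attribute 3 carries no information (irrelevant).

```python
from typing import Callable, FrozenSet, List


def forward_select(n_attributes: int,
                   evaluate: Callable[[FrozenSet[int]], float]) -> List[int]:
    """Greedy forward search: start with no attributes, repeatedly add the
    single attribute that gives the best score, stop when nothing improves."""
    selected: FrozenSet[int] = frozenset()
    best_score = float("-inf")
    while True:
        candidates = [selected | {a}
                      for a in range(n_attributes) if a not in selected]
        if not candidates:
            break  # every attribute is already selected
        top = max(candidates, key=evaluate)
        top_score = evaluate(top)
        if top_score <= best_score:
            break  # no candidate improves on the current subset
        selected, best_score = top, top_score
    return sorted(selected)


# Hypothetical toy evaluator (an assumption for illustration):
# attributes 0 and 1 carry the same information "a", attribute 2 carries "b",
# and attribute 3 carries nothing useful.
SOURCE = {0: "a", 1: "a", 2: "b", 3: None}


def toy_score(subset: FrozenSet[int]) -> float:
    covered = {SOURCE[a] for a in subset if SOURCE[a] is not None}
    # Reward distinct information, with a small cost for every attribute kept.
    return len(covered) - 0.1 * len(subset)


chosen = forward_select(4, toy_score)
print(chosen)
```

Because the score is computed on whole subsets, the search drops the redundant copy as well as the irrelevant attribute, which a one-attribute-at-a-time ranking could not do.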
Now we’ve already looked at search methods, and we’ve looked at one kind of attribute subset evaluator, the wrapper method. That is a way, a scheme-dependent way, of evaluating an attribute subset. Now we’re going to look at scheme-independent ways of evaluating attribute subsets. In fact, we’re going to look at a method called CfsSubsetEval. It considers an attribute subset to be good if the attributes it contains are highly correlated with the class attribute and not strongly correlated with one another. It comes up with a measure of “goodness” of an attribute subset. This is a measure applied to a subset.
We sum the correlation between the attribute and the class over all of the attributes in the subset; then we divide that by the square root of the correlations of each attribute with each other attribute, summed over all pairs of attributes in the subset. For correlation, the CfsSubsetEval method uses an entropy-based metric called the “symmetric uncertainty”. It’s pretty straightforward, but I’m not going to talk about that. Let’s try it. Let’s compare CfsSubsetEval with Wrapper selection on the ionosphere data. We’re going to look first at Naive Bayes. Coming over to Weka here, I’ve got the ionosphere data open, and I’m going to classify that with Naive Bayes, standard Naive Bayes. When I do that, I get 82–83%. All right.
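As an aside, the goodness measure described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not Weka's CfsSubsetEval code. It handles nominal values only, and it takes symmetric uncertainty to be 2(H(A) + H(B) - H(A, B)) / (H(A) + H(B)), where H is entropy. The denominator sums over every ordered pair of attributes, including each attribute paired with itself; that is an assumption made here so that a single-attribute subset has a denominator of 1. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2, sqrt
from typing import List, Sequence


def entropy(values: Sequence) -> float:
    """Shannon entropy (in bits) of a sequence of nominal values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(a: Sequence, b: Sequence) -> float:
    """Entropy-based correlation between two nominal columns, in [0, 1]."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 0.0  # both columns constant; treated as uncorrelated (assumption)
    h_ab = entropy(list(zip(a, b)))  # joint entropy
    return 2.0 * (h_a + h_b - h_ab) / (h_a + h_b)


def cfs_merit(columns: List[Sequence], target: Sequence) -> float:
    """Sum of attribute-class correlations divided by the square root of the
    summed attribute-attribute correlations (self-pairs included, assumption)."""
    relevance = sum(symmetric_uncertainty(col, target) for col in columns)
    redundancy = sum(symmetric_uncertainty(ci, cj)
                     for ci in columns for cj in columns)
    return relevance / sqrt(redundancy) if redundancy > 0 else 0.0


# Toy data (invented): the class is 1 only when both x1 and x2 are 1.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
cls = [0, 0, 0, 1]

complementary = cfs_merit([x1, x2], cls)  # two different, uncorrelated attributes
redundant = cfs_merit([x1, x1], cls)      # the same attribute twice
print(complementary > redundant)
```

On this toy data the subset with two uncorrelated attributes scores higher than the subset that repeats one attribute, which is the behaviour the measure is designed to reward.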
Now let’s do attribute selection and, of course, we’re going to use the AttributeSelectedClassifier to ensure that we’re not cheating. That’s a meta classifier, the AttributeSelectedClassifier. Within that, remember, we can select a classifier – we’re going to choose Naive Bayes – and we’re also going to choose a subset evaluator – we’re going to use the default, CfsSubsetEval. And for the search method, I’ll just use the default search method. Let’s run that. Now we get 88.6% … 89%, which is a lot better, so attribute selection has really helped here. Let’s try attribute selection using the Wrapper method. I’m going to use the same learning scheme, Naive Bayes, but here I’m going to choose the Wrapper method.
For that, of course, I’ve got to specify a machine-learning method to use to wrap, and we’re going to wrap Naive Bayes. I’m going to run that – everything else is default – and it’s going to take a while. Here we go. It’s finished now; it took quite a long time. We got 91% accuracy. Back on the slide. In the NaiveBayes column, we got 83% without attribute selection. Attribute selection helped quite a lot, with CfsSubsetEval, which is very fast – and it was even better with the very slow Wrapper method. When I did IBk, I got 86% for plain IBk, 89% for CfsSubsetEval.
And for the wrapper, I wrapped IBk – in each of these things, I wrapped the corresponding classifier, the one that we’re using for classification – and I got 89%. The two attribute selection methods were the same. J48 was already extremely good without any attribute selection. I got 92% for the very fast method, CfsSubsetEval, and in fact, I got slightly worse results (90%) for the much slower wrapper selection. A little bit surprising that wrapper selection does worse than CfsSubsetEval for J48. These are just based on one run, of course. The conclusion is that CfsSubsetEval is nearly as good as the Wrapper method, and much faster. There are a number of attribute subset evaluators in Weka. There are a couple of scheme-dependent methods.
The WrapperSubsetEval uses internal cross-validation, and I think in a previous lesson we mentioned briefly the ClassifierSubsetEval, which is like the Wrapper method but instead of using cross-validation it uses a separate held-out test set. Those are scheme-dependent. And then the scheme-independent methods, there are a few of those. We’ve looked at CfsSubsetEval, and there’s another one called the ConsistencySubsetEval, which measures consistency in class values of the training set with respect to the attributes. Let me go over to Weka here and have a look at the different methods of attribute selection. There’s CfsSubsetEval. I talked about ClassifierSubsetEval, that’s a scheme-dependent method. ConsistencySubsetEval, that’s the one we were just talking about, and I can look at that and get some more information.
It evaluates the worth of a subset by consistency, and to really understand that method you need to go and look at the paper where it’s referenced. As you can see, there are quite a lot of different methods for attribute subset evaluation, and the list includes meta-evaluators, which incorporate other operations. I’m not going to talk about that here. In conclusion, attribute subset selection involves a subset evaluation measure and a search method. Some measures are scheme-dependent, like the Wrapper method, which is very slow, and others are scheme-independent, like CfsSubsetEval, which, as we found, is quite fast. Even faster is to use a single-attribute evaluator using ranking, and we are going to talk about that in the next lesson.

Attribute selection methods that do not involve a classifier can be faster than the wrapper method. They can use the same kind of searching, but evaluate each subset using a heuristic instead of applying a particular classifier. In this video we look at a scheme-independent attribute selection method that is nearly as good as the wrapper method, and is much faster.

This article is from the free online course More Data Mining with Weka.
