Attribute selection using ranking

Ian Witten explains that single-attribute methods that rank attributes can eliminate irrelevant attributes – but not redundant ones.
Hello again! One final lesson on attribute selection. You’re probably getting a bit fed up with attribute selection by now, but you know it’s really important. It’s one of the things that can really improve the performance of machine learning methods, and more importantly, it really improves the understandability. You know, if you select out some attributes, it’s easy to explain to other people what you’ve done to get such good performance on their data set. Attribute selection is pretty important. We’re going to look in this lesson at fast attribute selection using ranking. Remember, in the last lesson we looked at attribute subset selection, which involves a subset evaluation measure and a search method, and we were looking for rapid subset evaluation methods.
The Wrapper method is very slow, and we were looking for faster alternatives. But, of course, searching is slow. So we’re not doing any searching now. We’re going to use a single-attribute evaluator, that doesn’t evaluate a subset, it evaluates each attribute individually. This can help eliminate irrelevant attributes, but it can’t remove redundant attributes, because it’s only looking at individual attributes, one at a time. You need to choose the ranking search method whenever you select a single-attribute evaluator. The ranking search method doesn’t really search, it just sorts them into rank order of the evaluation. We’ve seen several metrics for evaluating attributes before. We looked in the last course at OneR, ages ago. Remember OneR? It’s effectively a method of ranking attributes.
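To make the idea of single-attribute evaluation concrete, here is a minimal Python sketch, not Weka's implementation. It scores each attribute on its own with a OneR-style rule (predict the majority class for each attribute value) and then sorts the scores. The toy data, the function names, and the use of plain training-set accuracy are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def one_r_accuracy(values, labels):
    """Training accuracy of a one-attribute rule: for each attribute value,
    predict the class seen most often with that value."""
    by_value = defaultdict(Counter)
    for v, y in zip(values, labels):
        by_value[v][y] += 1
    correct = sum(max(counts.values()) for counts in by_value.values())
    return correct / len(labels)

def rank_attributes(columns, labels, evaluate):
    """Score every attribute individually, then sort best first.
    There is no subset search: this is just a sort."""
    scores = {name: evaluate(vals, labels) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data, invented for this sketch (hypothetical attribute names).
columns = {
    "outlook": ["sunny", "sunny", "overcast", "overcast",
                "rainy", "rainy", "overcast", "sunny"],
    "windy":   ["no", "yes", "no", "no", "no", "yes", "yes", "yes"],
    "coin":    ["h", "t", "h", "t", "h", "t", "t", "h"],  # pure noise
}
labels = ["no", "no", "yes", "yes", "yes", "no", "yes", "no"]

ranking = rank_attributes(columns, labels, one_r_accuracy)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data the noise attribute "coin" ends up at the bottom of the ranking, which is the sense in which ranking can weed out irrelevant attributes.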
In Weka, there are attribute selection methods based on all of these. There’s the OneR attribute evaluator. C4.5, what we know as J48 in Weka, uses information gain, so there’s an information gain attribute evaluator. Actually, it uses gain ratio, slightly more complex than information gain, and there’s also a gain ratio attribute evaluator. In the last lesson we saw the CfsSubsetEval method, and that uses symmetric uncertainty, so there is a symmetric uncertainty attribute evaluator in Weka. The ranker search method is very simple. It just sorts attributes according to their evaluation, and you can specify the number of attributes to retain.
The default is to retain them all, or you can ask it to discard attributes whose evaluation falls below a certain threshold, or you can specify a certain set of attributes that you want to ignore. Let’s have a look. Let’s compare GainRatioAttributeEval with the other methods we looked at in the last lesson, on the ionosphere data. The gray part of this, the “No attribute selection” and “CfsSubsetEval” and “Wrapper”, those results we got before in the last lesson. We’re just going to look at the GainRatioAttributeEval. I’m going to go to Weka. I’ve got my ionosphere dataset.
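The entropy-based scores and the ranker options described above can be sketched in a few lines of Python. This is an illustrative sketch for nominal attributes only, not Weka's code: the function and parameter names are hypothetical, the toy data is invented, and whether a score exactly equal to the threshold is kept is an assumption of this sketch.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def info_gain(attr, labels):
    """H(class) - H(class | attribute)."""
    n = len(labels)
    conditional = 0.0
    for v in set(attr):
        subset = [y for a, y in zip(attr, labels) if a == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def gain_ratio(attr, labels):
    """Information gain divided by the entropy of the attribute itself."""
    split_info = entropy(attr)
    return info_gain(attr, labels) / split_info if split_info > 0 else 0.0

def symmetric_uncertainty(attr, labels):
    """2 * gain / (H(attribute) + H(class)), which lies between 0 and 1."""
    denom = entropy(attr) + entropy(labels)
    return 2.0 * info_gain(attr, labels) / denom if denom > 0 else 0.0

def ranker(scores, num_to_select=-1, threshold=-math.inf, ignore=()):
    """Sort attributes by score, best first.
    num_to_select=-1 keeps them all; attributes scoring below the threshold
    are discarded; names listed in `ignore` are left out entirely."""
    kept = [(name, s) for name, s in scores.items()
            if name not in ignore and s >= threshold]
    kept.sort(key=lambda kv: kv[1], reverse=True)
    return kept if num_to_select < 0 else kept[:num_to_select]

# Toy data, invented for this sketch.
columns = {
    "outlook": ["sunny", "sunny", "overcast", "overcast",
                "rainy", "rainy", "overcast", "sunny"],
    "windy":   ["no", "yes", "no", "no", "no", "yes", "yes", "yes"],
    "coin":    ["h", "t", "h", "t", "h", "t", "t", "h"],
}
labels = ["no", "no", "yes", "yes", "yes", "no", "yes", "no"]

scores = {name: gain_ratio(vals, labels) for name, vals in columns.items()}
print(ranker(scores))                    # keep everything, in rank order
print(ranker(scores, num_to_select=2))   # keep only the best two
print(ranker(scores, threshold=0.1))     # drop low-scoring attributes
```

Passing `symmetric_uncertainty` or `info_gain` instead of `gain_ratio` when building `scores` changes the evaluator without touching the ranking step, which mirrors how the evaluator and the search method are chosen separately.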
Of course, I’m going to use the AttributeSelectedClassifier to get a fair evaluation: meta > AttributeSelectedClassifier. Here I’m going to specify – let’s just use Naive Bayes to start off with. I’m going to use the GainRatioAttributeEval.
If I just run that, it’s not going to work: the attribute evaluators must use the Ranker search method. Sorry about that, I should have specified here the Ranker search method. There are a couple of parameters.
The number to select: –1 means select them all; it’s not really very useful to select them all. I’m going to select 7, the best 7 attributes. We could have a set to ignore. This threshold here, this bizarre number, is effectively minus infinity in Java (the most negative finite double), so that’s why it’s such a strange number. That’s all I need to do. I’m going to run that, and I get 89 … 90% accuracy. Let’s go back to the slide and compare this. Last time with Naive Bayes I got 83% accuracy, and then 89% with CfsSubsetEval, 91% with the Wrapper selection method, and with this new method GainRatioAttributeEval, a single-attribute evaluator, I get 90%. Fantastic performance for a method that’s lightning fast.
For IBk, the performance is really not very good. It’s just the same as IBk without any attribute selection. For J48, it’s the same as J48 without any attribute selection. Single-attribute selection is lightning fast but very sensitive to the number of attributes. I chose 7 here because it turned out to be a good number for this problem. There are a lot of single-attribute evaluators in Weka. We talked about the first four a minute ago. There’s one based on the chi-squared test, one based on support vector machines, one instance-based evaluator, principal components transform, and latent semantic analysis. The workings of these are all explained in the papers that are referenced in the More button for that attribute evaluator.
There are also meta-evaluators, which incorporate other operations. That’s it. We’ve seen that attribute subset selection involves searching, which is bound to be slow no matter how quickly you can evaluate the subsets, so instead we can use single-attribute evaluation. It involves ranking, which is really fast. It’s hard to specify a suitable cut-off; you need to experiment. It doesn’t cope with redundant attributes. For example, if you have copies of an attribute, then they will be repeatedly selected, because attributes are evaluated individually. Many single-attribute evaluators are based on machine-learning methods we’ve already looked at.
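The point about copies of an attribute can be seen in a small Python sketch. It uses information gain as the single-attribute score; the toy data and the names are invented for illustration, and this is not Weka's code.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def info_gain(attr, labels):
    """H(class) - H(class | attribute), computed for one attribute alone."""
    n = len(labels)
    conditional = sum(
        attr.count(v) / n * entropy([y for a, y in zip(attr, labels) if a == v])
        for v in set(attr)
    )
    return entropy(labels) - conditional

outlook = ["sunny", "sunny", "overcast", "overcast",
           "rainy", "rainy", "overcast", "sunny"]
columns = {
    "outlook": outlook,
    "windy": ["no", "yes", "no", "no", "no", "yes", "yes", "yes"],
    "outlook_copy": list(outlook),  # an exact duplicate: adds no information
}
labels = ["no", "no", "yes", "yes", "yes", "no", "yes", "no"]

# Each attribute is scored without reference to the others, so the
# duplicate gets exactly the same score as the original.
scores = {name: info_gain(vals, labels) for name, vals in columns.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
top2 = ranked[:2]
print(top2)
```

Keeping the best two attributes here retains the original and its copy and pushes out "windy", even though the copy contributes nothing new. A subset evaluator that looks at attributes together could notice that; a ranking of individual scores cannot.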

The attribute selection methods we have examined so far strive to eliminate both irrelevant attributes and redundant ones. A simpler idea is to rank the effectiveness of each individual attribute, and choose the top few to use for classification, discarding the rest. This is lightning fast because it does not involve searching at all, but can only eliminate irrelevant attributes, not redundant ones. And the results are very sensitive to the number of attributes that are retained.

This article is from the free online

More Data Mining with Weka

Created by
FutureLearn - Learning For Life
