[0:11] Hello again! One final lesson on attribute selection. You’re probably getting a bit fed up with attribute selection by now, but you know it’s really important. It’s one of the things that can really improve the performance of machine learning methods, and more importantly, it really improves the understandability. You know, you select out some attributes – it’s easy to explain to other people what you’ve done to get such good performance on their data set. Attribute selection is pretty important. We’re going to look in this lesson at fast attribute selection using ranking. Remember before in the last lesson we looked at attribute subset selection, which involves a subset evaluation measure and a search method, and we were looking for rapid subset evaluation methods.

[0:56] The Wrapper method is very slow, and we were looking for faster alternatives. But, of course, searching is slow. So we’re not doing any searching now. We’re going to use a single-attribute evaluator, that doesn’t evaluate a subset, it evaluates each attribute individually. This can help eliminate irrelevant attributes, but it can’t remove redundant attributes, because it’s only looking at individual attributes, one at a time. You need to choose the ranking search method whenever you select a single-attribute evaluator. The ranking search method doesn’t really search, it just sorts them into rank order of the evaluation. We’ve seen several metrics for evaluating attributes before. We looked in the last course at OneR, ages ago. Remember OneR? It’s effectively a method of ranking attributes.

[1:50] In Weka, there are attribute selection methods based on all of these. There’s the OneR attribute evaluator. C4.5, what we know as J48 in Weka, uses information gain, so there’s an information gain attribute evaluator. Actually, it uses gain ratio, slightly more complex than information gain, and there’s also a gain ratio attribute evaluator. In the last lesson we saw the CfsSubsetEval method, and that uses symmetric uncertainty, so there is a symmetric uncertainty attribute evaluator in Weka. The ranker search method is very simple. It just sorts attributes according to their evaluation, and you can specify the number of attributes to retain.
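To make the three measures mentioned here concrete, the following is a minimal sketch of information gain, gain ratio and symmetric uncertainty for nominal attributes, using their usual textbook definitions. It is not Weka's code: the function names and the three-attribute toy dataset are invented for illustration, and numeric attributes (which would need discretizing first) are not handled.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy, in bits, of a list of nominal values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def info_gain(attr, cls):
    """H(class) - H(class | attribute)."""
    n = len(cls)
    conditional = 0.0
    for v in set(attr):
        subset = [c for a, c in zip(attr, cls) if a == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(cls) - conditional

def gain_ratio(attr, cls):
    """Information gain divided by the entropy of the attribute itself."""
    h_attr = entropy(attr)
    return info_gain(attr, cls) / h_attr if h_attr > 0 else 0.0

def symmetric_uncertainty(attr, cls):
    """2 * gain / (H(attribute) + H(class)), which lies between 0 and 1."""
    denom = entropy(attr) + entropy(cls)
    return 2 * info_gain(attr, cls) / denom if denom > 0 else 0.0

# Hypothetical toy data: four instances, three nominal attributes.
cls = ["yes", "yes", "no", "no"]
data = {
    "perfect": ["a", "a", "b", "b"],  # splits the classes exactly
    "noise":   ["x", "y", "x", "y"],  # unrelated to the class
    "id":      ["1", "2", "3", "4"],  # unique value per instance
}

# Each attribute is scored on its own, then the names are sorted by score.
ranking = sorted(data, key=lambda name: gain_ratio(data[name], cls), reverse=True)
for name in ranking:
    print(name, info_gain(data[name], cls), gain_ratio(data[name], cls))
```

On this toy data the "id" attribute has the same information gain as "perfect" but only half the gain ratio, because gain ratio divides by the entropy of the attribute's own values.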

[2:33] The default is to retain them all, or you can ask it to discard attributes whose evaluation falls below a certain threshold, or you can specify a certain set of attributes that you want to ignore. Let’s have a look. Let’s compare GainRatioAttributeEval with the other methods we looked at in the last lesson, on the ionosphere data. The gray part of this, the “No attribute selection” and “CfsSubsetEval” and “Wrapper”, those results we got before in the last lesson. We’re just going to look at the GainRatioAttributeEval. I’m going to go to Weka. I’ve got my ionosphere dataset.

[3:14] Of course, I’m going to use the AttributeSelectedClassifier to get a fair evaluation: meta > AttributeSelectedClassifier. Here I’m going to specify – let’s just use Naive Bayes to start off with. I’m going to use the GainRatioAttributeEval.

[3:42] If I just run that, it’s not going to work: the attribute evaluators must use the Ranker search method. Sorry about that, I should have specified here the Ranker search method. There are a couple of parameters.

[4:01] The number to select: -1 means select them all; it’s not really very useful to select them all. I’m going to select 7, the best 7 attributes. We could have a set to ignore. This threshold here, this bizarre number, is actually minus infinity in Java, so that’s why it’s such a strange number. That’s all I need to do. I’m going to run that, and I get 89 … 90% accuracy. Let’s go back to the slide and compare this. Last time with Naive Bayes I got 83% accuracy, and then 89% with CfsSubsetEval, 91% with the Wrapper selection method, and with this new method GainRatioAttributeEval, a single-attribute evaluator, I get 90%. Fantastic performance for a method that’s lightning fast.
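The three Ranker settings described here (number to select, threshold, set to ignore) can be sketched in a few lines. This is an illustration of the idea rather than Weka's implementation: the function `rank`, its parameter names and the scores are all hypothetical, and whether an attribute scoring exactly at the threshold is kept is an assumption. The default threshold below is the most negative finite double, which is assumed to be the strange number shown in the Weka dialog; in practice it means no attribute is discarded on score.

```python
import sys

def rank(scores, num_to_select=-1, threshold=-sys.float_info.max, ignore=()):
    """Sort attribute names by score, best first.

    num_to_select: -1 keeps every attribute, otherwise keep the top n.
    threshold:     attributes scoring below this are discarded.
    ignore:        names that are left out of the ranking altogether.
    """
    kept = [(name, score) for name, score in scores.items()
            if name not in ignore and score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    if num_to_select >= 0:
        kept = kept[:num_to_select]
    return [name for name, _ in kept]

# Hypothetical evaluation scores for four attributes.
scores = {"a1": 0.42, "a2": 0.10, "a3": 0.77, "a4": 0.05}

print(rank(scores))                                 # all four, best first
print(rank(scores, num_to_select=2))                # the best two
print(rank(scores, threshold=0.08))                 # drops a4
print(rank(scores, num_to_select=2, ignore={"a3"})) # a3 never considered
```

Sorting is the only work done after the per-attribute scores are computed, which is why this approach is so much faster than searching over subsets.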

[4:58] For IBk, the performance is really not very good. It’s just the same as IBk without any attribute selection. For J48, it’s the same as J48 without any attribute selection. Single-attribute selection is lightning fast but very sensitive to the number of attributes. I chose 7 here because it turned out to be a good number for this problem. There are a lot of single-attribute evaluators in Weka. We talked about the first four a minute ago. There’s one based on the chi-squared test, one based on support vector machines, one instance-based evaluator, principal components transform, and latent semantic analysis. The workings of these are all explained in the papers that are referenced in the More button for that attribute evaluator.

[5:46] There are also meta-evaluators, which incorporate other operations. That’s it. We’ve seen that attribute subset selection involves searching, which is bound to be slow no matter how quickly you can evaluate the subsets, so instead we can use single-attribute evaluation. It involves ranking, which is really fast. It’s hard to specify a suitable cut-off, you need to do experimentation. It doesn’t cope with redundant attributes. For example, if you have copies of an attribute, then they will be repeatedly selected, because attributes are evaluated individually. Many single-attribute evaluators are based on machine-learning methods we’ve already looked at.
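The point about copies of an attribute can be seen in a small sketch. The data below is invented, and information gain stands in for whichever single-attribute evaluator is used: because each attribute is scored without reference to the others, an exact copy receives exactly the same score as the original, so keeping the top two attributes keeps the attribute and its copy and drops the only attribute that carries different information.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy, in bits, of a list of nominal values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def info_gain(attr, cls):
    """H(class) - H(class | attribute), computed for one attribute alone."""
    n = len(cls)
    conditional = 0.0
    for v in set(attr):
        subset = [c for a, c in zip(attr, cls) if a == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(cls) - conditional

# Hypothetical toy data: eight instances, two distinct attributes plus a copy.
cls = ["y", "y", "y", "y", "n", "n", "n", "n"]
a = [1, 1, 1, 1, 1, 0, 0, 0]  # fairly predictive of the class
b = [1, 1, 1, 0, 0, 0, 0, 1]  # weaker, but different information
data = {"a": a, "a_copy": list(a), "b": b}

scores = {name: info_gain(values, cls) for name, values in data.items()}
top2 = sorted(scores, key=scores.get, reverse=True)[:2]

print(scores)
print(top2)  # the attribute and its copy; "b" is discarded
```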

Attribute selection using ranking

The attribute selection methods we have examined so far strive to eliminate both irrelevant attributes and redundant ones. A simpler idea is to rank the effectiveness of each individual attribute, and choose the top few to use for classification, discarding the rest. This is lightning fast because it does not involve searching at all, but can only eliminate irrelevant attributes, not redundant ones. And the results are very sensitive to the number of attributes that are retained.

This video is from the free online course:

More Data Mining with Weka

The University of Waikato
