
Skip to 0 minutes and 11 seconds: We're going to open a dataset, glass.arff, apply J48, and see what we get. Then we're going to use Wrapper attribute selection with J48, and we're going to see what attributes that gives us. Then we're going to use just those attributes and run classification with J48 again on the new dataset. Let's go over to Weka and try it. I've got the glass data open here. I'm going to classify it with J48 with default parameters, and I get 67%. Then I'm going to go to Attribute selection. I'm going to use the Wrapper method, and within the Wrapper method I'm going to choose J48. I'm going to leave everything else at its default value and see what attributes I get.
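For anyone who prefers scripting these steps, here is a minimal sketch using Weka's Java API rather than the Explorer. The file path data/glass.arff, the BestFirst search, and the 10-fold cross-validation with random seed 1 are assumptions chosen to mirror the Explorer's defaults.

import java.util.Random;

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class GlassWrapperSelection {
    public static void main(String[] args) throws Exception {
        // Load the glass data; the file path is an assumption.
        Instances data = DataSource.read("data/glass.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // J48 with default parameters, evaluated by 10-fold cross-validation.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.printf("J48, all attributes: %.1f%%%n", eval.pctCorrect());

        // Wrapper attribute selection using J48, run on the FULL dataset
        // (this is the "cheating" version discussed in the video).
        WrapperSubsetEval wrapper = new WrapperSubsetEval();
        wrapper.setClassifier(new J48());
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(wrapper);
        selector.setSearch(new BestFirst());
        selector.SelectAttributes(data);
        // selectedAttributes() returns 0-based indices and includes the class index.
        System.out.println("Selected attribute indices: "
                + java.util.Arrays.toString(selector.selectedAttributes()));
    }
}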

Skip to 1 minute and 1 second: And we get the attribute set {RI, Mg, Al, K, Ba}. So I'm going to go back, keep RI, Mg, Al, K, Ba, and delete the rest; now I've got just that set {RI, Mg, Al, K, Ba}.

Skip to 1 minute and 24 seconds: I'm going to classify it again with J48, and I get better accuracy: 71%. (I'm just going to go back and undo that filter.) Back to the slide here. I got improved accuracy. If I did the same thing with IBk, I'd get 71% for IBk on the glass dataset and 78% if I used just the attributes selected by wrapper attribute selection using IBk.
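As a rough equivalent of deleting the unselected attributes in the Preprocess panel, the sketch below keeps only RI, Mg, Al, K, Ba plus the class and re-runs J48. The attribute indices 1, 3, 4, 6, 8 assume the standard glass.arff attribute order (RI, Na, Mg, Al, Si, K, Ca, Ba, Fe, Type); the file path and evaluation settings are assumptions as before.

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class GlassReducedSet {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/glass.arff");   // path is an assumption
        data.setClassIndex(data.numAttributes() - 1);

        // Keep RI, Mg, Al, K, Ba and the class attribute; delete the rest.
        Remove keepSelected = new Remove();
        keepSelected.setAttributeIndices("1,3,4,6,8,last");
        keepSelected.setInvertSelection(true);   // invert: keep the listed attributes
        keepSelected.setInputFormat(data);
        Instances reduced = Filter.useFilter(data, keepSelected);
        reduced.setClassIndex(reduced.numAttributes() - 1);

        // J48 on the reduced attribute set, evaluated by 10-fold cross-validation.
        Evaluation eval = new Evaluation(reduced);
        eval.crossValidateModel(new J48(), reduced, 10, new Random(1));
        System.out.printf("J48, selected attributes only: %.1f%%%n", eval.pctCorrect());
    }
}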

Skip to 1 minute and 56 seconds: The question is: is this cheating?

Skip to 1 minute and 58 seconds: And the answer is: yes, it certainly is. The reason is that we're using the entire dataset to decide on the attribute subset. We should really use just the training data to decide on all of those things and then test the final result on the test set. This is just what the AttributeSelectedClassifier does. Remember the FilteredClassifier we used for supervised discretization? Well, the AttributeSelectedClassifier is the analogous thing for attribute selection. It selects attributes based on the training data only, even if we are within a cross-validation. Then it trains the classifier, again on the training data only. And then it evaluates the whole thing on the test data. I'm going to use that to wrap J48 and see what I get.

Skip to 2 minutes and 46 seconds: It's a little bit complex to set up. I'm going to get the AttributeSelectedClassifier from meta. I'm going to use the J48 classifier. I'm going to use the wrapper subset evaluator, and within that I get to choose a classifier to wrap for attribute selection.

Skip to 3 minutes and 10 seconds: I can choose any classifier – I don't have to choose J48 again – but I will: I'm going to use J48 both for attribute selection and for classification. Leave everything else at its default value, and run it again. It's finished now, and I get an accuracy of 72%. Back on the slide. This is not cheating. Actually, it's slightly surprising that I get a higher accuracy (72%) when I'm not cheating than the 71% I got when I was cheating. But, you know, we should really use the Experimenter to get reliable results; this is just the result of one run here.
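Here is a minimal sketch of that configuration in Weka's Java API: WrapperSubsetEval wraps J48 for attribute selection, and the AttributeSelectedClassifier uses J48 as the final classifier, with selection done on the training folds only. The file path, BestFirst search, and cross-validation seed are assumptions matching the defaults used above.

import java.util.Random;

import weka.attributeSelection.BestFirst;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class GlassAttributeSelectedClassifier {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/glass.arff");   // path is an assumption
        data.setClassIndex(data.numAttributes() - 1);

        // Wrapper evaluator: use J48 to judge candidate attribute subsets.
        WrapperSubsetEval wrapper = new WrapperSubsetEval();
        wrapper.setClassifier(new J48());

        // AttributeSelectedClassifier: select attributes on the training folds only,
        // train J48 on those folds, and evaluate on the held-out fold.
        AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
        asc.setEvaluator(wrapper);
        asc.setSearch(new BestFirst());
        asc.setClassifier(new J48());

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(asc, data, 10, new Random(1));
        System.out.printf("AttributeSelectedClassifier (J48 wrapped): %.1f%%%n", eval.pctCorrect());
    }
}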

Skip to 3 minutes and 51 seconds: If we do the same thing with IBk: I got a dramatic improvement with IBk by using the correct attributes, 78% – by cheating, that is. If I then use the AttributeSelectedClassifier, well, of course I can decide what I'm going to wrap. If I were using IBk, then I'd probably want to select attributes using IBk. That would give me the figure of 71% at the bottom right. Of course, I can use a different classifier to do the attribute selection than the one I'm using for classification. The 69% figure is when I wrap IBk, do attribute selection using IBk, and then classify using J48. The 74% is when I do attribute selection using J48 and then classify using IBk.
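Programmatically, mixing the two roles just means putting one classifier inside the WrapperSubsetEval and another inside the AttributeSelectedClassifier itself. The helper below is a sketch of the two mixed configurations from the slide; default IBk and J48 settings and the usual file path are assumptions, and exact percentages will vary from run to run.

import java.util.Random;

import weka.attributeSelection.BestFirst;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MixedWrapperSelection {
    // Select attributes with 'selector', classify with 'learner', report 10-fold CV accuracy.
    static double evaluate(Instances data, Classifier selector, Classifier learner) throws Exception {
        WrapperSubsetEval wrapper = new WrapperSubsetEval();
        wrapper.setClassifier(selector);
        AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
        asc.setEvaluator(wrapper);
        asc.setSearch(new BestFirst());
        asc.setClassifier(learner);
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(asc, data, 10, new Random(1));
        return eval.pctCorrect();
    }

    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/glass.arff");   // path is an assumption
        data.setClassIndex(data.numAttributes() - 1);
        System.out.printf("select with IBk, classify with J48: %.1f%%%n",
                evaluate(data, new IBk(), new J48()));
        System.out.printf("select with J48, classify with IBk: %.1f%%%n",
                evaluate(data, new J48(), new IBk()));
    }
}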

Skip to 4 minutes and 45 seconds: And it's slightly surprising – you would expect to get a better attribute subset by using the classifier that you're going to use for classification, so it's surprising to see that 74% coming in larger than the 71% figure. But, you know, surprising things happen, and if we did a more extensive run with the Experimenter we probably wouldn't find that. I'm going to check the effectiveness of the AttributeSelectedClassifier for getting rid of redundant attributes. I'm going to open the diabetes dataset and use the AttributeSelectedClassifier with Naive Bayes. Remember, with Naive Bayes, when you add redundant attributes the performance of Naive Bayes gets worse.

Skip to 5 minutes and 27 seconds: So I'm going to add redundant attributes, copies of attributes, and then I'm going to use the AttributeSelectedClassifier and see if the performance still gets worse. I'm hoping it doesn't. The AttributeSelectedClassifier should get rid of those redundant copied attributes. I'm going to open diabetes, and I'm going to use the AttributeSelectedClassifier. I'm just going to go through this one more time, because the configuration takes a bit of thought. I'm going to use Naive Bayes as my classifier. And I'm going to use the WrapperSubsetEvaluator. Within that, as the classifier to wrap, I'm going to choose Naive Bayes again. I don't have to, but it's probably better to use the same classifier. I'm going to leave everything else at its default value, and let's run that.
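The corresponding configuration in the Java API is sketched below: Naive Bayes goes both inside the WrapperSubsetEval and as the classifier of the AttributeSelectedClassifier. The data/diabetes.arff path, the BestFirst search, and the evaluation seed are assumptions.

import java.util.Random;

import weka.attributeSelection.BestFirst;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class DiabetesAttributeSelection {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/diabetes.arff");   // path is an assumption
        data.setClassIndex(data.numAttributes() - 1);

        // Wrap Naive Bayes for attribute selection, and use Naive Bayes as the final classifier too.
        WrapperSubsetEval wrapper = new WrapperSubsetEval();
        wrapper.setClassifier(new NaiveBayes());
        AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
        asc.setEvaluator(wrapper);
        asc.setSearch(new BestFirst());
        asc.setClassifier(new NaiveBayes());

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(asc, data, 10, new Random(1));
        System.out.printf("AttributeSelectedClassifier (Naive Bayes): %.1f%%%n", eval.pctCorrect());
    }
}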

Skip to 6 minutes and 24 seconds: Here I get 75-76% accuracy: 75.7%. Back on the slide then. If I just run Naive Bayes on the diabetes dataset, I would get 76.3%. Using attribute selection in the proper way – that is, not cheating – with the AttributeSelectedClassifier, I get 75.7%. It's a little disappointing that attribute selection didn't help much on this dataset. But let's now copy the attributes. I'm going to copy the first attribute. Naive Bayes gives me 75.7%, and the AttributeSelectedClassifier also gives me 75.7%. If I add a bunch more copies of that attribute, 9 further copies, then the performance of Naive Bayes deteriorates to 68.9%, whereas the AttributeSelectedClassifier stays the same, because it's resistant to these redundant attributes.
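One way to reproduce the copied-attribute experiment in code is sketched below: it repeatedly appends a copy of the first attribute using the unsupervised Copy filter and compares plain Naive Bayes against the AttributeSelectedClassifier at each step. Using the Copy filter for this, the file path, and the number of copies are assumptions, and the exact percentages will depend on the run.

import java.util.Random;

import weka.attributeSelection.BestFirst;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Copy;

public class RedundantAttributeExperiment {
    static double crossValidate(Classifier c, Instances data) throws Exception {
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(c, data, 10, new Random(1));
        return eval.pctCorrect();
    }

    static AttributeSelectedClassifier wrappedNaiveBayes() {
        WrapperSubsetEval wrapper = new WrapperSubsetEval();
        wrapper.setClassifier(new NaiveBayes());
        AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
        asc.setEvaluator(wrapper);
        asc.setSearch(new BestFirst());
        asc.setClassifier(new NaiveBayes());
        return asc;
    }

    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/diabetes.arff");   // path is an assumption
        data.setClassIndex(data.numAttributes() - 1);
        int classIdx = data.classIndex();   // the class keeps its position; copies are appended at the end

        for (int copies = 0; copies <= 10; copies++) {
            System.out.printf("%2d copies: NaiveBayes %.1f%%, AttributeSelectedClassifier %.1f%%%n",
                    copies, crossValidate(new NaiveBayes(), data),
                    crossValidate(wrappedNaiveBayes(), data));
            if (copies == 10) break;
            Copy copy = new Copy();
            copy.setAttributeIndices("1");   // copy the first attribute
            copy.setInputFormat(data);
            data = Filter.useFilter(data, copy);
            data.setClassIndex(classIdx);
        }
    }
}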

Skip to 7 minutes and 21 seconds: And if I add further copies, then Naive Bayes will slowly get worse and worse, whereas the AttributeSelectedClassifier continues at its standard level of 75.7%. The conclusion is that attribute selection does a good job of removing redundant attributes. In this lesson, we've looked at the AttributeSelectedClassifier, which selects attributes based on the training set only; that is the right way to do it.

The Attribute Selected Classifier

Experimenting with a dataset to select attributes and then applying a classifier to the result is cheating, if performance is evaluated using cross-validation, because the entire dataset is used to determine the attribute subset. It's the same issue we met with supervised discretization: you mustn't use the test data when setting discretization boundaries! But with plain cross-validation you don't really have an opportunity to use the training data only. There, the FilteredClassifier solved the problem (does that ring a bell? You saw it before, in Week 2); for attribute selection, the AttributeSelectedClassifier plays the same role.

This video is from the free online course:

More Data Mining with Weka

The University of Waikato