In data mining, you often want to optimize parameters for a particular situation, and I’m going to show you some methods in Weka that let you do that. These are “wrapper” meta-learners, and there are three of them. Do you remember the AttributeSelectedClassifier with WrapperSubsetEval? It selected an attribute subset based on how well a classifier performed, which it evaluated using cross-validation. These meta-learners work the same way, except that they optimize parameter values rather than attribute subsets. CVParameterSelection selects the best value for a parameter, again using cross-validation.

It can optimize accuracy or, for numeric prediction, root mean-squared error. GridSearch optimizes two parameters by searching a 2-dimensional grid. The ThresholdSelector selects a probability threshold, and it can optimize various performance measures when doing so. Let’s take a look at CVParameterSelection first. Over in Weka, I’ve got the diabetes dataset open, and I’m looking at J48.

Do you remember that J48 has two parameters, “C” and “M”? We can optimize those. First, let’s just run it: with the default settings, we get 73.8%. Now let’s optimize these parameters. Back to the slide: we can use CVParameterSelection. We express the optimization as a loop. The “C” parameter goes from 0.1 to 1 in 10 steps, which takes it right up to 1.0. Actually, if you tried this, you’d find it fails, because J48 can’t cope with C set to 1. Instead, we’ll let C go from 0.1 to 0.9 in 9 steps.
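To make the loop syntax concrete, here is a minimal Python sketch that expands a “name lower upper steps” specification into candidate values. It assumes the endpoints are inclusive and the values evenly spaced, which matches the behaviour described here (0.1 to 1 in 10 steps reaches 1.0). The function name `expand_cv_parameter` is hypothetical; this is an illustration, not Weka’s code.

```python
def expand_cv_parameter(spec: str) -> tuple[str, list[float]]:
    """Expand a 'name lower upper steps' string into candidate values.

    Assumption (not taken from Weka's source): endpoints are inclusive and
    the spacing is (upper - lower) / (steps - 1).
    """
    name, lower_s, upper_s, steps_s = spec.split()
    lower, upper, steps = float(lower_s), float(upper_s), int(steps_s)
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if steps == 1:
        return name, [lower]
    increment = (upper - lower) / (steps - 1)
    # Round to hide floating-point noise such as 0.30000000000000004.
    return name, [round(lower + i * increment, 10) for i in range(steps)]


print(expand_cv_parameter("C 0.1 1 10"))  # would include C = 1.0, which J48 rejects
print(expand_cv_parameter("C 0.1 0.9 9"))  # the safer range used in the lesson
```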

The syntax is explained under the More button. Let’s go back to Weka and take a look.

I’m going to choose CVParameterSelection, which is a meta-classifier, and configure it to wrap J48.

My parameter string says that C goes from 0.1 to 0.9 in 9 steps: “C 0.1 0.9 9”.

I need to click Add, and here it is in the list. If I close the dialog and come back to have another look, it still says the same thing. This list specifies what gets optimized, and it can contain several lines. If I run it now, it will optimize that parameter, but it takes quite a long time, so I’m going to stop it. The result is actually disappointing: it chooses C = 0.1 instead of the default of 0.25, and gets slightly worse accuracy, only 73.4%. I’ll have better luck with minNumObj, J48’s other parameter, which is called M.
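The core of selecting one parameter value by cross-validation can be sketched as follows: estimate each candidate’s accuracy with k-fold cross-validation and keep the best. This is a toy illustration under stated assumptions, not Weka’s implementation: a hypothetical one-dimensional k-nearest-neighbour learner stands in for J48, its neighbour count `k` plays the role of C, and the data is made up.

```python
def knn_predict(train, x, k):
    """Toy 1-D k-nearest-neighbour classifier (hypothetical stand-in for J48)."""
    # Nearest training points first; ties broken by position for determinism.
    nearest = sorted(train, key=lambda p: (abs(p[0] - x), p[0]))[:k]
    ones = sum(label for _, label in nearest)
    return 1 if 2 * ones > len(nearest) else 0  # majority vote, ties -> class 0


def cv_accuracy(data, k, folds):
    """Estimate the accuracy of parameter value k by `folds`-fold cross-validation."""
    correct = 0
    for f in range(folds):
        test = [p for i, p in enumerate(data) if i % folds == f]
        train = [p for i, p in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)


def select_parameter(candidates, data, folds=10):
    """Return the candidate with the highest CV accuracy, plus all scores."""
    scores = {k: cv_accuracy(data, k, folds) for k in candidates}
    best = max(candidates, key=lambda k: scores[k])  # first maximum wins ties
    return best, scores


# Hypothetical data: (value, class) pairs.
data = [(x, 0 if x < 10 else 1) for x in range(20)]
best, scores = select_parameter([1, 3, 5, 7], data)
print("best k:", best, "scores:", scores)
```

Ties go to the earliest candidate in the list; that is a choice made for this sketch only.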

Let’s go back and reconfigure CVParameterSelection.

I’m going to add another optimization: M goes from 1 to 10 in 10 steps. I’ll Add that first, and then add the line for C. So it will loop over M to get the best value for M, and then loop over C to get the best value of C for that value of M. I won’t run this here, because it takes a long time, but here are the results: 74.3% accuracy, with C = 0.2 and M = 10. It also produces a much simpler tree. So we get a very slightly better result than with plain J48, and a simpler tree. That’s a worthwhile optimization.

The next method is GridSearch. CVParameterSelection can handle multiple parameters, optimizing the first parameter and then the next; GridSearch optimizes two parameters together. It can search not only over a classifier’s parameters, but over parameter combinations for a filter and a classifier. It can optimize various performance measures, and it’s very flexible, but fairly complicated to set up. Let’s take a quick look; you’d need to study it before actually using it. This is the configuration panel, and you can see it’s pretty complex. The two parameters are called “x” and “y”. Here, x is a property of the filter: we’re optimizing the filter’s number of components.
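Searching a 2-dimensional grid can be sketched in a few lines: score every (x, y) pair and keep the best one. In practice each score would be a cross-validated performance estimate; here a hypothetical `toy_cv_score` function stands in for it. The parameter names merely echo the default configuration and are illustrative; none of this is GridSearch’s API.

```python
from itertools import product


def grid_search(xs, ys, score):
    """Score every (x, y) combination and return the best pair and its score."""
    best_pair, best_score = None, float("-inf")
    for x, y in product(xs, ys):
        s = score(x, y)
        if s > best_score:  # strict >, so the first best pair wins ties
            best_pair, best_score = (x, y), s
    return best_pair, best_score


def toy_cv_score(num_components, ridge):
    """Hypothetical stand-in for a cross-validated score (higher is better)."""
    return -(num_components - 5) ** 2 - 10 * (ridge - 0.1) ** 2


pair, value = grid_search(range(1, 11), [0.001, 0.01, 0.1, 1.0], toy_cv_score)
print("best (x, y):", pair, "score:", value)
```

Because every combination is evaluated, the cost is the product of the two grid sizes (40 evaluations here), which is one reason a real grid search takes a while.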

y is a property of the classifier: in this default configuration, the classifier is linear regression, and we’re optimizing its ridge parameter. The filter is partial least squares, and its parameter numComponents is what we’re optimizing for x. To change this default configuration, you’d need to read the information under the More button and think about it quite a bit. The third method I want to look at is ThresholdSelector. Do you remember that in the last class we looked at probability thresholds, and found that Naive Bayes uses a probability threshold of 0.5?

We adjusted that threshold to get the best result for a given cost matrix, and that’s exactly the kind of thing ThresholdSelector can optimize. In this case it’s unlikely to do better than plain Naive Bayes, but we can optimize different things. I’m going to use the credit dataset and Naive Bayes; I’ve got them here. If I just run Naive Bayes, I get 75.4% accuracy. Now let’s look at ThresholdSelector. It’s a meta-classifier, of course, and I’m going to configure it with Naive Bayes. There are various options I can set.

The designated class: I’m going to designate the first class value. In this dataset, the class values are “good” and “bad”, so the first class is “good”. Let me optimize accuracy and see what happens: I get exactly the same 75.4% that I got before. In fact, we can optimize a number of different measures here: TP_Rate, FP_Rate, and so on. Back on the slide, there are some new terms: F-measure, Precision, and Recall. Remember the confusion matrix? TP, the number of true positives, is in the top left-hand corner, and TN, the number of true negatives, is in the lower right-hand corner. The TP_Rate is TP / (TP + FN), and the FP_Rate is FP / (FP + TN). We’ve talked about those before.
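The confusion-matrix measures can be computed directly from the four counts, treating the designated class as positive. This is a minimal sketch; the function names are hypothetical and the counts in the first call are made up. The second call describes a hypothetical classifier that labels every instance “good” on data with 700 “good” and 300 “bad” instances (the class distribution of the credit data): its TP_Rate is perfect, but its FP_Rate is 1.0 as well.

```python
def ratio(num, den):
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def confusion_measures(tp, fn, fp, tn):
    """Standard measures from a 2x2 confusion matrix (positive = designated class)."""
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # identical to TP_Rate
    return {
        "tp_rate": recall,
        "fp_rate": ratio(fp, fp + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
    }


print(confusion_measures(tp=80, fn=20, fp=10, tn=90))
print(confusion_measures(tp=700, fn=0, fp=300, tn=0))  # everything predicted "good"
```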

We haven’t talked about Precision, Recall, and F-measure, which are commonly used in information retrieval. Precision is TP / (TP + FP); Recall is TP / (TP + FN), the same as the TP_Rate; and the F-measure combines them as 2 × Precision × Recall / (Precision + Recall). Going back to Weka, let’s optimize something simple, like the number of true positives. Look: we’ve got 700 of them, every “good” instance in the dataset. Isn’t that fantastic? Or we could change the measure to optimize the number of true negatives. Here we get 295, a very high number of true negatives. The chosen threshold is shown at the top of the output, and you can see it’s almost 1. ThresholdSelector evaluates the candidate thresholds by tuning on one third of the data.
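Choosing a threshold to optimize a given measure can be sketched as follows: try each distinct predicted probability of the designated class as a threshold, count TP, FN, FP, and TN, and keep the threshold with the best score. This is not ThresholdSelector’s implementation; the probabilities, labels, and function names are hypothetical, and a real run would score thresholds on held-out data rather than on the data the model was fitted to.

```python
def confusion_counts(threshold, probs, labels):
    """Count TP, FN, FP, TN when predicting positive iff prob >= threshold."""
    tp = fn = fp = tn = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


MEASURES = {
    "accuracy": lambda tp, fn, fp, tn: (tp + tn) / (tp + fn + fp + tn),
    "tp": lambda tp, fn, fp, tn: tp,
    "tn": lambda tp, fn, fp, tn: tn,
    # 2*TP / (2*TP + FP + FN) equals 2*Precision*Recall / (Precision + Recall).
    "f_measure": lambda tp, fn, fp, tn: 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
}


def select_threshold(probs, labels, measure="accuracy"):
    """Return (threshold, score) maximizing the chosen measure."""
    score = MEASURES[measure]
    # Candidates: every distinct probability, plus +inf ("never predict positive").
    candidates = sorted(set(probs)) + [float("inf")]
    best_t, best_s = None, float("-inf")
    for t in candidates:
        s = score(*confusion_counts(t, probs, labels))
        if s > best_s:  # strict >, so the lowest threshold wins ties
            best_t, best_s = t, s
    return best_t, best_s


# Hypothetical predicted probabilities of the positive class, with true labels.
probs = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
for name in MEASURES:
    print(name, select_threshold(probs, labels, name))
```

With these toy values, maximizing true positives picks the lowest threshold (everything predicted positive), while maximizing true negatives pushes the threshold up, the same pattern seen in the Weka runs.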

We can also optimize Precision, Recall, and F-measure, as well as accuracy. That’s it.

The moral is: don’t optimize parameters manually. If you do, you’ll overfit: if you choose the parameter values that give the best cross-validation result on the whole dataset, the test data has influenced your choice, so that result is optimistically biased. That’s cheating! Instead, use wrapper methods that optimize parameters by internal cross-validation. We’ve looked at CVParameterSelection, GridSearch, and ThresholdSelector.
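Internal cross-validation can be sketched as nested cross-validation: an outer loop holds out each test fold, and an inner cross-validation on the remaining training data alone chooses the parameter, so the test fold never influences the choice. When a wrapper is evaluated with cross-validation in the Explorer, the Explorer’s folds play the outer role and the wrapper’s own cross-validation the inner one. This is a minimal sketch of the principle, not Weka’s code; a hypothetical 1-D k-nearest-neighbour learner stands in for a real classifier, and the data is made up.

```python
def knn_predict(train, x, k):
    """Toy 1-D k-nearest-neighbour classifier (hypothetical stand-in)."""
    nearest = sorted(train, key=lambda p: (abs(p[0] - x), p[0]))[:k]
    ones = sum(label for _, label in nearest)
    return 1 if 2 * ones > len(nearest) else 0  # majority vote, ties -> class 0


def split(data, folds, f):
    """Return (train, test) for fold f of an interleaved `folds`-fold split."""
    test = [p for i, p in enumerate(data) if i % folds == f]
    train = [p for i, p in enumerate(data) if i % folds != f]
    return train, test


def cv_accuracy(data, k, folds):
    """Plain k-fold cross-validated accuracy for parameter value k."""
    correct = 0
    for f in range(folds):
        train, test = split(data, folds, f)
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)


def nested_cv_accuracy(data, candidates, outer_folds=5, inner_folds=5):
    """Outer CV estimates performance; inner CV on training folds only picks k."""
    correct, chosen = 0, []
    for f in range(outer_folds):
        train, test = split(data, outer_folds, f)
        # The parameter is chosen without looking at the outer test fold.
        best_k = max(candidates, key=lambda k: cv_accuracy(train, k, inner_folds))
        chosen.append(best_k)
        correct += sum(knn_predict(train, x, best_k) == y for x, y in test)
    return correct / len(data), chosen


data = [(x, 0 if x < 10 else 1) for x in range(20)]
data[3], data[15] = (3, 1), (15, 0)  # hypothetical label noise
acc, chosen = nested_cv_accuracy(data, [1, 3, 5])
print(f"nested CV accuracy: {acc:.2f}, k chosen per outer fold: {chosen}")
```

The chosen value can differ from one outer fold to the next; the reported accuracy estimates the whole procedure (choose k, then train), not any single value of k.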

# Performance optimization

Machine learning methods often involve several parameters, which should be optimized for best performance. Optimizing them manually is tedious, and also dangerous, because you risk overfitting the data (unless you hold out some data for final testing). Weka contains three “wrapper” metalearners that optimize parameters for best performance using internal cross-validation. *CVParameterSelection* selects the best value for a parameter; *GridSearch* optimizes two parameters by searching a 2D grid; and the *ThresholdSelector* selects a probability threshold.

© University of Waikato, New Zealand. CC Creative Commons Attribution 4.0 International License.