Skip to 0 minutes and 11 seconds Hello! Nice to see you. Nice to be back. It’s me again. This is Class 3, interfacing to other data mining packages. We’re going to concentrate on the R package for most of this class, but to begin with we’re going to look at the LibSVM and LibLINEAR packages. These are written by the same people, they are widely used outside of Weka, and they are also Weka’s most popular packages. You should install them – I’ve got them installed already – and you should install the gridSearch package as well. LibSVM and LibLINEAR are both to do with support vector machines. Weka already has the SMO implementation of support vector machines, but LibSVM is more flexible and LibLINEAR can be much faster.
Skip to 0 minutes and 51 seconds It’s important to know that SVMs can be either linear or nonlinear, through a kernel function. Also, they can do classification or regression, which we haven’t mentioned; Weka contains SMOreg, the same algorithm, for regression. We’re going to use the gridSearch method to optimize parameters for SVMs, which is quite important. Let’s just compare LibSVM and LibLINEAR, these two packages, with the standard SMO and SMOreg. All of them implement linear SVMs. All but LibLINEAR are capable of accommodating nonlinear kernels. LibSVM does one-class classification. LibLINEAR does logistic regression – it’s linear. LibLINEAR is very fast, and it can operate with the L1 norm, which I’m not going to explain in this lesson. Just a quick look at LibLINEAR. I did a speed test.
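The kernel function mentioned above is what turns a linear SVM into a nonlinear one: the RBF kernel used later in this lesson measures similarity between two instances, controlled by the gamma parameter. As a minimal conceptual sketch (plain Python, not Weka code), the RBF kernel is just:

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF (Gaussian) kernel: k(x, y) = exp(-gamma * ||x - y||^2).

    Small gamma gives smooth, nearly linear boundaries; large gamma
    gives wiggly boundaries that can overfit the training data.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Identical points always score 1; distant points approach 0.
print(rbf_kernel([0, 0], [0, 0], gamma=1.0))   # 1.0
print(rbf_kernel([0, 0], [3, 4], gamma=0.1))   # exp(-2.5), about 0.082
```

This is why gamma matters so much when we come to optimize LibSVM’s parameters below: it directly controls how far each training point’s influence reaches.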
Skip to 1 minute and 42 seconds I used the data generator on Weka’s Preprocess panel to generate 10,000 instances of this data, LED24. LibLINEAR took 2 seconds to build the model. LibSVM took 18 seconds to build the model, but that’s a slightly unfair comparison because it’s using a nonlinear kernel. So when I changed it to use a linear kernel, it took 10 seconds. And SMO with default parameters, which is a linear kernel, took 21 seconds. So you can see LibLINEAR is quite a lot faster. Now, let’s just talk about linear boundaries and support vector machines in general. Support vector machines try to drive a channel between the two classes.
Skip to 2 minutes and 21 seconds Here we’ve got the blue class and the green class, and they try to drive a channel halfway between the classes to leave as large a margin as possible. In this case, we’ve got zero errors on the training data, and a pretty small margin, the distance between the dashed lines. However, when we look at the test data – now this is an artificial dataset – you can see that some points in the test data are being classified incorrectly. Four points, in fact.
Skip to 2 minutes and 48 seconds If, instead of using this line, we turned it a bit and used a line with a much larger margin, then although it makes one error on the training data, in this particular situation it gets all of the test data correct – no errors on the test data. It’s an advantage sometimes to have a large margin, even at the expense of errors on the training data. SVMs try to give you large-margin classifiers. Here we are with a nonlinear dataset. I’ve drawn a linear boundary here, the boundary that’s produced by LibLINEAR, or by LibSVM with a linear kernel, or indeed by Weka’s SMO classifier. This gives 21 errors on the training set.
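The margin being maximized here is just the distance from the separating line to the nearest training point. A tiny sketch of that idea, using made-up points and a hypothetical boundary (this is an illustration of the geometry, not how an SVM is actually trained):

```python
import math

def distance_to_hyperplane(point, w, b):
    """Perpendicular distance from a point to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, point))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm

def margin(points, w, b):
    """The margin of a separating hyperplane: the distance to the
    closest point. SVM training chooses w and b to maximize this."""
    return min(distance_to_hyperplane(p, w, b) for p in points)

# Hypothetical training points and the boundary y = x (w.x + b = x1 - x2):
points = [(0, 2), (1, 3), (3, 0), (4, 1)]
print(margin(points, w=(1, -1), b=0))   # sqrt(2), about 1.414
```

Of two candidate boundaries that both separate the classes, the SVM prefers the one for which this minimum distance is larger, which is exactly the trade-off shown in the slides.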
Skip to 3 minutes and 33 seconds Here’s a nonlinear boundary for the same dataset, implemented by LibSVM with an RBF kernel. I’ve got this dataset open in Weka’s BoundaryVisualizer over here, and I’m going to just choose LibSVM. Luckily, I’ve installed the package already and I just start. OK, let’s speed this up.
Skip to 3 minutes and 59 seconds There we are. That’s the result, and you can see it’s making some errors down here and up here on the dataset, on the training set. Let’s just go to the Explorer. I’ve got the same data file open, and I’m going to go again to LibSVM and take a look. We’re plotting the training set here, so if I look at that I get a total of 9 errors, 4 and 5 respectively on the different training set parts. That’s with the default parameters. If I change the LibSVM parameters, then I can get this boundary.
Skip to 4 minutes and 30 seconds Now this is quite a good boundary, because it gives 0 errors on the training set, but it gives poor generalization, because it doesn’t drive a channel right between those two classes. With different parameters, I can continue to get 0 errors on the training set but a much more satisfactory boundary, which will probably generalize better. Whenever you use nonlinear support vector machines you need to optimize the parameters. The parameters we’re talking about are called “cost” and “gamma”. When we optimize parameters in Weka, we use the gridSearch method, which is in the meta category. These are the parameters for gridSearch. The default configuration for gridSearch, well let’s look at it.
Skip to 5 minutes and 12 seconds Down at the bottom, it says use SMOreg, that’s the default, and evaluate using the correlation coefficient. We’re going to need to change those. Then the first 6 boxes specify the X axis of the grid and the next 6 boxes the Y axis. The X property being optimized is called C, and it goes from 10^3 down to 10^–3 in multiplicative steps of 10. That’s what those first 6 parameters signify. The next 6 parameters give the same range for the Y property, kernel.gamma. That’s for SMOreg. If we want to use LibSVM, we need to change some things. We’re going to optimize the properties cost and gamma. We’re going to choose the classifier LibSVM and we’re going to evaluate using Accuracy.
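Those 6 boxes per axis just describe a geometric sequence of candidate values. The default range, 10^3 down to 10^–3 in multiplicative steps of 10, works out to 7 values per axis, which a quick sketch (plain Python, not gridSearch’s actual configuration format) makes concrete:

```python
# gridSearch's default axis range: from 10^3 down to 10^-3
# in multiplicative steps of 10, for both axes of the grid.
def grid_values(start_exp=3, stop_exp=-3, base=10.0):
    return [base ** e for e in range(start_exp, stop_exp - 1, -1)]

cost_values = grid_values()    # X axis: cost (C)
gamma_values = grid_values()   # Y axis: gamma
print(cost_values)  # [1000.0, 100.0, 10.0, 1.0, 0.1, 0.01, 0.001]
```

So the grid has 7 × 7 = 49 points, and a classifier is trained and evaluated at each one – which is why gridSearch can take a while with a slow base classifier.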
Skip to 5 minutes and 59 seconds Let me set that up in Weka. I’m going to choose gridSearch from the meta category. In gridSearch, I’m going to first of all choose the classifier. I’m going to choose LibSVM. I’m going to optimize – let’s move this up so you can see – optimize the Accuracy. And the two properties involved are cost and gamma. If I run that … it’s finished here, and the result is – the parameters are 1000 for the X coordinate, that’s cost, and 10 for the Y coordinate, that’s gamma. We’ve got 100% accuracy with that dataset. We could see we were going to get 100% accuracy when we looked at the boundary visualization. That’s for LibSVM.
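What gridSearch does under the hood is conceptually simple: evaluate the classifier at every (cost, gamma) grid point and keep the best. A minimal sketch of that loop, with a hypothetical stand-in for the real "train LibSVM and measure accuracy" step (the stand-in is contrived so that it peaks at the values found in the demo, cost = 1000 and gamma = 10):

```python
import itertools

def grid_search(evaluate, costs, gammas):
    """Exhaustive search over a cost x gamma grid, as gridSearch does:
    evaluate the classifier at each grid point and keep the best."""
    best = (None, None, -1.0)
    for c, g in itertools.product(costs, gammas):
        acc = evaluate(c, g)          # e.g. cross-validated accuracy
        if acc > best[2]:
            best = (c, g, acc)
    return best

# Hypothetical evaluation function standing in for training LibSVM;
# contrived to peak at cost=1000, gamma=10 like the demo.
def fake_accuracy(cost, gamma):
    return 1.0 if (cost, gamma) == (1000.0, 10.0) else 0.9

costs = [10.0 ** e for e in range(3, -4, -1)]
gammas = [10.0 ** e for e in range(3, -4, -1)]
print(grid_search(fake_accuracy, costs, gammas))  # (1000.0, 10.0, 1.0)
```

In Weka the evaluation at each grid point is a full build-and-evaluate of the chosen classifier, and gridSearch then re-trains the final model with the winning parameter pair.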
Skip to 6 minutes and 58 seconds If we were to choose a different method, like SMO, it’s got different parameters. Let me just look at SMO here. I’m going to choose SMO, and I need to find the appropriate parameters. Here are the SMO parameters. I want C here for the cost, and if I look at the kernel, I want an RBF kernel, and in the RBF kernel the key parameter is gamma. So it’s kernel.gamma: the kernel property, dot, its gamma property. I’m going to use C and kernel.gamma; that will allow me to optimize SMO. OK, so gridSearch is fairly complicated to use, but it’s necessary to optimize the parameters when using nonlinear support vector machines. Here’s a summary.
Skip to 7 minutes and 56 seconds We’ve looked at LibLINEAR, which does all things linear: linear SVMs and logistic regression. It can use the L1 norm, which minimizes the sum of absolute values rather than the sum of squares and has big advantages under certain conditions, and it is very fast. LibSVM is all things SVM: linear and nonlinear SVMs. The practical advice when you want to use SVMs is to first try a linear SVM – do it quickly with LibLINEAR, perhaps – and see how you get on. Then for a nonlinear SVM, select the RBF kernel. But when you select a nonlinear kernel like RBF, it’s really important to optimize cost and gamma, and you can do this using the gridSearch method. Here’s a reference to support vector machines and to these packages.
LibSVM and LibLINEAR
Ian Witten demonstrates LibLINEAR, which contains fast algorithms for linear classification; and LibSVM, which produces non-linear SVMs. Both implement support vector machines – which are already available in Weka as the SMO method. The difference is that LibLINEAR is generally far faster than SMO (and can, optionally, minimize the sum of absolute values of errors instead of the sum of squared errors), while LibSVM is far more flexible. Support vector machines can be made to implement different kinds of non-linear decision boundaries using different kernels, and the effect can be explored using Weka’s boundary visualizer. They benefit greatly from a parameter optimization process, which can be done using Weka’s gridSearch meta-classifier.
© University of Waikato, New Zealand. Creative Commons Attribution 4.0 International License.