Skip to 0 minutes and 11 seconds Hello again. In most courses, there comes a point where things start to get a little tough. In the last couple of lessons, you’ve seen some mathematics that you probably didn’t want to see, and you might have realized that you’ll never completely understand how all these machine learning methods work in detail. I want you to know that what I’m trying to convey is the gist of modern machine learning methods, not the details. What’s important is that you can use them and that you understand a little bit of the principles behind how they work. And the math is almost finished. So hang in there; things will start to get easier – and anyway, there’s not far to
Skip to 0 minutes and 46 seconds go: just a few more lessons. I told you before that I play music. Someone came round to my house last night with a contrabassoon. It’s the deepest, lowest instrument in the orchestra. You don’t often see or hear one. So, here I am, trying to play a contrabassoon for the first time.
Skip to 1 minute and 18 seconds I think this has got to be the lowest point of our course, Data Mining with Weka! Today I want to talk about support vector machines, another advanced machine learning technique. We looked at logistic regression in the last lesson, and we found that it produces linear boundaries in the instance space. In fact, here I’ve used Weka’s Boundary Visualizer to show the boundary produced by logistic regression – this is on the 2D Iris data, plotting “petalwidth” against “petallength”. This black line is the boundary between the two classes, the red class and the green class.
Skip to 1 minute and 58 seconds It might be more sensible, if we were going to put a boundary between these two classes, to try and drive it through the widest channel between the two classes, the maximum separation from each class. Here’s a picture where the black line now is right down the middle of the channel between the two classes. Actually, mathematically, we can find that line by taking the two critical members, one from each class – they’re called “support vectors”; these are the critical points that define the channel – and taking the perpendicular bisector of the line joining those two support vectors. That’s the idea of support vector machines.
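The perpendicular-bisector construction can be sketched in a few lines of Python. This is only an illustration of the two-support-vector case described above: the two points are made-up values, not taken from the Iris data, and the function names are hypothetical.

```python
def perpendicular_bisector(p, q):
    """Return (w, b) so that w[0]*x + w[1]*y + b = 0 is the
    perpendicular bisector of the segment joining p and q."""
    # The normal of the bisector points from p towards q.
    w = (q[0] - p[0], q[1] - p[1])
    # The bisector passes through the midpoint of p and q.
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    b = -(w[0] * mid[0] + w[1] * mid[1])
    return w, b


def side(w, b, x):
    """Signed value: negative on p's side of the line, positive on q's side."""
    return w[0] * x[0] + w[1] * x[1] + b


# Hypothetical support vectors, one from each class.
sv_a = (1.0, 1.0)
sv_b = (3.0, 3.0)

w, b = perpendicular_bisector(sv_a, sv_b)
print(w, b)
print(side(w, b, sv_a), side(w, b, sv_b))
```

The two support vectors end up at equal distances on opposite sides of the line, which is what "right down the middle of the channel" means.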
Skip to 2 minutes and 47 seconds We’re going to put a line between the two classes, but not just any old line that separates them. We’re trying to drive the widest channel between the two classes. Here’s another picture. We’ve got two clouds of points, and I’ve drawn a line around the outside of each cloud – the green cloud and the brown cloud. It’s clear that any interior points aren’t going to affect this hyperplane, this plane, this separating line. I call it a line, but in three dimensions it would be a plane, and in four or more dimensions a hyperplane.
Skip to 3 minutes and 21 seconds There are just a few of the points in each cloud that define the position of the line: the support vectors. In this case, there are [three] points. Support vectors define the boundary. The thing is that all the other instances in the training data could be deleted without changing the position of the dividing hyperplane. There’s a simple equation – and this is the last equation in this course – a simple equation that gives the formula for the maximum margin hyperplane as a sum over the support vectors. Each term is a dot product of the instance being classified with one of the support vectors, and the terms are added up. It’s pretty simple to calculate this maximum margin hyperplane once you’ve got the support vectors.
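The equation itself is on the slide and is not reproduced in the transcript. In its usual form it reads x = b + Σ αᵢ yᵢ a(i)·a, where the sum runs over the support vectors a(i), yᵢ is the class of each support vector (+1 or -1), αᵢ and b are numbers found during training, and a is the instance being classified; the notation here is an assumption. The sketch below evaluates that sum for two made-up support vectors, with α and b worked out by hand for this tiny example rather than learned.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def decision_value(a, support_vectors, labels, alphas, b):
    """x = b + sum_i alpha_i * y_i * (a(i) . a), summed over support vectors only."""
    return b + sum(
        alpha * y * dot(sv, a)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )


# Hypothetical two-point example: one support vector per class.
support_vectors = [(1.0, 1.0), (3.0, 3.0)]
labels = [-1, +1]
alphas = [0.25, 0.25]   # hand-derived for this example, not learned
b = -2.0                # hand-derived for this example, not learned

on_negative_margin = decision_value((1.0, 1.0), support_vectors, labels, alphas, b)
on_positive_margin = decision_value((3.0, 3.0), support_vectors, labels, alphas, b)
on_boundary = decision_value((2.0, 2.0), support_vectors, labels, alphas, b)
print(on_negative_margin, on_positive_margin, on_boundary)
```

The sign of the result gives the predicted class. No other training instance appears anywhere in the calculation.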
Skip to 4 minutes and 9 seconds It’s a very easy sum, and, like I say, it only depends on the support vectors. None of the other points play any part in this calculation. Now in real life, you might not be able to drive a straight line between the classes. Classes are called “linearly separable” if there exists a straight line that separates the two classes. In this picture, the two classes are not linearly separable. It might be a little hard to see, but there are some blue points on the green side of the line, and a couple of green points on the blue side of the line. It’s not possible to get a single straight line that divides these points.
Skip to 4 minutes and 48 seconds That makes support vector machines – the mathematics – a little more complicated. But it’s still possible to define the maximum margin hyperplane under these conditions.
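To make the idea of a separating line concrete, here is a small check of whether one particular line puts every point on the correct side. The points, labels and line are invented for illustration. Note that a failure for one line does not by itself prove that the classes are not linearly separable, since some other line might still work.

```python
def separates(w, b, points, labels):
    """True if every point lies strictly on the side of the line
    w[0]*x + w[1]*y + b = 0 indicated by its label (+1 or -1)."""
    return all(
        y * (w[0] * p[0] + w[1] * p[1] + b) > 0
        for p, y in zip(points, labels)
    )


# Candidate line x + y - 4 = 0 (hypothetical).
w, b = (1.0, 1.0), -4.0

points = [(1.0, 1.0), (1.5, 2.0), (3.0, 3.0), (2.5, 3.0)]
labels = [-1, -1, +1, +1]
clean = separates(w, b, points, labels)

# Add one stray point of class -1 that sits on the +1 side of the line.
noisy = separates(w, b, points + [(3.0, 2.5)], labels + [-1])
print(clean, noisy)
```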
Skip to 4 minutes and 59 seconds That’s it: support vector machines. It’s a linear decision boundary. Actually, there’s a really clever technique which allows you to get more complex boundaries. It’s called the “kernel trick”. By using different formulas for the “kernel” – and in Weka you just select from some possible different kernels – you can get different shapes of boundaries, not just straight lines. Support vector machines are fantastic because they’re very resilient to overfitting. The boundary just depends on a very small number of points in the dataset. So it’s not going to overfit the dataset, because it ignores almost all of the points in the dataset and depends on just a few of these critical points – the support vectors.
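One way to see why the kernel trick works: a kernel is a formula that gives the same number as a dot product in a larger, transformed space, without ever building that space. The sketch below uses a degree-2 polynomial kernel on 2D points, chosen purely for illustration; the mapping phi and the sample points are assumptions, not something taken from Weka.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel: (x . z) squared."""
    return dot(x, z) ** 2


def phi(x):
    """Explicit feature map for 2D input whose dot product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x = (1.0, 2.0)
z = (3.0, 4.0)

via_kernel = poly2_kernel(x, z)         # computed in the original 2D space
via_mapping = dot(phi(x), phi(z))       # computed in the 3D transformed space
print(via_kernel, via_mapping)
```

Because the hyperplane formula only ever uses dot products with the support vectors, swapping the dot product for a kernel gives a boundary that is linear in the transformed space but curved in the original one.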
Skip to 5 minutes and 48 seconds So it’s very resilient to overfitting, even with large numbers of attributes. In Weka, there are a couple of implementations of support vector machines. We could look in the “functions” category for “SMO”. Let me have a look at that over here. If I look in “functions” for “SMO”, that implements an algorithm called “Sequential Minimal Optimization” for training a support vector classifier. There are a few parameters here, including, for example, the different choice of kernels.
Skip to 6 minutes and 27 seconds You can choose different kernels: you can play around and try out different things. There are a few other parameters. Actually, the basic SMO algorithm solves two-class problems; Weka’s SMO copes with datasets that have more than two classes by building a classifier for each pair of classes (pairwise classification). There are other, more comprehensive, implementations of support vector machines in Weka. There’s a library called “LibSVM”, an external library, and Weka has an interface to this library. This is a wrapper class for the LibSVM tools. You need to download these separately from Weka and put them in the right Java classpath. You can see that there are a lot of different parameters here, and, in fact, a lot of information on this support vector machine package.
Support vector machines
In essence, support vector machines drive a straight line between two classes, right down the middle of the channel – which you can see using Weka’s boundary visualizer. If the classes cannot be separated by a straight line, a device called the “kernel trick” enables support vector machines to make boundaries of different shapes, not just straight lines. Support vector machines are very resilient to overfitting, because the boundary depends on just a few well-chosen data points, not the entire training set. They are implemented by Weka’s SMO classifier.
© University of Waikato, New Zealand. CC Creative Commons Attribution 4.0 International License.