This content is taken from The Open University & Persontyle's online course, Advanced Machine Learning.

Skip to 0 minutes and 3 seconds So this time we’re going to look at the support vector machine algorithm. It’s going to be a classification problem. Once again, remember a support vector machine is a kernel method algorithm. So we’ll be dealing with kernels again. We’re going to be using the implementation from the e1071 package, which is a funny name for a package. I think it was named after the room in which it was developed in a German university. I believe it is a front-end for the very powerful and commonly used LIBSVM library. So let’s get going and look at the code. We’re in SVM Ex1.R. As before, we prepare the dataset from the library.

Skip to 0 minutes and 53 seconds Now we’re going to split the data randomly into training and test data sets. We’re working, by the way, with the iris data here. So remember, we’ve got sepal length, sepal width, petal length, petal width, and the species of iris flower. Well, there’s three of those. And the task is to predict the species based on the other features. Now we’re going to be working with SVMs. The e1071 library allows four types of SVMs, that’s to say, four different types of kernels. Now it also comes with a tune.svm function, which allows you to specify a whole bunch of hyperparameters for the kernels, as well as the cost.
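The random two-way split described here might look like the following in R. This is only a sketch: the 70/30 ratio, the seed, and the variable names are assumptions for illustration, not values taken from the course file.

```r
# Sketch of a random train/test split of the iris data.
# ASSUMPTIONS: 70/30 ratio, seed value, and variable names are illustrative.
library(datasets)

set.seed(1)                                  # make the random split repeatable
n <- nrow(iris)                              # 150 flowers
train_idx <- sample(n, size = round(0.7 * n))

train <- iris[train_idx, ]                   # rows used for tuning and fitting
test  <- iris[-train_idx, ]                  # held-out rows for the final check

str(train)                                   # four numeric features plus Species
```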

Skip to 1 minute and 37 seconds And it will go through trying to find the best model generated from a combination of these hyperparameters. So we see here, for the linear kernel, there’s only the cost hyperparameter. For the radial kernel, we have cost for the SVM and gamma for the kernel. Sigmoid has cost, gamma, and coefficient 0, the last two being kernel parameters. And polynomial has four: cost, gamma, coefficient 0, and degree, again the last three being kernel parameters. And when we use the tune.svm function, it will examine every combination of hyperparameters for the specified type of kernel, and return the best model. It’s going to be doing cross-validation to work out the best combination of hyperparameters.
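As a rough illustration of that search for a single kernel, the sketch below calls tune.svm with a radial kernel. The grid values, the seed, and the split are assumptions; the cross-validation is whatever tune.svm does by default (10-fold, per the e1071 documentation).

```r
# Sketch: grid search over cost and gamma for a radial-kernel SVM.
# ASSUMPTIONS: the grid values, seed, and 70/30 split are illustrative only.
library(e1071)

set.seed(1)
train_idx <- sample(nrow(iris), 105)
train <- iris[train_idx, ]

tuned_radial <- tune.svm(Species ~ ., data = train,
                         kernel = "radial",
                         cost   = exp(0:3),    # SVM hyperparameter
                         gamma  = exp(-3:0))   # kernel hyperparameter

tuned_radial$best.parameters    # best (gamma, cost) pair found
tuned_radial$best.performance   # its cross-validated classification error
best_radial <- tuned_radial$best.model
```

Every one of the 4 x 4 combinations is cross-validated on the training rows, which is why no separate validation set is carved out.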

Skip to 2 minutes and 41 seconds Which is why, because it’s got this built-in cross-validation, we only need to make a two-way split. Now what’s going on here is we’re just setting up the lists of hyperparameters we want to examine. You can read through the notes in this code for a better understanding of how this is all working. Now we’re going to be building a whole bunch of these SVMs with different types of kernels: radial, linear, sigmoid, and polynomial. We’re looking for the best model of each, based on all the different combinations of hyperparameters. So let’s run these lines. It’s going to take a while, because we’re building a lot of models here.
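One way the hyperparameter lists and the four tuning runs could be arranged is sketched below. It uses e1071's general tune function with a ranges list rather than tune.svm, purely so the four kernels can share one loop; the grids, the seed, and all names are assumptions and are kept small so the run stays quick.

```r
# Sketch: one hyperparameter list per kernel, then one tuning run per kernel.
# ASSUMPTIONS: grid values, seed, split, and object names are illustrative.
library(e1071)

set.seed(1)
train_idx <- sample(nrow(iris), 105)
train <- iris[train_idx, ]

grids <- list(
  linear     = list(cost = exp(0:2)),
  radial     = list(cost = exp(0:2), gamma = exp(-3:-1)),
  sigmoid    = list(cost = exp(0:2), gamma = exp(-3:-2), coef0 = c(0, 1)),
  polynomial = list(cost = exp(0:1), gamma = exp(-3:-2), coef0 = c(0, 1),
                    degree = 2:3)
)

# tune() cross-validates every combination in `ranges` for the given kernel.
tuned <- lapply(names(grids), function(k) {
  tune(svm, Species ~ ., data = train, kernel = k, ranges = grids[[k]])
})
names(tuned) <- names(grids)

# Cross-validated error of the best model found for each kernel.
cv_error <- sapply(tuned, function(t) t$best.performance)
cv_error
names(which.min(cv_error))   # kernel with the lowest cross-validated error
```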

Skip to 3 minutes and 38 seconds Now, of course, we evaluate how these performed on the test data to see which model is best.

Skip to 3 minutes and 50 seconds Let’s just have a look and see which one of these models did end up performing best on the test data.

Skip to 3 minutes and 57 seconds And we find it was a radial kernel with a cost of 7.38. If you look at the various hyperparameter values that we put in, you’ll see that they are all e raised to different powers. That’s why they’re not nice round numbers here. They’re e to the 1, e to the 2, e to the 3, things like that. OK. So we succeeded in finding our best model from that entire bunch.
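The reason the reported cost is not a round number can be checked directly: a grid built from powers of e contains e squared, which is roughly 7.389. The exponent range 0:3 below is an assumption chosen for illustration.

```r
# Sketch: a cost grid made of powers of e.
# ASSUMPTION: the exponent range 0:3 is illustrative.
cost_grid <- exp(0:3)
round(cost_grid, 3)   # 1.000 2.718 7.389 20.086

log(cost_grid)        # taking logs recovers the round exponents 0, 1, 2, 3
```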

Skip to 4 minutes and 24 seconds Oh, sorry. And the scores we were looking at there, it was not from the test data. Sorry. It was from the performance on the cross-validation. Now, of course, we’ve selected our best model. And now we see how that particular model performs on the test data.
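Scoring a chosen model on the held-out data could be sketched as below. The model here is fitted directly with a radial kernel and a cost of e squared to stand in for the selected model; the split, the seed, and the default gamma are assumptions.

```r
# Sketch: mean classification error of one SVM on hold-out test data.
# ASSUMPTIONS: split, seed, and the use of svm()'s default gamma are
# illustrative; this stands in for the model chosen by cross-validation.
library(e1071)

set.seed(1)
train_idx <- sample(nrow(iris), 105)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

model <- svm(Species ~ ., data = train, kernel = "radial", cost = exp(2))

pred <- predict(model, newdata = test)
test_error <- mean(pred != test$Species)   # fraction of test flowers misclassified
test_error

table(predicted = pred, actual = test$Species)   # confusion matrix
```

The cross-validated score is used to pick the model; the test error is computed once, afterwards, on rows the tuning never saw.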

Skip to 4 minutes and 43 seconds We can examine the mean classification error here. What we can do next: the e1071 package comes with a plot function for support vector machines. It’s quite difficult to interpret, but we’ll have a look at it. Here we go. So we’re getting a slice.

Skip to 5 minutes and 6 seconds In this case, we’re getting a slice where sepal length is 3, sepal width is 2. And then we’re getting petal width versus petal length in the graph. We’re getting the class divisions. Pink being virginica. White being versicolor. And blue being setosa. If we were to change those sepal length and sepal width values in the plot function, we’d get a different slice. We’re just getting a slice of a four-dimensional object being displayed in two dimensions, here. OK. So there we have SVMs. For those of you interested, do your best to try to replicate this code.
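A plot of this kind can be produced with the plot method that e1071 provides for svm objects, as sketched below. The slice values follow the ones mentioned in the video; fitting on the full iris data with default hyperparameters is an assumption made to keep the example short.

```r
# Sketch: a 2-D slice through the decision regions of a 4-feature SVM.
# ASSUMPTION: the model is fitted on all of iris with default settings.
library(e1071)

model <- svm(Species ~ ., data = iris, kernel = "radial")

# Petal.Width against Petal.Length, with the two sepal features held fixed.
plot(model, iris, Petal.Width ~ Petal.Length,
     slice = list(Sepal.Length = 3, Sepal.Width = 2))
```

Changing the values in the slice list moves the cutting plane through the four-dimensional feature space, so the coloured regions change accordingly.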

SVM Exercise 1

The video exercise for support vector machines. The associated code is in the SVM Ex1.R file. Interested students are encouraged to replicate what we go through in the video themselves in R, but note that this is an optional activity intended for those who want practical experience in R and machine learning.

We will create an SVM classifier from the Iris data. We will use the e1071 package, and we will look at the different kernels made available by this package: linear, radial, sigmoid and polynomial. We will use a tuning function from the e1071 package, tune.svm, to find good hyper-parameter values for each type of kernel via cross-validation. Accordingly, we will look at how to set up the hyper-parameter sets to be searched for each type of kernel. We evaluate the four produced models on hold-out test data. Finally, we look at the (not very attractive) plotting functionality for SVM models that the e1071 package provides and explain how to interpret it.

Note that the datasets, utils and e1071 R packages are used in this exercise. You will need to have them installed on your system. You can install packages using the install.packages function in R.
