
This content is taken from The Open University & Persontyle's online course, Advanced Machine Learning.

## The Open University

Skip to 0 minutes and 1 second So now we're going to look at the first of two example exercises for principal components analysis. This is going to be a simple example showing that principal components analysis (PCA) really can work. So let's turn to the code. We will be working with the Duncan data, which is the careers data set. We can just have a look at the data set.

Skip to 0 minutes and 33 seconds What we have is income, education, and prestige for a variety of careers. The data are from the US in the 1950s, and all the values lie between zero and 100. Income is a zero-to-100 indication of how well paid the career is, education indicates the level, or number of years, of education required to work in that career, and prestige is the esteem in which the community holds the career. We're going to try to estimate prestige from the other features, and the real question is how we should use those features to do so. To answer it, we're going to produce five models.

Skip to 1 minute and 22 seconds They're all going to be ordinary least squares (OLS) models, because they're the simplest to fit. The first model estimates prestige based only on income. The second estimates prestige based only on education. The third estimates prestige based on both income and education. The fourth and fifth are a little more interesting. We're going to perform principal components analysis on income and education; model four will estimate prestige based only on the first principal component, and model five will estimate prestige based on both principal components.
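A minimal sketch of the PCA step behind models four and five, in Python with NumPy rather than the course's R. It assumes PCA on centred but unscaled features (the course code may instead use R's prcomp, possibly with scaling), uses the hypothetical helper names fit_pca and project, and generates synthetic correlated stand-ins for income and education rather than loading the Duncan data.

```python
import numpy as np

def fit_pca(X):
    """Fit PCA to X (rows = careers, columns = features).

    Returns the column means and a rotation matrix whose columns are the
    principal directions, ordered by decreasing variance captured.
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)     # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    return mean, eigvecs[:, order]

def project(X, mean, rotation):
    """Centre X with the fitted means and rotate it onto the principal axes."""
    return (X - mean) @ rotation

# Synthetic, correlated stand-ins for income and education (NOT the Duncan data).
rng = np.random.default_rng(0)
income = rng.uniform(0, 100, size=45)
education = 0.8 * income + rng.normal(0, 10, size=45)
X = np.column_stack([income, education])

mean, rotation = fit_pca(X)
scores = project(X, mean, rotation)
pc1 = scores[:, [0]]   # model 4 feature: first principal component only
both_pcs = scores      # model 5 features: both principal components
```

The first score column is the single feature for model four; both columns together are the features for model five. The signs of the eigenvectors are arbitrary, which does not matter for OLS, because flipping a feature's sign just flips its coefficient.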

Skip to 2 minutes and 5 seconds To evaluate how well these models perform, we'll use all-but-one (leave-one-out) cross-validation.
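As a formula, with notation introduced here rather than taken from the course: for $n$ rows, let $\hat{f}^{(-i)}$ be the model fitted on every row except row $i$. The all-but-one cross-validation error is then the mean squared error over the held-out rows:

```latex
\mathrm{CV}_{(n)} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{f}^{(-i)}(x_i) \right)^2
```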

Skip to 2 minutes and 17 seconds There are a lot of tips and discussion in the comments in the code. When we do all-but-one cross-validation with principal components, we have to recompute the principal components in every cross-validation iteration, because they should be computed only from the training data, not from the single held-out test row.
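The per-fold recomputation described above can be sketched as follows, in Python with NumPy rather than the course's R, so it is not the code in PCA Ex1.R. The helper names ols_fit_predict, pca_rotation and loocv_mse are hypothetical, PCA is assumed to run on centred but unscaled features, and the data are synthetic stand-ins rather than the Duncan data, so the ranking of the models here says nothing about the course's result.

```python
import numpy as np

def ols_fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept; returns predictions for X_test."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def pca_rotation(X):
    """Column means and principal directions (descending variance) of X."""
    mean = X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X - mean, rowvar=False))
    return mean, eigvecs[:, np.argsort(eigvals)[::-1]]

def loocv_mse(X, y):
    """Leave-one-out MSE for the five models; X columns are (income, education)."""
    n = len(y)
    sq_err = np.zeros((n, 5))
    for i in range(n):
        train = np.arange(n) != i          # boolean mask: every row except i
        X_tr, y_tr, X_te = X[train], y[train], X[~train]
        # PCA is refitted on the training rows only; the held-out row is then
        # projected with the training mean and rotation (no test-row leakage).
        mean, rot = pca_rotation(X_tr)
        Z_tr, Z_te = (X_tr - mean) @ rot, (X_te - mean) @ rot
        feature_sets = [
            (X_tr[:, [0]], X_te[:, [0]]),  # model 1: income only
            (X_tr[:, [1]], X_te[:, [1]]),  # model 2: education only
            (X_tr, X_te),                  # model 3: income and education
            (Z_tr[:, [0]], Z_te[:, [0]]),  # model 4: first PC only
            (Z_tr, Z_te),                  # model 5: both PCs
        ]
        for m, (F_tr, F_te) in enumerate(feature_sets):
            pred = ols_fit_predict(F_tr, y_tr, F_te)
            sq_err[i, m] = (y[i] - pred[0]) ** 2
    return sq_err.mean(axis=0)

# Synthetic data for illustration only (NOT the Duncan data).
rng = np.random.default_rng(1)
income = rng.uniform(0, 100, size=45)
education = income + rng.normal(0, 15, size=45)
prestige = 0.5 * income + 0.5 * education + rng.normal(0, 10, size=45)
X = np.column_stack([income, education])

mse = loocv_mse(X, prestige)   # one LOOCV mean squared error per model
```

The key step is projecting the held-out row with the training mean and rotation; running PCA on all rows first and then splitting would let the test row influence its own features. Because models 3 and 5 see the same information up to a rotation, their cross-validation errors coincide up to floating-point error.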

Skip to 2 minutes and 46 seconds This code does exactly what I just described: it produces the five models and gives us their all-but-one cross-validation results on this data set.

Skip to 3 minutes and 6 seconds So we now have the all-but-one cross-validation mean squared error for each model. Let's look at the performance of the five models. Lower is better, and interestingly, the best model is the one whose only feature is principal component one. It performs even better than the model that uses both income and education.

Skip to 3 minutes and 36 seconds Why does this work? It's especially interesting because income and education, each on its own, are not very good predictors of prestige, yet the first principal component alone is a good predictor; in fact, it's better than using income and education together. How can that be? Remember that by using a single feature, here principal component one, we reduce the complexity of the model compared with using two features, income and education. That reduces the variance component of the error in the bias-variance decomposition.

Skip to 4 minutes and 19 seconds By using a single feature we reduce the model's complexity, and with it the variance component of the expected error, and here that pays off. Obviously, one principal component cannot contain as much information as all the features. Nonetheless, because we reduce the overfitting, we end up with a better model. Principal component one contains almost all the information in income and education together, but in a simpler model, and possibly with less of the noise. The result is a better estimator of prestige overall. The second interesting thing is that using income and education together performed exactly as well as using both principal components, and in fact, for a linear model like this one, that's always the case.
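For reference, the decomposition the speaker is drawing on, stated for squared error at a fixed input $x$ under the usual assumptions (not spelled out in the video): $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, noise independent of the training set, $\hat{f}$ the fitted model, and the expectation taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\big[\hat{f}(x)\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Going from two features to one generally shrinks the variance term and may raise the bias term; the result in the video suggests that, on this data, the first principal component keeps the bias increase small enough for the trade to pay off.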

Skip to 5 minutes and 11 seconds If you use all the principal components in a linear model like OLS, you gain absolutely nothing over using all the original features. The principal components are just a rotation of the original features. PCA does more than that, of course: it orders the new basis vectors, the principal components, by the amount of variance they capture. But if you use all of them, you end up building your model on a rotation of the original features, and that won't help at all. So there we go: a very simple example showing that principal components can improve model performance.
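A short derivation of why using all the principal components cannot change an OLS fit. The notation is ours: $\tilde{X}$ is the $n \times p$ matrix of centred features (assumed to have full column rank), $W$ is the $p \times p$ orthogonal matrix of principal directions, and $Z = \tilde{X} W$ holds the principal component scores. The OLS hat matrix, which maps $y$ to the fitted values, is the same for both:

```latex
\begin{aligned}
Z &= \tilde{X} W, \qquad W^\top W = I \\
Z (Z^\top Z)^{-1} Z^\top
  &= \tilde{X} W \,(W^\top \tilde{X}^\top \tilde{X} W)^{-1}\, W^\top \tilde{X}^\top \\
  &= \tilde{X} W \, W^{-1} (\tilde{X}^\top \tilde{X})^{-1} (W^\top)^{-1} \, W^\top \tilde{X}^\top \\
  &= \tilde{X} (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top
\end{aligned}
```

The coefficients are related by $\hat{\beta}_Z = W^\top \hat{\beta}_{\tilde{X}}$, so a new centred row $\tilde{x}$ gets the same prediction either way: $(\tilde{x}^\top W)\,\hat{\beta}_Z = \tilde{x}^\top W W^\top \hat{\beta}_{\tilde{X}} = \tilde{x}^\top \hat{\beta}_{\tilde{X}}$. Both $\tilde{X}$ and $Z$ have centred columns, so adding an intercept changes both fits in the same way.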

# PCA Exercise 1

The first code exercise for principal components analysis. The associated code is in the PCA Ex1.R file. Interested students are encouraged to replicate what we go through in the video themselves in R, but note that this is an optional activity intended for those who want practical experience in R and machine learning.

A simple example showing that PCA can help the modeling of certain problems. We look at the US Careers dataset and seek to estimate the prestige of a career based on its income and/or education. We create five OLS models: the first estimates prestige based only on income, the second based only on education, the third based on both, the fourth based on the first principal component of income and education, and the fifth based on both principal components of income and education.

These models are evaluated using cross-validation, and we discuss how cross-validation should be performed when working with PCA. We find that the model that estimates prestige based only on the first principal component of income and education performs best. We discuss how this interesting result came about in terms of information, model complexity, and the bias-variance trade-off.

Note that the car and utils R packages are used in this exercise, so you will need them on your system. The utils package ships with base R; other packages, such as car, can be installed with the install.packages function in R, for example install.packages("car").