3.29

# Summary for Week Three

We’ve reached the end of week three! This week we looked at kernel methods for supervised learning and at a variety of unsupervised learning techniques: a number of clustering methods, as well as LDA. The main things you should make sure you have learnt this week are:

#### 1. Kernel Regression, the Kernel Trick and the Representer Theorem

You should understand what kernels are, what the kernel trick is and, at a basic level, how it works and what sorts of models it can be used with. You should also understand how it can be applied to dual-form linear regression (since this is a powerful and paradigmatic case) and the mathematical guarantee that comes with it in the form of the representer theorem. You most certainly do not need to understand the mathematics that leads to the kernel trick or the representer theorem! You should also understand why kernel methods do not generally scale well to big data.
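As a reminder of the dual-form idea, here is a minimal sketch of kernel ridge regression in NumPy. The RBF kernel, the `length_scale` and the penalty `lam` are illustrative assumptions rather than the choices used in the course materials. The fitted function is a weighted sum of kernels centred on the training points, which is the form the representer theorem guarantees.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    # Squared Euclidean distance between every row of A and every row of B.
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-0.5 * sq_dists / length_scale**2)

rng = np.random.default_rng(0)
X_train = rng.uniform(-3.0, 3.0, size=(40, 1))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.standard_normal(40)

lam = 0.1  # ridge penalty (illustrative value)

# Dual form: one coefficient per training point, found from the n x n Gram matrix.
K = rbf_kernel(X_train, X_train)
alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)

# Prediction: f(x) = sum_i alpha_i * k(x_i, x).
X_new = np.linspace(-2.5, 2.5, 6)[:, None]
y_pred = rbf_kernel(X_new, X_train) @ alpha
```

Note that `K` has one row and one column per training point, so storing it takes memory quadratic in the number of observations and solving the linear system with a direct method takes roughly cubic time. This is one common reason kernel methods struggle with big data.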

#### 2. Support Vector Machines

You should know what support vector machines do in terms of concepts like an optimal separating hyperplane, and know how to fit them to data. You should know what sorts of hyper-parameters are involved in training an SVM, and be aware of the peculiar case of using a ‘linear kernel’.
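The following sketch, which assumes scikit-learn is available, fits two SVMs to a toy data set of concentric circles. The data set and the hyper-parameter values (`C`, `gamma`) are illustrative assumptions, not recommendations. It contrasts a linear kernel, where the separating hyperplane lives in the original feature space, with an RBF kernel.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two classes arranged as concentric circles: not linearly separable.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.3, random_state=0)

# C controls the penalty on margin violations; gamma sets the RBF kernel width.
linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# Training accuracy of each model.
linear_acc = linear_svm.score(X, y)
rbf_acc = rbf_svm.score(X, y)

# Only the support vectors determine the fitted decision boundary.
n_support = rbf_svm.support_vectors_.shape[0]
```

On data like this the RBF kernel should separate the classes far better than the linear kernel, since no straight line can split an inner circle from an outer one.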

#### 3. Gaussian Processes

You should know what Gaussian processes do, and how they do it. You should be able to fit them to data, which includes knowing the decisions that need to be made and the hyper-parameters that need to be specified (and what they control). You should be able to explain the outputs that a Gaussian process provides when applied to new data.
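To make the outputs concrete, here is a minimal NumPy sketch of Gaussian process regression with a zero mean function and an RBF covariance function. The kernel choice and the values of `length_scale`, `signal_var` and `noise_var` are assumptions made for illustration. For each new input the model returns a posterior mean and a posterior standard deviation.

```python
import numpy as np

def rbf(a, b, length_scale, signal_var):
    # Covariance between every pair of 1-D inputs in a and b.
    sq_dists = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dists / length_scale**2)

x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.sin(x_train)
x_new = np.array([0.5, 6.0])  # one point near the data, one far away

# Hyper-parameters (illustrative values).
length_scale, signal_var, noise_var = 1.0, 1.0, 1e-4

K = rbf(x_train, x_train, length_scale, signal_var) + noise_var * np.eye(len(x_train))
K_s = rbf(x_new, x_train, length_scale, signal_var)
K_ss = rbf(x_new, x_new, length_scale, signal_var)

# A Cholesky factorisation gives numerically stable solves against K.
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
post_mean = K_s @ alpha

v = np.linalg.solve(L, K_s.T)
post_cov = K_ss - v.T @ v
post_std = np.sqrt(np.clip(np.diag(post_cov), 0.0, None))
```

Near the training data the posterior standard deviation is small. Far from it, the prediction reverts towards the prior mean of zero and the standard deviation grows back towards the prior value `sqrt(signal_var)`.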

#### 4. Basic Clustering

You should understand the differences between the various basic types of clustering algorithms covered, such as sequential clustering, hierarchical clustering, graph clustering and optimization-based clustering. You should also understand related concepts such as proximity measures, proximity graphs and dendrograms.
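As one concrete instance, here is a small sketch of agglomerative hierarchical clustering with single linkage, written directly in NumPy. Euclidean distance as the proximity measure, the toy data and the function name `single_linkage` are all assumptions for illustration. The recorded merge heights are the values that would be drawn on the vertical axis of a dendrogram.

```python
import numpy as np

def single_linkage(X, n_clusters):
    # Proximity matrix: pairwise Euclidean distances between observations.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = {i: [i] for i in range(len(X))}  # start with singletons
    merge_heights = []
    while len(clusters) > n_clusters:
        best = None
        keys = list(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                # Single linkage: distance between the closest pair of members.
                d = D[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)  # merge b into a
        merge_heights.append(d)
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters.values()):
        labels[members] = label
    return labels, merge_heights

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(10, 2)),
    rng.normal(5.0, 0.3, size=(10, 2)),
])
labels, merge_heights = single_linkage(X, n_clusters=2)
```

This naive version recomputes every cluster-to-cluster distance at each step, so it is only suitable for small data sets. Library implementations use much more efficient update rules.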

#### 5. Gaussian Mixture Models and the Expectation Maximization Algorithm

You should understand what GMMs are, what they do, how to apply them to data, and what they produce. You should also know their drawbacks. Further, you should understand the idea behind the EM algorithm, since we will see it again!
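The alternation at the heart of EM can be sketched for a one-dimensional, two-component GMM as follows. The simulated data, the initial values and the fixed number of iterations are assumptions chosen to keep the example short. A practical implementation would monitor the log-likelihood for convergence and try several starting points.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 150), rng.normal(3.0, 1.0, 150)])

# Initial guesses for the mixing weights, means and variances.
weights = np.array([0.5, 0.5])
means = np.array([x.min(), x.max()])
variances = np.array([x.var(), x.var()])

for _ in range(100):
    # E-step: responsibility of each component for each observation.
    dens = (
        weights
        * np.exp(-0.5 * (x[:, None] - means) ** 2 / variances)
        / np.sqrt(2.0 * np.pi * variances)
    )
    resp = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate the parameters from responsibility-weighted data.
    n_k = resp.sum(axis=0)
    weights = n_k / len(x)
    means = (resp * x[:, None]).sum(axis=0) / n_k
    variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / n_k
```

The rows of `resp` are soft cluster assignments, which is one of the things a GMM produces that a hard clustering method does not. The number of components is fixed in advance here, and the result can depend on the starting values.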

#### 6. Dirichlet Process Mixture Models

You should understand what Dirichlet distributions and Dirichlet processes are, and how they can be used in combination with Gaussian distributions to produce a Dirichlet Process Mixture Model. You should understand the relationship between such a model and GMMs.
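One way to build intuition for the mixing weights in such a model is the stick-breaking construction, sketched below in NumPy. This is offered as an illustration under stated assumptions: the course may have presented Dirichlet processes differently, and the helper `stick_breaking_weights`, the truncation level and the concentration values are hypothetical choices.

```python
import numpy as np

def stick_breaking_weights(concentration, truncation, rng):
    # Break off a Beta(1, concentration) fraction of whatever stick remains.
    betas = rng.beta(1.0, concentration, size=truncation)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    return betas * remaining

rng = np.random.default_rng(0)

# A small concentration puts most of the mass on a few components;
# a large concentration spreads it over many.
weights_small = stick_breaking_weights(0.5, 50, rng)
weights_large = stick_breaking_weights(10.0, 50, rng)

n_active_small = int((weights_small > 0.01).sum())
n_active_large = int((weights_large > 0.01).sum())
```

Attaching a Gaussian component to each weight gives a mixture that looks like a GMM, except that the number of components is not fixed in advance. In principle it is unbounded, and the data determine how many components carry appreciable weight.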

#### 7. LDA and Metropolis within Gibbs MCMC

You should understand what LDA does, in the sense of how it models documents, topics and words, and how it uses Dirichlet distributions as priors for the document and topic models. You should also know what information can be obtained from a fitted LDA model. Finally, you should understand the Metropolis within Gibbs MCMC algorithm, which we will also see again!
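The sampling scheme can be illustrated on a toy problem that is much simpler than LDA. The sketch below applies Metropolis within Gibbs to a correlated bivariate normal target. The target, the random-walk proposal, the step size and the burn-in length are all assumptions for illustration. Each sweep visits the coordinates in turn, as Gibbs sampling does, but updates each one with a Metropolis accept/reject step instead of an exact draw from its full conditional.

```python
import numpy as np

def log_target(theta, rho=0.8):
    # Unnormalised log-density of a standard bivariate normal with correlation rho.
    x, y = theta
    return -0.5 * (x**2 - 2.0 * rho * x * y + y**2) / (1.0 - rho**2)

def metropolis_within_gibbs(n_samples, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    samples = np.empty((n_samples, 2))
    for t in range(n_samples):
        for j in range(2):  # visit one coordinate at a time
            proposal = theta.copy()
            proposal[j] += step * rng.standard_normal()  # symmetric random walk
            log_ratio = log_target(proposal) - log_target(theta)
            if np.log(rng.uniform()) < log_ratio:  # Metropolis acceptance test
                theta = proposal
        samples[t] = theta
    return samples

samples = metropolis_within_gibbs(20000)
kept = samples[2000:]  # discard an initial burn-in period
sample_corr = np.corrcoef(kept.T)[0, 1]
```

After burn-in, the retained draws should have means close to zero and a sample correlation close to the target value of 0.8.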

Only one more week to go now. We hope you are still enjoying the course, and that you are learning lots of interesting and useful techniques. Take a little rest and we’ll see you again in the final week!