OK. So now we’re going to look at the GMM, the Gaussian mixture model, example exercise. You’ll find that in GMM Ex1. Now there’s not a lot to talk about before we get into things, so let’s jump straight to the code. We’ll prepare the data and the packages we’ll use. This time we’re working with the iris data again, which you’re probably familiar with: the three different species of flowers in here, and the petal and sepal characteristics as features. Now we’re going to be doing a Gaussian mixture model, trying to cluster on the features.
Since we know the three species, and we can expect that they’re the most salient clusters there, we’ll first of all split the data into features and target, or features and species, collections. So I make a data frame with four columns for the features, containing the sepal and petal characteristics, and a data frame with a single column with the species. And now we’re going to use a Gaussian mixture model to cluster on the features. We’ll look for three clusters. And we’ll see if it manages to recreate, rediscover, the species clusters, since we expect they’re the most salient. Here we’re using the mclust library and its Mclust function. So there we go. We’ve got our model already. We can plot our model.
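The split and fit described here might look roughly like the following in R. This is a minimal sketch, not the contents of the exercise file: the variable names (features, species, model) are assumptions, and it presumes the mclust package is already installed.

```r
# Sketch only: variable names are illustrative, not taken from GMM Ex1.R.
library(mclust)

data(iris)

# Four feature columns: sepal and petal length and width.
features <- iris[, 1:4]

# Single-column data frame holding the species labels.
species <- iris[, 5, drop = FALSE]

# Fit a Gaussian mixture model, asking for exactly three components.
model <- Mclust(features, G = 3)

# Print a short summary of the fitted model.
summary(model)
```

Setting G = 3 fixes the number of clusters. If G is left out, Mclust compares several component counts and picks one by BIC, which may or may not be three.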
Now what we have here is a four by four set of cluster plots with the different features plotted against each other. And you can see the clusters from each of these perspectives. You see the Gaussian distributions plotted on these plots as well as the cluster classification of the various data points.
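A plot of this kind can be produced with the plot method that mclust provides for fitted models. The sketch below assumes the model was fitted on the four iris feature columns with three components. The choice of what = "classification" is an assumption about which plot type is shown in the video.

```r
# Sketch only: assumes mclust is installed; the plot type is a guess at the one in the video.
library(mclust)

model <- Mclust(iris[, 1:4], G = 3)

# Pairwise scatter plots of the features, with points marked by assigned
# cluster and ellipses drawn for the fitted Gaussian components.
plot(model, what = "classification")
```

Calling plot(model) with no what argument in an interactive session offers a menu of the available plot types instead.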
So all very, very easy there. Let’s actually see if it did manage to capture the species clusters. We’ll just make a matrix, essentially, with the rows and columns showing the clusters identified by the Mclust algorithm, the GMM algorithm, against the species labels of the cases.
Let’s have a look at that matrix here. And you see indeed it has managed to rediscover, recapture, the species clusters very, very well. A little bit of confusion in cluster two: it’s not quite the versicolor species. It includes five of the virginica cases, but otherwise it looks very much like the clusters that the Mclust algorithm found were the species of flowers, the species of irises.
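One way to build such a matrix is a cross-tabulation of the cluster assignments against the species labels. The following is a sketch under the same assumptions as the rest of the exercise (mclust installed, three components requested). The name confusion is illustrative.

```r
# Sketch only: the table name and dimension labels are illustrative.
library(mclust)

model <- Mclust(iris[, 1:4], G = 3)

# Rows: cluster assigned by the mixture model. Columns: true species.
confusion <- table(cluster = model$classification, species = iris$Species)

print(confusion)
```

The cluster numbers themselves are arbitrary labels, so a cluster is matched to a species by looking for the column where most of its cases fall.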
Gaussian Mixture Models Exercise 1
A video exercise for Gaussian mixture models. The associated code is in the GMM Ex1.R file. Interested students are encouraged to replicate what we go through in the video themselves in R, but note that this is an optional activity intended for those who want practical experience in R and machine learning.
In this exercise we perform clustering on the Iris data with a Gaussian mixture model (GMM), using the mclust package. Assuming that the species are the most salient clusters in the data, we want to see if a GMM succeeds in learning these species clusters. We also see the plot functionality for a GMM provided with the mclust package.
Note that the mclust R package is used in this exercise. You will need to have it installed on your system. You can install packages using the install.packages function in R.
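As a small sketch of that setup step, the snippet below installs mclust only if it is not already available and then loads it. It assumes a CRAN mirror is configured for your R session.

```r
# Install mclust only when it is missing, then attach it.
if (!requireNamespace("mclust", quietly = TRUE)) {
  install.packages("mclust")
}

library(mclust)
```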
© Dr Michael Ashcroft