In this presentation, we’re going to look at what data analytics is. In particular, we’re going to look at two important techniques: machine learning algorithms, to see the kind of insights that we can generate automatically from the data, and data visualisation techniques, which are techniques for visualising both the raw data and the insights that are generated by the algorithms. Data analytics is the area of data science which is concerned with automatically discovering insights from the data. What we do is use machine learning algorithms to find patterns in the data which aren’t obvious.
A good example of this would be where you’ve got a large data set concerning subjects, each with a risk factor associated with them, and you want to find out how that risk factor is correlated with the rest of the data. Now, it might not be an obvious correlation. It might not be that you can pick out one particular element of the data to predict that risk factor. It might be a subtle combination of elements. And that’s what machine learning algorithms can do for you. In general, we present the machine learning algorithm with a collection of data and ask it to find patterns in that data.
There are three particular types of pattern that we’re interested in. The first one is called regression. This is where one variable, a numeric value, depends on lots of other variables in the data. So what we’re trying to learn is: what is this numeric value going to be for a data point that I haven’t seen before? There are many different ways in which we can do that regression. The relationship between the variable that we’re concerned with and all the other variables may be quite a complex one, or it could be a very simple one, such as a straight line. The second type of machine learning algorithm is concerned with classification.
This is where there is a distinct value in the data, what we call a label. It could be something as simple as this person is at risk, this person is not at risk. So it’s one of a fixed number of values. And what we’re trying to do is to predict that value from the other data. So what we do here is start with a collection of labelled data and learn a model, which allows us to infer that label for unseen data. And the last method is called clustering. This is where we have a large collection of data, but we don’t have any labels on the data.
What we’d like to do is try to find some labels by looking at where the data are tightly clustered. So it could be that we have a particular cluster in our data and we want to find that cluster and maybe give it a label.
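To make the regression idea concrete, here is a minimal sketch of the simplest case mentioned in the talk, a straight line. It uses ordinary least squares with a single input variable, which is only one of many possible regression methods. The function names `fit_line` and `predict` and the data values are hypothetical, invented purely for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Spread of x around its mean, and co-variation of x with y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Predict the numeric value for a data point we have not seen before."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data that happens to lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

model = fit_line(xs, ys)
prediction = predict(model, 5.0)  # x = 5.0 was not in the training data
```

With more than one input variable, or a relationship that is not a straight line, the same idea applies but the fitting step becomes more involved.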
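The classification step, learning a model from labelled data and then inferring the label for unseen data, can be sketched in the same spirit. The example below uses a nearest-centroid classifier, chosen here only because it is short; the talk does not name a specific algorithm. The function names, the two numeric features and the data values are all hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_centroids(points, labels):
    """Learn one centroid (mean point) per label from labelled data."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in groups.items()
    }


def classify(centroids, point):
    """Infer a label for an unseen point: the label of the nearest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Hypothetical labelled data: two made-up numeric measurements per person.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["not at risk", "not at risk", "not at risk",
          "at risk", "at risk", "at risk"]

centroids = fit_centroids(points, labels)
label_a = classify(centroids, (7.5, 8.0))
label_b = classify(centroids, (1.5, 1.5))
```

Here the "model" is just the pair of centroids; real classifiers can capture the subtler combinations of variables that the talk describes.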
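Clustering can be sketched too. The example below uses k-means, one common clustering method and an assumption on our part, since the talk does not say which algorithm is meant. It starts from unlabelled points and finds where they are tightly grouped. The function names, the farthest-first starting rule and the data are hypothetical choices made to keep the sketch short and deterministic.

```python
from math import dist


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def assign(points, centroids):
    """Give each point the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for p in points
    ]


def k_means(points, k, iterations=10):
    # Farthest-first start: take the first point, then repeatedly add the
    # point that is farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        cluster_ids = assign(points, centroids)
        for i in range(k):
            members = [p for p, c in zip(points, cluster_ids) if c == i]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[i] = mean_point(members)
    return centroids, assign(points, centroids)


# Hypothetical unlabelled data with two tight groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]

centroids, cluster_ids = k_means(points, k=2)
```

The cluster indices carry no meaning by themselves; giving a discovered cluster a name is the step where a person decides what label it deserves.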
Exploring and modelling data
In this presentation, Dr John Levine will provide his thoughts on exploring and modelling data in the health and care sector.
© University of Strathclyde