Exploring and modelling data

In this presentation, we’re going to look at what data analytics is. In particular, we’re going to look at two important techniques: machine learning algorithms, which show the kind of insights that we can generate automatically from the data, and data visualisation techniques, which are used to visualise both the raw data and the insights generated by the algorithms. Data analytics is the area of data science concerned with automatically discovering insights from data. We use machine learning algorithms to find patterns in the data which aren’t obvious.
A good example of this would be where you’ve got a large data set about a group of subjects, and each subject has a risk factor associated with them. You want to find out how that risk factor is correlated with the rest of the data. Now, it might not be an obvious correlation. You might not be able to pick out one particular element of the data that explains the risk factor. It might be a subtle combination of elements. And that’s what machine learning algorithms can do for you. In general, we present the machine learning algorithm with a collection of data and ask it to find patterns in that data.
There are three particular types of pattern that we’re interested in. The first one is called regression. This is where one variable, a numeric value, depends on lots of other variables in the data. So what we’re trying to learn is: what is this numeric value going to be for a data point that I haven’t seen before? There are many different ways in which we can do that regression. It may be that the relationship between the variable we’re concerned with and all the other variables is quite a complex one. Or it could be a very simple one, such as a straight line. The second type of machine learning algorithm is concerned with classification.
This is where there is a distinct value in the data, which we call a label. It could be something as simple as “this person is at risk” or “this person is not at risk”. So it’s one of a fixed number of values. And what we’re trying to do is to predict that value from the other data. So we start with a collection of labelled data and we learn a model, which allows us to infer that label for unseen data. And the last method is called clustering. This is where we have a large collection of data, but we don’t have any labels on the data.
What we’d like to do is try to find some labels by looking at where the data are tightly clustered. So it could be that we have a particular cluster in our data and we want to find that cluster and maybe give it a label.
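The three pattern types described in the talk (regression, classification and clustering) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the method used in the course: the function names, the ages, risk scores, blood pressure readings and labels are all made up, the regression is the simple straight-line case, the classifier is a basic nearest-neighbour rule, and the clustering is a two-group k-means on a single variable.

```python
# Illustrative sketch only: every number and label below is invented.

def fit_line(xs, ys):
    """Regression: least-squares straight line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def nearest_neighbour_label(labelled, point):
    """Classification: copy the label of the closest labelled example."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(labelled, key=lambda item: sq_dist(item[0], point))
    return label


def two_means(values, iterations=10):
    """Clustering: split unlabelled 1-D values into two groups (k-means, k=2)."""
    centres = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            groups[nearest].append(v)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return groups


# Regression: predict a numeric risk score for an unseen age (hypothetical data).
ages = [30, 40, 50, 60]
risk_scores = [2.0, 3.0, 4.0, 5.0]
slope, intercept = fit_line(ages, risk_scores)
print("predicted score at age 70:", slope * 70 + intercept)

# Classification: infer a label for an unseen (age, blood pressure) pair.
labelled = [
    ((25, 120), "not at risk"),
    ((35, 125), "not at risk"),
    ((60, 160), "at risk"),
    ((70, 170), "at risk"),
]
print("label for (65, 165):", nearest_neighbour_label(labelled, (65, 165)))

# Clustering: no labels at all, just look for tightly grouped values.
measurements = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
print("clusters found:", two_means(measurements))
```

In practice the features would usually be scaled before computing distances, and a library implementation would normally replace these hand-written functions. The point here is only the difference in what each method is given and what it returns.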
In this presentation, Dr John Levine will provide his thoughts on exploring and modelling data in the health and care sector.
This article is from the free online course The Power of Data in Health and Social Care.

Created by
FutureLearn - Learning For Life
