We are talking about generating insight from our data. The reason to collect data in the first place is to find something meaningful in it that helps us solve our problems. So we’ve got a five-step process to help us generate insight from our data: acquiring data, exploring and pre-processing that data, analysing the data itself, communicating results, and turning these findings into action. So in these five steps we’re going to go through, it’s all about pulling out what’s important in the data itself to create something meaningful at the end. So our first step is about acquiring data.
There will be a lot of data already in your organisation, or in the organisation that you work for, that you want to get access to, to help you solve your problems. So you’re going to need to get access to that data and understand under what rules you can use it. So that’s called licensing. So we’re going to look at understanding how you can use that data, because the licence will confine your use of it. You’ll also look to explore other data sets, external to your organisation. So they might be from national governments, or other bodies, or potentially other published pieces of data research.
So as you go about acquiring your data, remember you’re looking at getting a large body of data so that you can work with it, and understanding how you can use that data under your licences. It’s important that you understand your licences, and what they enable you to do with that data. The next step then is exploring and pre-processing your data. This is going to take a long time in any data-led project. In fact, some people say this is probably the largest step and takes the most time out of anything. You’ll want to make sure that your data is clean, so that you can use it.
And in some cases, you might need to explore every field to make sure that the data you’ve got is accurate and usable. You might have to clean some data out, so that you’re removing variables that are not accurate, or variables that do not help. And you’ll also look to do other types of pre-processing, such as getting different data sets into the same format. So say we’re working with something as simple as Date. That could be recorded in day-day, month-month, year-year, or it could be recorded month-month, day-day, year-year. And what you need to do then, is to get that into the same format. So a little bit of pre-processing.
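As a minimal sketch of that date pre-processing step, the Python below converts both layouts to a single ISO format (YYYY-MM-DD). The sample values, the slash separator, and the names to_iso, DAY_FIRST and MONTH_FIRST are assumptions made for illustration, not something from a particular data set. In practice you would confirm each source's layout from its documentation, because a value such as 03/04/21 is ambiguous on its own.

```python
from datetime import datetime

# Hypothetical source layouts. Which data set uses which one is an assumption
# you would confirm from each data set's documentation.
DAY_FIRST = "%d/%m/%y"    # day-day, month-month, year-year
MONTH_FIRST = "%m/%d/%y"  # month-month, day-day, year-year


def to_iso(date_text: str, source_format: str) -> str:
    """Parse a date string in a known source layout and return YYYY-MM-DD."""
    return datetime.strptime(date_text, source_format).date().isoformat()


# The same calendar day (3 April 2021) as two hypothetical data sets record it.
from_dataset_a = to_iso("03/04/21", DAY_FIRST)
from_dataset_b = to_iso("04/03/21", MONTH_FIRST)
```

Passing the layout explicitly, rather than guessing it from each value, is a deliberate choice here: a day-first date such as 31/12/21 fails loudly when read as month-first, but 03/04/21 would be read as the wrong day without any error.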
And you can think how that could also work for spatial reference systems as well. So we want to make sure we’re working with the data in the same format for the different areas that we’re talking about. So that’s all the boring bit out of the way. Now we’ll look to try and analyse the data. And quite often in the analytics process, you’ll be doing some exploratory work. I suppose you’ll be thinking about what’s to come, and doing some testing, looking through the data to identify patterns, or things that might be interesting to you.
You’ll often be looking back to your research question, or indeed, the question that you’ve been set as a business, or as an organisation, to try and understand what you’re trying to get from the data. So for example in health and social care, if we’re looking to try and reduce asthma attacks in infants, you’ll obviously be concentrating on asthma attacks as one of your key variables, and you’ll be looking for external variables, potentially, such as, let’s say, temperature, to help you identify what’s causing asthma attacks, and therefore, how to prevent them. So as you go through your analysing processes, you’ll be looking to tease out these different components.
Primarily, you’ll be looking at statistical analysis when it comes to health and social care. You’ll be looking to get something that is accurate, and therefore a true representation, so that it can be part of the solution, effectively. So to undertake true statistical analytics, you need to have enough data to enable you to get the right answer. In statistical analytics, you’re looking at describing the nature of the data to be analysed, which we talked about earlier in this lecture, and exploring the relation of the data to the underlying population. So that’s the statistical part. We’ll then look to create a model to summarise our understanding of the data.
And it is that piece, where you create the model of the data to try and prove something is accurate, or prove you’ve found something interesting in the data, that we really want to get correct. And then you’ll look to prove or disprove the validity of that model. And that’s the piece where we’re trying to effectively assess our original idea to decide if that’s correct or incorrect. And quite often, it might be incorrect. You might want to build something into your model. You might want to revisit the start of the model. Or you might want to add to that model.
So say you found something very interesting, but is that really the reason for what you’re seeing, or is there some other factor that you might bring into play? So when we talked about the asthma example, we talked about weather conditions. But let’s say it’s something to do with the humidity in the home that’s actually causing asthma attacks. That might be something else that we need to factor into our consideration. And once we’ve gone through this idea of solving a problem, or identifying how we can solve the problem, we then need to, really importantly, communicate results.
And this is, I think, where a lot of the time in analytics we stop, because we’ve found something interesting, and we don’t go about communicating results very well at all. So when we talk about this, we’ll be looking to try and visualise this data, or this finding, in a compelling visualisation, so that we can tell a story effectively to our stakeholders and to the general public about what we found, and what we believe the solution would be. So it’s really important here to engage with experts in communicating results, visualising the information that we have, and sharing that properly.
If we do that in a compelling way, we’ll get additional stakeholder buy-in, and we’ll get positive relationships with the general public and our users, effectively, in this area. And then we can start turning findings into action. So a great example from health and social care was looking at the types of drugs that were prescribed on the NHS in England. Through analysing open data, organisations were able to find that doctors prescribing non-proprietary drugs could reduce the bill of the NHS by about one billion pounds per year. Now that’s easy enough to prove in an analytical model, and to communicate that result. But how do you then follow that through to turn that into action?
We actually have to generate a story that helps a practitioner change their method of thinking, change their behaviour. So we need to take this information to the doctors to say, we could save the NHS one billion pounds a year, if you change your prescription habits. And that then enables the change to fully occur. And we’ve gone through the process, analysing our data, and taking it all the way to the conclusion, which is a positive benefit for everybody. And that concludes our look at turning data into insight. We’re using the raw material of data and we’re taking it through five different steps to actually create something that’s usable and positive.