Skip to 0 minutes and 0 seconds In this presentation, we’re going to look at the basics of data. We’re going to set out what it is, how to describe it, and how to use it.
Skip to 0 minutes and 12 seconds The first thing we’re going to talk about is the different types of data. A lot of people talk about data as being either structured, in a tabular format, or unstructured, in a non-tabular format. That could be considered one of the core differences between types of data. We all know data in the sense of it being in an Excel spreadsheet or some sort of table. That is data, but data is also everything coming from Twitter, information coming from different sensors, from airplanes as they land or take off, and from across the health and care sector. We’ve got data coming in as structured and unstructured all the time.
Skip to 0 minutes and 55 seconds We hear leading industry experts talk about the unstructured part of data accounting for 80% of all the data we hold as organisations. It’s everything from emails to pictures. And it’s really important to understand that we need to be able to use both types of data, structured and unstructured, to deliver great outcomes. Another basic building block of data is to consider whether it’s small data or large data, or as we call it, big data. Again, small data will often be things that you see and use on a daily basis. But big data is a new and exciting world. We often talk about big data being described in four Vs.
Skip to 1 minute and 38 seconds So we have the volume, which is to do with the size of the data. Obvious examples are in the health care sector, where you’ve got millions and millions of patient records. That is what we mean by big data. And to be honest, we can get a lot, lot bigger than that. So we’ve got volume, and we’ve got velocity: the speed at which data comes through. That’s particularly important when you start thinking about things like wearable technology, where data is getting recorded and sent in real time. And in that case, you’re getting millions and millions of rows of data from one person every day.
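To put "millions and millions of rows of data from one person every day" in context, here is a rough back-of-the-envelope calculation. The sampling rate of 50 readings per second is purely an assumption for illustration; the talk does not give a figure, and real wearables vary widely.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def rows_per_day(samples_per_second: float) -> int:
    """Rows produced in one day by a sensor writing one row per reading."""
    return int(samples_per_second * SECONDS_PER_DAY)


# Hypothetical wearable sensor taking 50 readings per second (assumed rate).
daily_rows = rows_per_day(50)
print(daily_rows)  # 4320000
```

Even at this modest assumed rate, a single sensor on a single person passes four million rows a day, which is why velocity is treated separately from volume.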
Skip to 2 minutes and 13 seconds We also have to consider the veracity of data: whether it is accurate or not, and the uncertainty of the data that we have as well. And then we’re talking about the variety of data. Both of these elements, along with the volume and the velocity, need to be understood when we’re doing big data. And big data organisations that can do this can work with all four of these items freely. Another element to consider when we talk about data is the accuracy. We might think that data should be 100% accurate, but it very rarely is. So we need to understand how accurate our data is.
Skip to 2 minutes and 53 seconds We can think about maps being precise, if you look at them from a really high level. But when you zoom in, they feel a whole lot less accurate. So we need to understand just how accurate these sources of data are. We want to keep the data as complete as possible. It should be consistent across the collection methodologies, so that everything is the same, and it should be accurate as well. And this data should all be relevant, of course. So when we talk about data accuracy, we mean all of these elements put together. Another way to check accuracy is to ask: is it trusted by the user?
Skip to 3 minutes and 35 seconds Because if it’s not trusted by the user, it doesn’t matter if it’s accurate or not. So we need to make sure that the users trust the data as well. Also, when we talk about data, we talk about data formats. Data formats can range from PDFs, to movie files, to CSVs, to pictures on the internet. All of these pieces of data come in different formats, and again, we need to be able to understand these formats and use them. Think of your own computer: when you’re using lots of different pieces of data, you might take something in from an Excel spreadsheet. You might take another piece of data in from a real-time stream of data.
Skip to 4 minutes and 14 seconds You want to be able to work with these different pieces of data, to understand them, and to use them together. What’s really important when we’re talking about data, and how you use data, is the idea of data access. So in the health and care sector, we’ve got a lot of personal information, and at no point does anyone suggest that we should be sharing this data unnecessarily, or indeed, publishing it as open data. It’s really important that this data is managed properly and extensively. So we have a data sharing spectrum. And in the data sharing spectrum, we can look at five steps.
Skip to 4 minutes and 51 seconds The first is internal access, where only a limited number of people have any access to the data at all. And this is very important in the health and care sector. But we also want to consider the benefit of sharing that data a little bit more widely. We share it more widely with named people or, indeed, groups of people to enable us to do better analytics on it, to do visualisations, and effectively, to understand the data in context, which can lead to better outcomes.
Skip to 5 minutes and 21 seconds Data analysts, data researchers, and data scientists want to get access to large volumes of data to improve the statistical accuracy of their results, so that we can deliver better outcomes for the health care sector. Imagine people analysing cigarette smoking for the first time, when they started looking at the correlation between cigarette smoking and health problems. They needed a lot of data, and they needed to be able to analyse that data and then use it to create insight. To make that work, they also had to have secure access to personal data.
Skip to 6 minutes and 3 seconds We then want to consider, can we share our data as public data? There are movements afoot across the world which look at the idea of open data: making all the data that we can available for reuse as an asset for other people, so that they can do this analysis and make new findings. So we want to make some of that data open for people, but obviously not the personal data. And so when we talk about the five steps, we’re moving up from being very, very closed, with potentially fewer benefits, to very, very open, with potentially much bigger benefits. But obviously, every step of the way, we need to consider how to keep personal data private and secure.
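The data sharing spectrum can be sketched as an ordered set of access levels. This is only an illustration: the talk mentions internal access, named people, groups of people, public data, and open data, but the exact level names below, and the rule that personal data may go no further than group access, are assumptions made for the example rather than a stated policy.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Five steps from closed to open (level names are illustrative)."""
    INTERNAL = 1  # a limited number of people inside the organisation
    NAMED = 2     # shared with specific named people
    GROUP = 3     # shared with defined groups of people
    PUBLIC = 4    # made available to the public
    OPEN = 5      # published for anyone to reuse


# Assumed policy for this sketch: personal data stops at group access.
MAX_LEVEL_FOR_PERSONAL_DATA = AccessLevel.GROUP


def may_share(contains_personal_data: bool, level: AccessLevel) -> bool:
    """Return True if sharing at `level` is allowed under the assumed policy."""
    if contains_personal_data:
        return level <= MAX_LEVEL_FOR_PERSONAL_DATA
    return True


print(may_share(True, AccessLevel.NAMED))   # True
print(may_share(True, AccessLevel.OPEN))    # False
print(may_share(False, AccessLevel.OPEN))   # True
```

Using an ordered enumeration makes "more open" and "less open" directly comparable, which mirrors the idea of moving up the spectrum one step at a time.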
Skip to 6 minutes and 42 seconds And when we talk about data, we always talk about metadata. Metadata sounds like a bit of a scary term, but it just means data about the data. So when we’re describing the name of a dataset, or the format it comes in, or indeed the access type (can you share it?), this is all metadata. And that’s all very important when we talk about data. You’ve got to be able to understand the data that you have, and also be able to describe it with the metadata, so other people can understand it and reuse that data.
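As a rough illustration of "data about the data", the sketch below records the three things mentioned in the talk (a name, a format, and an access type) for a single dataset. The class name, field names, and example values are all hypothetical; real metadata standards carry many more fields.

```python
from dataclasses import dataclass, asdict


@dataclass
class DatasetMetadata:
    """A minimal, hypothetical metadata record describing one dataset."""
    name: str         # what the dataset is called
    data_format: str  # the format it comes in, e.g. "CSV" or "PDF"
    access_type: str  # who it can be shared with


# Example values are invented for illustration only.
meta = DatasetMetadata(
    name="clinic_appointments",
    data_format="CSV",
    access_type="internal",
)

# Convert to a plain dictionary so it can be stored or shared alongside the data.
record = asdict(meta)
print(record)
```

Note that the record says nothing about the contents of the dataset itself. It only describes the dataset, which is what lets other people judge whether they can understand and reuse it.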
In this presentation, Steve will talk about the core fundamentals of data: what is it, what data are you collecting, what can be done with it? By the end of this presentation you should be able to understand the different types of data that the Health and Social Care Systems collect and process.
So what is data?
Wikipedia states that data is “a set of values of qualitative or quantitative variables.” Pieces of data are individual pieces of information. In this section we will consider the fundamentals of data that are true across the health and care sectors and beyond.
Steve talks about the different characteristics of data including:
- Type of data
- Size of the data
- Access rights
© University of Strathclyde