Assessing your data

Data is at the heart of our analysis. Now that we have our business problem and data set, we’ll use this step to look at types of data and think about what a high-quality data set looks like.

Types of data

There are two broad types of data:

  • Structured: the whole data set is organised. There may be a mixture of text and numbers in the data, but there is a well-defined arrangement of the different ‘fields’, or categories of information. The most common home for structured data is a database: tables whose columns separate the different information types (fields) and whose rows hold the individual instances of that information (records).

Here’s a familiar sight of some structured data:

Image: a spreadsheet with data organised under column headings. The column headings are country names, such as Belgium, and each column contains numerical data.

  • Unstructured: there is no clear organisation of the full data set, or only part of it is organised, leaving other fields undefined. A Word document, a PowerPoint presentation and an email are all forms of unstructured data. A human can easily read an email and understand it, but it is difficult to analyse on a computer because we don’t know what each piece of information describes until we have read it.

Below are two examples of unstructured data:

An email in text form and a set of Tweets related to the topic #datascience

Image: an email, with a Twitter thread of comments to its right.

Image: rows and rows of extracted Twitter data.

For the rest of the course, we’re going to focus on working with structured data. Most of the databases and Excel spreadsheets you’ll come across will contain structured data. But it’s good to be able to identify unstructured data, as sometimes you may receive files of unstructured data that need more work to turn them into a structured format.
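Although this course works in Excel, the difference between the two types can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names, the numbers and the email sentence are all invented, and only the standard `csv` module is used.

```python
import csv
import io

# Structured: every record has the same named fields.
# (Column names and values are invented for illustration.)
STRUCTURED = """country,year,value
Belgium,2019,10
France,2019,12
"""

# Unstructured: similar information buried in free text (an invented email line).
UNSTRUCTURED = "Hi team, Belgium came in at 10 for 2019, while France was higher at 12."

# With structured data we can ask for a field by name.
records = list(csv.DictReader(io.StringIO(STRUCTURED)))
values_by_country = {record["country"]: int(record["value"]) for record in records}

# With unstructured data we only get a list of words. Nothing tells the
# computer which word is a country, which number is a year, and so on.
words = UNSTRUCTURED.split()

print(values_by_country)
print(words[:5])
```

The structured version lets us look up values directly by field name, while the free text would first need extra work to turn it into fields and records.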

Data quality

The quality of our data affects the quality of our analysis and of our outcomes. If we have poor data that doesn’t suit our needs, we won’t be able to draw reliable conclusions from it. Later we’re going to cover how to fix and tidy up data sets once we’ve found them. But before that, we’ll look at two criteria to bear in mind when we’re assessing our data set.

Appropriate

Our data needs to fit our needs. If we want to analyse how the GDP of countries around the world has changed over time, we need a data set that contains the relevant data to answer that question. For example, at a minimum, it would need columns for country, date, and GDP. If we want to assess the year-on-year change of GDP but the data only contains GDP per decade, then that data set is not appropriate to answer our question. We can either look for a new data set or refine our question.
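The same appropriateness check can be written down as a small Python sketch. It is not part of the course's Excel workflow; the column names `country`, `date` and `gdp`, the helper `is_appropriate` and the sample values are all assumptions made for illustration.

```python
import csv
import io

# Columns we assume are needed for a GDP-over-time question.
REQUIRED_COLUMNS = {"country", "date", "gdp"}

# Invented sample data: one yearly series and one per-decade series.
YEARLY_CSV = """country,date,gdp
Belgium,2018,100
Belgium,2019,102
Belgium,2020,97
"""

DECADE_CSV = """country,date,gdp
Belgium,2000,80
Belgium,2010,90
Belgium,2020,97
"""


def is_appropriate(csv_text, required=REQUIRED_COLUMNS):
    """Return True if the required columns exist and every country
    has at least two rows in consecutive years."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = set(reader.fieldnames or [])
    if not required <= columns:
        return False  # a required column is missing

    years_by_country = {}
    for row in rows:
        years_by_country.setdefault(row["country"], []).append(int(row["date"]))

    for years in years_by_country.values():
        years.sort()
        if len(years) < 2:
            return False  # one data point cannot show a change
        if any(later - earlier != 1 for earlier, later in zip(years, years[1:])):
            return False  # gaps mean we cannot compute year-on-year change
    return bool(years_by_country)


print(is_appropriate(YEARLY_CSV))
print(is_appropriate(DECADE_CSV))
```

Here the yearly sample passes and the per-decade sample fails, which mirrors the choice described above: find a different data set or refine the question.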

Quality

The data needs to be accurate and complete enough for the task at hand. No data set will describe the world perfectly, but we need to make sure it’s complete enough to answer our chosen question. For example, if we’re looking to analyse global GDP data and we’re missing data for North America and Oceania, then we won’t be able to perform a global analysis. Equally, if some GDP values are entered as currency amounts while others appear as errors or special characters, or are missing (#NA), then our data is incomplete and not of high enough quality to support a decent analysis.
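As a rough sketch of what such a quality check might look like outside Excel, the Python below counts GDP cells that cannot be read as numbers and lists regions with no rows at all. The field names `region` and `gdp`, the helpers `parse_gdp` and `quality_report`, the list of expected regions and every value are assumptions made for illustration.

```python
import math

# Invented rows, as they might look after reading a spreadsheet as text.
ROWS = [
    {"region": "Europe", "gdp": "1,200"},
    {"region": "Asia", "gdp": "#NA"},
    {"region": "Africa", "gdp": ""},
    {"region": "South America", "gdp": "950.5"},
]

# Regions we assume a "global" analysis should cover.
EXPECTED_REGIONS = [
    "Africa", "Asia", "Europe", "North America", "Oceania", "South America",
]


def parse_gdp(cell):
    """Return the cell as a float, or None if it is blank,
    an error code such as #NA, or otherwise not a finite number."""
    text = (cell or "").strip().replace(",", "")
    try:
        value = float(text)
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def quality_report(rows, expected_regions):
    """Summarise unusable GDP cells and regions that have no rows."""
    bad_rows = [row for row in rows if parse_gdp(row.get("gdp")) is None]
    seen_regions = {row["region"] for row in rows}
    return {
        "total_rows": len(rows),
        "bad_rows": len(bad_rows),
        "missing_regions": sorted(set(expected_regions) - seen_regions),
    }


report = quality_report(ROWS, EXPECTED_REGIONS)
print(report)
```

A report like this doesn’t fix anything by itself, but it tells us how much tidying the data set needs and whether a truly global analysis is possible with it.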

In sum, if we want to perform good analysis in Excel, we need to make sure our data is structured, is appropriate to tackle our question, and is of good quality.

Activity:

Open the Excel file from the previous step and try to describe it. Share your thoughts in the comments.

  • Does it contain structured or unstructured data?
  • Do you think it contains appropriate data to answer our question?
  • Can you say anything about the quality of the data?

Don’t worry if you’re not sure. At this stage, the important thing is to think like a data analyst: asking questions and trying to answer them.

This article is from the free online

Analysing Data in Excel

Created by
FutureLearn - Learning For Life