
Analysing different types of data

© Coventry University. CC BY-NC 4.0

Data is a means to an end. Be aware of the actual target of your analysis, the endpoint you’re aiming for.

As emphasised in Week 1, we need to ensure our research question is well defined, as this will help us target our data collection and analysis. Let’s look at an example and consider how the data collected relates to the research question we are trying to answer.

Example: WHO contact tracing

The World Health Organization (WHO) proposes contact tracing as a means to control the COVID-19 outbreak. Contact tracing involves gathering data about individuals and their peers.

Earlier this week, we looked at types of structured data and how they can be plotted. When collecting data, we need to keep in mind how we can turn it into structured data and what information the data actually contains. Data discussed in this step will be structured using the concepts we studied in the earlier Step 2.8: Representing concepts mathematically.

The data gathered is of different types: categorical variables such as names, and quantitative variables such as exposure frequency.

Unique ID   Age   Full name        Fever
1           42    Alishia Ford     y
2           20    Morgan Derrick   y
3           65    Ronny West       n

In Python, this data can be stored as a dictionary keyed by unique ID:

data = {
    1: {'Age': 42, 'Full Name': 'Alishia Ford', 'Fever': True},
    2: {'Age': 20, 'Full Name': 'Morgan Derrick', 'Fever': True},
    3: {'Age': 65, 'Full Name': 'Ronny West', 'Fever': False}
}
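As a rough sketch of how the variable types mentioned earlier show up in this dictionary, we can inspect the Python type of each value. The helper `variable_kind` is a hypothetical name introduced here for illustration, and mapping strings and booleans to 'categorical' and numbers to 'quantitative' is a simplifying assumption rather than a general rule.

```python
# Same records as in the dictionary above.
data = {
    1: {'Age': 42, 'Full Name': 'Alishia Ford', 'Fever': True},
    2: {'Age': 20, 'Full Name': 'Morgan Derrick', 'Fever': True},
    3: {'Age': 65, 'Full Name': 'Ronny West', 'Fever': False},
}


def variable_kind(value):
    """Hypothetical helper: guess a variable's kind from one example value."""
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, (bool, str)):
        return 'categorical'
    if isinstance(value, (int, float)):
        return 'quantitative'
    return 'unknown'


# Classify each field using the first record as an example.
kinds = {field: variable_kind(value) for field, value in data[1].items()}
print(kinds)
# {'Age': 'quantitative', 'Full Name': 'categorical', 'Fever': 'categorical'}
```

A check like this only looks at how the value is stored. Whether a number is really a quantity (and not, say, an ID) still has to be decided by the analyst.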

With the above data, we may answer a number of questions. For example, let’s compute what percentage of our data subjects reported ‘fever’ as a symptom:

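# IDs of the data subjects who reported a fever
fever = [key for key in data if data[key]['Fever']]
# Share of all data subjects, expressed as a percentage
fever_percentage = 100 * len(fever) / len(data)
print(f"{fever_percentage:.1f}%")
66.7%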

We first select the data subjects who reported a fever, then divide their number by the total number of data subjects. Two of the three subjects reported a fever, which gives 66.7%.
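If we expect to ask the same kind of question about other yes/no fields, the fever calculation above could be wrapped in a small function. This is only a sketch: `percentage_where` is a hypothetical helper name, and it assumes every record contains the requested field.

```python
data = {
    1: {'Age': 42, 'Full Name': 'Alishia Ford', 'Fever': True},
    2: {'Age': 20, 'Full Name': 'Morgan Derrick', 'Fever': True},
    3: {'Age': 65, 'Full Name': 'Ronny West', 'Fever': False},
}


def percentage_where(records, field):
    """Hypothetical helper: percentage of records whose `field` is truthy."""
    if not records:
        # Avoid dividing by zero when there are no data subjects.
        raise ValueError('no records to summarise')
    matching = sum(1 for record in records.values() if record[field])
    return 100 * matching / len(records)


print(round(percentage_where(data, 'Fever'), 1))  # 66.7
```

Raising an error for an empty dictionary is a design choice made for this sketch. Returning a placeholder value would also be possible, but could hide the fact that no data was collected.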

Our data also contains names. We can ask: what is the average length of the first names? To answer this question, we’d write the following:

# Split each full name into a list of words
names = [data[key]['Full Name'].split() for key in data]
# The first word of each name is taken to be the first name
first_name_lengths = [len(name[0]) for name in names]
avg_length = sum(first_name_lengths) / len(first_name_lengths)
print(avg_length)
6.0

Here we have created a list called names, where each element is itself a list of the words that make up one full name.

  1. The first names will be the first element of each list of words (the Python list uses index 0 for the first element).
  2. We then create a new list (called first_name_lengths) of the lengths of the first names, to make it easier to read.
  3. This then enables Python to compute that the average length of first names is 6.
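The same steps can also be sketched with the standard library's `statistics.fmean` (available from Python 3.8), which saves us writing the division by hand. The variable name `first_names` is introduced here for illustration. Note that this sketch, like the code above, assumes the first word of the full name is the first name, which does not hold for every naming convention.

```python
from statistics import fmean

data = {
    1: {'Age': 42, 'Full Name': 'Alishia Ford', 'Fever': True},
    2: {'Age': 20, 'Full Name': 'Morgan Derrick', 'Fever': True},
    3: {'Age': 65, 'Full Name': 'Ronny West', 'Fever': False},
}

# Take the first word of each full name (index 0 after splitting).
first_names = [record['Full Name'].split()[0] for record in data.values()]
first_name_lengths = [len(name) for name in first_names]

# fmean always returns a float.
avg_length = fmean(first_name_lengths)

print(first_names)  # ['Alishia', 'Morgan', 'Ronny']
print(avg_length)   # 6.0
```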

Starting from data gathered to answer the research question about contact tracing, we’re able to compute answers to further questions. This happens frequently, and we need to consider the following situations:

  • We have gathered data and still cannot answer the original question. For example, we do not know how the data subjects relate to each other, and thus cannot necessarily trace contacts.
  • We have gathered data, and are now able to answer additional questions. As shown above, we can answer questions about names, which need not relate at all to the question of contact tracing.
  • While gathering data, we have found additional information that must not be included. For instance, the data subjects’ names may raise privacy concerns.

It is important to be aware of these situations when processing data.
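For the third situation, one possible precaution is to drop fields that are not needed before the data is analysed or shared. The sketch below assumes the dictionary layout used in this step, and `without_fields` is a hypothetical helper name. Removing names alone does not guarantee that data subjects cannot be identified, since other fields such as age may still narrow down who a record belongs to.

```python
data = {
    1: {'Age': 42, 'Full Name': 'Alishia Ford', 'Fever': True},
    2: {'Age': 20, 'Full Name': 'Morgan Derrick', 'Fever': True},
    3: {'Age': 65, 'Full Name': 'Ronny West', 'Fever': False},
}


def without_fields(records, fields_to_drop):
    """Hypothetical helper: copy the records, leaving out the named fields."""
    return {
        key: {field: value for field, value in record.items()
              if field not in fields_to_drop}
        for key, record in records.items()
    }


reduced = without_fields(data, {'Full Name'})
print(reduced[1])  # {'Age': 42, 'Fever': True}
```

The function builds new dictionaries and leaves the original `data` unchanged, so the full records remain available to those who are permitted to see them.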

References

World Health Organization. (2020). Contact tracing in the context of COVID-19. https://www.who.int/publications/i/item/contact-tracing-in-the-context-of-covid-19

This article is from the free online

Get ready for a Masters in Data Science and AI

Created by
FutureLearn - Learning For Life
