Relationships and patterns

When we come to interpret our data, what conclusions can we draw? In this article, Jeremy Singer discusses relationships between variables.
© University of Glasgow
It is natural to look for relationships and patterns to emerge from a data set. While there are formal, statistical tests and machine learning techniques to establish such relationships, we are only going to consider informal, non-automated approaches at this stage. Generally, we want to look at two features in a data set, to try and identify some kind of pattern or correlation between them.
If both variables are numerical, we can draw a scatter plot as described in earlier steps. We might try to draw a trend line, also known as a line of best fit. There is a positive correlation when one feature value increases and so does the other feature value. For example, when the length of a train journey increases, the ticket price also increases. There is a negative correlation when one feature value increases and the other feature value decreases. For example, when the average daily wind speed increases, the use of fossil fuels in Scottish power stations decreases.
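For readers who want to check a trend they have spotted by eye, the idea can be sketched in a few lines of Python. This is only an illustration: the journey lengths and ticket prices below are invented, and the helper names `pearson_r` and `trend_line` are our own, not part of the course materials. The sketch computes the standard Pearson correlation coefficient and a least-squares line of best fit.

```python
from math import sqrt

# Invented example data (not real fares): journey length in km and ticket price in pounds.
journey_km = [10, 25, 40, 60, 80]
ticket_price = [3.0, 6.5, 9.0, 14.0, 18.5]


def pearson_r(xs, ys):
    """Pearson correlation: near +1 is strongly positive, near -1 strongly negative."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def trend_line(xs, ys):
    """Least-squares line of best fit, returned as (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


r = pearson_r(journey_km, ticket_price)
slope, intercept = trend_line(journey_km, ticket_price)
print(f"correlation: {r:.2f}, trend line: price = {slope:.2f} * km + {intercept:.2f}")
```

A positive slope and a coefficient close to +1 match the rising pattern we would see on the scatter plot. A data set like the wind speed example would be expected to give a negative slope and a negative coefficient.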
If both variables are categorical, we might examine the data with the help of a contingency table. We should try to look for distinctions between the category combinations. For example, are people who live on their own more likely to own a pet? This kind of analysis is similar to calculating conditional probabilities in Maths.
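A contingency table is simply a count of how often each combination of categories occurs. The following minimal sketch uses invented survey records, and the function names `contingency_table` and `conditional_proportion` are hypothetical helpers chosen for this illustration. It counts the combinations and then works out the proportion of pet owners within each living arrangement, which is the conditional-probability idea mentioned above.

```python
from collections import Counter

# Invented survey records: (living arrangement, pet ownership).
records = [
    ("alone", "pet"), ("alone", "pet"), ("alone", "no pet"),
    ("with others", "pet"), ("with others", "no pet"),
    ("with others", "no pet"), ("with others", "no pet"),
]


def contingency_table(pairs):
    """Count how many times each (row category, column category) pair occurs."""
    return Counter(pairs)


def conditional_proportion(table, given, outcome):
    """Proportion of records with `outcome` among those whose row category is `given`."""
    row_total = sum(count for (row, _), count in table.items() if row == given)
    return table[(given, outcome)] / row_total


table = contingency_table(records)
for living in ("alone", "with others"):
    share = conditional_proportion(table, living, "pet")
    print(f"{living}: {share:.0%} own a pet")
```

With so few records, a difference between the two proportions is only a hint. A real classroom data set would need many more responses before drawing a conclusion.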
Finally, when one variable is numerical and the other variable is categorical, we should calculate the median value for each category, along with a measure of the dispersion within the category. Are there significant differences between some categories? For example, do young people spend more time watching YouTube videos than older people?
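As a rough sketch of this per-category summary, the code below groups a numerical value by category and reports the median together with the interquartile range (IQR) as one possible measure of dispersion. The viewing hours are invented, the age groups are arbitrary, and `summarise_by_category` is a hypothetical helper name used only for this example.

```python
from collections import defaultdict
from statistics import median, quantiles

# Invented data: (age group, weekly hours spent watching videos).
observations = [
    ("under 25", 12.0), ("under 25", 9.5), ("under 25", 15.0), ("under 25", 11.0),
    ("25 and over", 4.0), ("25 and over", 6.5), ("25 and over", 3.0), ("25 and over", 8.0),
]


def summarise_by_category(pairs):
    """Return {category: {"median": ..., "iqr": ...}} for (category, value) pairs."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)

    summary = {}
    for category, values in groups.items():
        # quantiles(..., n=4) returns the three quartile cut points.
        q1, _, q3 = quantiles(values, n=4)
        summary[category] = {"median": median(values), "iqr": q3 - q1}
    return summary


summary = summarise_by_category(observations)
for category, stats in summary.items():
    print(f"{category}: median {stats['median']} hours, IQR {stats['iqr']:.2f} hours")
```

Comparing the medians alongside the spread helps us judge whether a gap between categories is large relative to the variation within each category.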
What are we looking for, really? We want to discover something interesting. We might come to the data set with some intuition about likely findings, or we might explore with an open mind. Either approach is fine, but we need data to back up any interpretation we might make.
This article is from the free online course Getting Started with Teaching Data Science in Schools.
