
Relationships and patterns

When we come to interpret our data, what conclusions can we draw? In this article, Jeremy Singer discusses relationships between variables.
© University of Glasgow

It is natural to look for relationships and patterns in a data set. While there are formal statistical tests and machine learning techniques to establish such relationships, we are only going to consider informal, non-automated approaches at this stage. Generally, we want to look at two features in a data set and try to identify some kind of pattern or correlation between them.

If both variables are numerical, we can draw a scatter plot as described in earlier steps. We might try to draw a trend line, also known as a line of best fit. There is a positive correlation when one feature value increases and so does the other feature value. For example, when the length of a train journey increases, the ticket price also increases. There is a negative correlation when one feature value increases and the other feature value decreases. For example, when the average daily wind speed increases, the use of fossil fuels in Scottish power stations decreases.
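
For readers who would like to see the idea in code, here is a minimal Python sketch of a trend line and a correlation check. It is only an illustration: the figures are made up for this example, and the names `trend_line`, `correlation`, `journey_km`, `ticket_price`, `wind_speed` and `fossil_fuel_use` are hypothetical rather than taken from the course materials. It uses the standard least-squares line and the Pearson correlation coefficient, which is one common way (not the only way) to put a number on a linear relationship.

```python
from statistics import mean


def trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of best fit."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def correlation(xs, ys):
    """Return the Pearson correlation coefficient, a value between -1 and 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Made-up illustrative data, not real measurements.
journey_km = [10, 25, 40, 60, 80]
ticket_price = [3.0, 6.5, 9.0, 14.0, 18.5]

wind_speed = [5, 10, 15, 20, 25]
fossil_fuel_use = [90, 75, 60, 50, 30]

slope, intercept = trend_line(journey_km, ticket_price)
r_positive = correlation(journey_km, ticket_price)
r_negative = correlation(wind_speed, fossil_fuel_use)

print(f"Trend line: price = {slope:.2f} * km + {intercept:.2f}")
print(f"Journey length vs price: r = {r_positive:.2f}")
print(f"Wind speed vs fossil fuel use: r = {r_negative:.2f}")
```

A value of r close to 1 suggests a positive correlation, and a value close to -1 suggests a negative one. A scatter plot is still worth drawing, because a single number can hide outliers or a curved pattern.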

If both variables are categorical, we might examine the data with the help of a contingency table. We should try to look for distinctions between the category combinations. For example, are people who live on their own more likely to own a pet? This kind of analysis is similar to calculating conditional probabilities in Maths.
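
As a rough sketch of how a contingency table might be built and read, the Python below counts category combinations and then compares the share of pet owners in each household type. The survey responses are invented for illustration, and the names `responses`, `table` and `share_with_pet` are hypothetical.

```python
from collections import Counter

# Made-up survey responses: (household type, pet ownership).
responses = [
    ("alone", "pet"), ("alone", "pet"), ("alone", "pet"), ("alone", "no pet"),
    ("shared", "pet"), ("shared", "pet"),
    ("shared", "no pet"), ("shared", "no pet"),
    ("shared", "no pet"), ("shared", "no pet"),
]

# The contingency table: one count per combination of categories.
table = Counter(responses)


def share_with_pet(household):
    """Proportion of people in this household type who own a pet."""
    with_pet = table[(household, "pet")]
    total = with_pet + table[(household, "no pet")]
    return with_pet / total


# Print the table row by row, then the proportion for each row.
for household in ("alone", "shared"):
    print(household, table[(household, "pet")], table[(household, "no pet")])
    print(f"  share with a pet: {share_with_pet(household):.2f}")
```

Each proportion plays the role of a conditional probability, such as the chance of owning a pet given that a person lives alone. With so few responses, a difference between the rows is only a hint and would need more data to back it up.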

Finally, when one variable is numerical and the other variable is categorical, we should calculate the median value for each category, along with a measure of the dispersion within the category. Are there significant differences for some categories? For example, do young people spend more time watching YouTube videos than older people?
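
One possible way to carry out this comparison is sketched below in Python. It groups a numerical value by category, then reports the median and the interquartile range for each group. The interquartile range is an assumption here, chosen as one reasonable measure of dispersion; the viewing times are made up, and the names `viewing`, `by_group` and `summary` are hypothetical.

```python
from collections import defaultdict
from statistics import median, quantiles

# Made-up data: (age group, minutes of video watched per day).
viewing = [
    ("under 25", 30), ("under 25", 45), ("under 25", 60),
    ("under 25", 90), ("under 25", 120),
    ("25 and over", 10), ("25 and over", 20), ("25 and over", 30),
    ("25 and over", 40), ("25 and over", 90),
]

# Collect the numerical values under each category.
by_group = defaultdict(list)
for group, minutes in viewing:
    by_group[group].append(minutes)

# For each category, store (median, interquartile range).
summary = {}
for group, values in by_group.items():
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    summary[group] = (median(values), q3 - q1)

for group, (mid, iqr) in summary.items():
    print(f"{group}: median = {mid} minutes, interquartile range = {iqr} minutes")
```

If the medians differ by much less than the spread within each group, the apparent difference between categories may not mean very much.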

What are we looking for, really? We want to discover something interesting. We might come to the data set with some intuition about likely findings, or we might explore with an open mind. Either approach is fine, but we need data to back up any interpretation we might make.

This article is from the free online

Getting Started with Teaching Data Science in Schools

Created by
FutureLearn - Learning For Life
