Your model and the truth

What are the limits of your data model? Find out in this article why it is important to document and explain your data analysis.
© Coventry University. CC BY-NC 4.0

We have built three different models for our Tate dataset to figure out how we can, in general, determine whether a painting is a landscape, a portrait, or some other motif.

Of course, given the simplicity of the models and the fact that none of them actually takes the painting itself into consideration, we would never be tempted to say that our models know what a landscape or a portrait is.

When it comes to more complicated models and methods, however, it is easy to forget that, in the end, any model only describes a certain dependence between the features of the training data and its labels. Because we would like to ascribe a certain generality to our model, extending beyond the scope of the training data to real-world ‘wild’ data, it is all the more difficult to appreciate its limits.

But what’s the harm, you might ask? There are two important points to keep in mind here: 1) most people regard computers as neutral decision-makers, and 2) algorithmic decision-making is becoming ubiquitous. Consequently, many important decisions made about us – whether a credit application is approved, whether we’re eligible for certain social support, whether our CV makes it into the next round of a job application, etc. – may be delegated to automated, inscrutable systems.

A discussion of where biases can be, and are, introduced in these potentially vast systems is far outside the scope of this course, but we should take a closer look at bias in data. For example, the dataset we used this week has a geographical and cultural bias since it came from a British museum. It will additionally reflect the preferences of the curators who worked on the collection. We chose to look only at oil paintings, which introduces a selection bias, and we used only a subset of the data to build our model.

All of these little choices can influence the makeup of our data. There is no simple way to obtain neutral data, which is why we have to be cognisant of our choices and make them transparent in our analysis. Data analysis should therefore not simply consist of mathematics and program code; it also needs to be documented and explained. This is one of the reasons why notebooks have become a popular way to communicate such work.
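One way to make such choices transparent is to record, for every selection step, how many rows it removes and how the mix of labels changes. The following is only a minimal sketch of that idea: the records are invented toy data, not the Tate dataset, and the names `ARTWORKS`, `label_shares` and `filter_and_log` are hypothetical helpers introduced purely for illustration.

```python
from collections import Counter

# Toy records invented for illustration; they are NOT taken from the Tate dataset.
ARTWORKS = [
    {"medium": "oil", "label": "landscape"},
    {"medium": "oil", "label": "portrait"},
    {"medium": "oil", "label": "landscape"},
    {"medium": "watercolour", "label": "landscape"},
    {"medium": "watercolour", "label": "landscape"},
    {"medium": "print", "label": "other"},
]


def label_shares(records):
    """Return the fraction of records carrying each label."""
    counts = Counter(r["label"] for r in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def filter_and_log(records, description, keep):
    """Apply one selection step and return the kept rows plus a log entry."""
    kept = [r for r in records if keep(r)]
    entry = {
        "step": description,
        "rows_before": len(records),
        "rows_after": len(kept),
        "shares_before": label_shares(records),
        "shares_after": label_shares(kept),
    }
    return kept, entry


# Hypothetical selection step: keep only the oil paintings.
oil_only, log_entry = filter_and_log(
    ARTWORKS, "keep oil paintings only", lambda r: r["medium"] == "oil"
)

# Print the log so the effect of the choice is visible next to the analysis.
for key, value in log_entry.items():
    print(f"{key}: {value}")
```

In a notebook, a log entry like this can sit directly beside a short written explanation of why the step was taken, so that readers see both the choice and its effect on the data.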

Your task

Which areas of your life do you feel are influenced by data-driven decisions? Do you think that these decisions are fair?
Please respond to this question in the comments.
Read through a few of your fellow learners’ responses. Can you see any patterns or trends?

Further information

We will be covering ethics in data science in week two, but if you would like to find out more about this topic (as well as data science in general) now, you can check out the following resources:

GOV.UK. (2018, June 13). Data ethics framework.

DataKind. (n.d.). Harnessing the power of data science in the service of humanity.

Caroline Criado Perez. (n.d.). Books.

Towards Data Science. (n.d.). A Medium publication sharing concepts, ideas, and codes.

This article is from the free online course Applied Data Science.
