
Modelling Real World Data

How do we represent real-world measurements using data science techniques? Lovisa Sundin and Jeremy Singer discuss standard approaches.
JEREMY: Hey, Lovisa. When we’re modelling data in the real world, sometimes it’s a bit confusing to know exactly how to model the attributes. Can you give us an example, maybe?
LOVISA: It’s very true. If we just take these lollipops that I happened to find in my pocket, then we will find that how we choose to model them depends very much on who we are, what our goals are, and what the context is. So if I were a physicist, for example, and I wanted to model the colour of these lollipops, then I know that colour is a physical wavelength, and a continuous attribute.
LOVISA: But if I were a computer scientist, then I would know that these colours would be encoded as discrete bit strings, so it would be a numerical, but discrete, attribute.
JEREMY: Or, if you’re using CSS, you could say colour equals purple, or colour equals red, and they’d be strings.
LOVISA: That’s true. See, it depends on context. But if I were an artist, then I would perhaps categorise these as different pigments, and it would be a categorical attribute. But I think we can all agree that what comes most naturally to us would be to model this as a categorical attribute in terms of the good ones and the disgusting ones.
JEREMY: Oh, no, blackcurrant is my favourite. Thanks, Lovisa.
LOVISA: Thanks. That leaves me with three.

In general, data is either numbers (numerical data) or labelled values from a limited set of possible values (categorical data).

Numerical data may be discrete (countable values, typically whole numbers) or continuous (real numbers that can take any value in a range, usually written with decimal points).
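The lollipop conversation can be sketched in Python to show one object carrying all of these kinds of attribute at once. This is only an illustration, not part of the course material: the `Lollipop` class, the `Taste` categories, and the wavelength figures are hypothetical names and rough values chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Taste(Enum):
    """Categorical: a label drawn from a small, fixed set of values."""
    GOOD = "good"
    DISGUSTING = "disgusting"


@dataclass
class Lollipop:
    wavelength_nm: float  # continuous numerical (the physicist's view); rough value
    rgb: int              # discrete numerical (a 24-bit colour code)
    css_colour: str       # a string, as a CSS colour name
    taste: Taste          # categorical


# Hypothetical lollipops; wavelengths are approximate, for illustration only.
red = Lollipop(wavelength_nm=650.0, rgb=0xFF0000, css_colour="red", taste=Taste.GOOD)
green = Lollipop(wavelength_nm=530.0, rgb=0x008000, css_colour="green", taste=Taste.DISGUSTING)

for lolly in (red, green):
    print(lolly.css_colour, lolly.wavelength_nm, f"#{lolly.rgb:06X}", lolly.taste.value)
```

The same physical object appears four times over, once per viewpoint, which is the point Lovisa makes: the choice of attribute type follows from the modeller's goal rather than from the lollipop itself.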

As a short exercise, think of one numerical feature and one categorical feature in each of the following data sets:

  1. Scotland’s 2001 census data
  2. Bicycles in the Glasgow city bike hire scheme
  3. Olympic Games athletes data

As a further thought experiment, can you see how values of these different kinds might map onto the concrete data types Jeremy introduced in an earlier video?
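One possible starting point for this thought experiment is sketched here, assuming Python's built-in types stand in for the concrete data types from the earlier video (this article does not list them). The `kind_of` function and the lollipop record are hypothetical, and the mapping is a rule of thumb rather than a definitive rule.

```python
def kind_of(value):
    """Guess the kind of data from a value's Python type (rule of thumb only)."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "categorical"
    if isinstance(value, int):
        return "numerical (discrete)"
    if isinstance(value, float):
        return "numerical (continuous)"
    if isinstance(value, str):
        # Assumption: the string is a label from a limited set, not free text.
        return "categorical"
    raise TypeError(f"no mapping for {type(value).__name__}")


# Hypothetical record describing the lollipops left in Lovisa's pocket.
record = {"count": 3, "weight_g": 12.5, "flavour": "blackcurrant", "wrapped": True}

for name, value in record.items():
    print(f"{name}: {kind_of(value)}")
```

The type alone does not settle the question. An integer used as an identifier, such as a postcode district number, behaves like categorical data even though it is stored as a whole number.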

This article is from the free online course Getting Started with Teaching Data Science in Schools.

Created by
FutureLearn - Learning For Life
