Increasing the complexity of our modelling

What is a vector? How does this enable us to build more complex data models? Find out in this article.
© Coventry University. CC BY-NC 4.0

We should take a step back here and reflect on how we built our models: we first selected a subset of our data to work with and then grouped the data according to these values.

Notice that if the selected data for two paintings is the same, then our models will invariably label them in the same manner. Two paintings with the same ratio will always receive the same label by the ratio model and two paintings that contain the same keyword in their title will always receive the same label by the title model.

Furthermore, we expect our models to give ‘similar’ data the same label most of the time. In that sense, our models describe which paintings are similar and which are dissimilar, and assign labels according to this measure of similarity.

This idea is what lets us build more complex models and draw on powerful existing techniques. In order to apply them, we first need to translate our data into so-called vectors. Fundamentally, a vector is an ordered collection of numbers, or entries, not unlike a row in a table. The number of entries is the dimension of the vector. It is sometimes useful to imagine vectors as points in space, although this picture becomes hard to visualise beyond three dimensions. For example, the following business locations in London can be interpreted as two-dimensional vectors on a two-dimensional plane (we can ignore the earth’s curvature at such small scales):

"A map of central London, with mauve points describing latitude and longitude coordinates of businesses in London". Click image to expand.

Some data is already naturally represented as vectors, like GPS coordinates, which consist of two entries: latitude and longitude. Other, more complex, data can sometimes be mapped onto vectors (in this case often called features) which capture some of its aspects.
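As a rough illustration of mapping complex data onto a feature vector, the sketch below turns a painting into a two-dimensional vector built from the two aspects used by our earlier models: the ratio and a title keyword. The `Painting` class, its field names, the `to_features` function and the keyword "portrait" are all hypothetical choices made for this example, not part of the course material.

```python
from dataclasses import dataclass


@dataclass
class Painting:
    # Hypothetical record; the field names are assumptions for this sketch.
    title: str
    width_cm: float
    height_cm: float


def to_features(painting, keyword="portrait"):
    """Map a painting onto a two-dimensional feature vector."""
    # First entry: the width-to-height ratio of the painting.
    ratio = painting.width_cm / painting.height_cm
    # Second entry: 1.0 if the (assumed) keyword occurs in the title, else 0.0.
    has_keyword = 1.0 if keyword in painting.title.lower() else 0.0
    return [ratio, has_keyword]


painting = Painting(title="Portrait of a Lady", width_cm=60.0, height_cm=80.0)
features = to_features(painting)
dimension = len(features)  # the number of entries in the vector
```

Two paintings with the same ratio and the same keyword status map onto the same vector, which is why a model that only sees these features must give them the same label.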

In either case, the notion of similarity or distance is crucial: we can take two vectors and compare them. In this example, the distance between two vectors will most likely be the natural geographical distance in miles or kilometres, which on a flat plane is the straight-line distance, also called the Euclidean distance. We discuss other types of metrics in the next step.
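To make the comparison of two vectors concrete, here is a minimal sketch of the Euclidean distance: the square root of the sum of squared differences between corresponding entries. The function name `euclidean_distance` and the two points are made up for illustration; the points are treated as positions on a flat plane in arbitrary units rather than as real latitude and longitude values.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    # Sum the squared differences entry by entry, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical business locations on a flat plane (arbitrary units).
shop_a = (1.0, 2.0)
shop_b = (4.0, 6.0)

distance = euclidean_distance(shop_a, shop_b)  # sqrt(3^2 + 4^2) = 5.0
```

The same function works for vectors of any dimension, so it applies equally to feature vectors with more than two entries. From Python 3.8 onwards, the standard library function `math.dist` computes the same quantity.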

This article is from the free online

Applied Data Science
