Increasing the complexity of our modelling

What is a vector? How does this enable us to build more complex data models? Find out in this article.

Let us take a step back and reflect on how we built our models: we first selected a subset of our data to work with and then grouped the paintings according to the selected values.

Notice that if the selected data for two paintings is the same, then our models will invariably label them in the same manner. Two paintings with the same ratio will always receive the same label from the ratio model, and two paintings that contain the same keyword in their title will always receive the same label from the title model.
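To make this concrete, here is a minimal sketch of two such models written as plain functions. The threshold, the keyword, the label names and the painting records are all hypothetical and chosen only for illustration; they are not the values used elsewhere in the course. The point is that each label depends only on the selected data, so equal inputs always give equal labels.

```python
def ratio_model(ratio: float, threshold: float = 1.0) -> str:
    """Hypothetical ratio model: the label depends only on the ratio."""
    return "wide" if ratio > threshold else "tall"


def title_model(title: str, keyword: str = "portrait") -> str:
    """Hypothetical title model: the label depends only on the keyword."""
    return "has_keyword" if keyword in title.lower() else "no_keyword"


# Two made-up paintings with different sizes but the same width/height ratio.
painting_a = {"title": "Portrait of a Lady", "width": 60, "height": 80}
painting_b = {"title": "Harbour at Dusk", "width": 30, "height": 40}

ratio_a = painting_a["width"] / painting_a["height"]
ratio_b = painting_b["width"] / painting_b["height"]

# Same ratio, so the ratio model cannot tell the paintings apart.
ratio_label_a = ratio_model(ratio_a)
ratio_label_b = ratio_model(ratio_b)

# Different titles, so the title model may still separate them.
title_label_a = title_model(painting_a["title"])
title_label_b = title_model(painting_b["title"])
```

Here the ratio model gives both paintings the same label, while the title model distinguishes them, because each model only sees its own selected data.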

Furthermore, we expect that, most of the time, our models will give ‘similar’ data the same label. In that sense, our models describe which paintings are similar and which are dissimilar, and decide labels according to this measure.

This idea is what lets us build more complex models and draw on powerful existing techniques. In order to apply them, we first need to translate our data into so-called vectors. Fundamentally, a vector is an ordered collection of numbers, or entries, not unlike a row in a table. The number of entries is the dimension of the vector. It is sometimes useful to imagine vectors as points in space, although this intuition fails for more than three dimensions. For example, the following business locations in London can be interpreted as two-dimensional vectors on a two-dimensional plane (we can ignore the earth’s curvature at such small scales):

This content is taken from
Coventry University online course,

Applied Data Science

[Figure: business locations in London shown as points on a two-dimensional plane.]
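As a rough sketch of what this looks like in code, a vector can be stored as a tuple of numbers, and its dimension is simply the number of entries. The business names and coordinates below are made up for illustration; they are assumed to be positions in kilometres on a flat plane, relative to an arbitrary origin.

```python
# Hypothetical business locations as two-dimensional vectors (x, y),
# measured in kilometres from an arbitrary origin on a flat plane.
locations = {
    "bakery": (1.0, 2.5),
    "bookshop": (4.0, 0.5),
    "cafe": (2.0, 2.0),
}

# The dimension of a vector is the number of entries it contains.
dimensions = {name: len(vector) for name, vector in locations.items()}

# A vector with more entries works the same way, even though we can
# no longer picture it as a point in ordinary space.
five_dimensional = (0.3, 1.2, 5.0, 0.0, 7.1)
```

Every location here has dimension two, while the last vector has dimension five.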

Some data is already naturally represented as vectors, like GPS coordinates, which consist of two entries: latitude and longitude. Other, more complex data can sometimes be mapped onto vectors (in this case often called features) which capture some of its aspects.
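The following sketch shows one way a painting could be mapped onto a feature vector. The `Painting` record, its field names and the keyword are assumptions made for this example, not part of the course data; the two features simply reuse the ratio and title ideas from our earlier models.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Painting:
    """Hypothetical painting record; field names are assumptions."""
    title: str
    width_cm: float
    height_cm: float


def to_feature_vector(painting: Painting, keyword: str = "portrait") -> List[float]:
    """Map a painting onto a two-dimensional feature vector."""
    # Feature 1: width-to-height ratio.
    ratio = painting.width_cm / painting.height_cm
    # Feature 2: 1.0 if the keyword appears in the title, otherwise 0.0.
    has_keyword = 1.0 if keyword in painting.title.lower() else 0.0
    return [ratio, has_keyword]


features = to_feature_vector(Painting("Portrait of a Lady", 60.0, 80.0))
```

Note that the vector captures only some aspects of the painting: two very different paintings can end up with the same features.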

In either case, the notion of similarity or distance is crucial: we can take two vectors and compare them. In this example, the distance between two vectors will most likely be the natural straight-line geographical distance in miles or kilometres, which is also called the Euclidean distance. We discuss other types of metrics in the next step.
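As a small illustration, the Euclidean distance between two vectors is the square root of the sum of the squared differences of their entries. The sketch below assumes the two points are already given as flat-plane coordinates in kilometres; the coordinates are made up, and raw latitude and longitude in degrees would first need converting to such a plane.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical positions in kilometres on a flat plane.
office = (0.0, 0.0)
cafe = (3.0, 4.0)

distance_km = euclidean_distance(office, cafe)  # 5.0
```

The same function works unchanged for vectors of any dimension, which is what makes the notion of distance useful beyond geographical data.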
