
What is machine learning?

In this article you will examine what is meant by 'Machine Learning', how it works and how you can use it to help with your analysis.
© Coventry University. CC BY-NC 4.0

Have you heard of machine learning before? Have you tried to work out what it is? Are you struggling to get started?

This article discusses what machine learning is. We will demystify the term, work through an example of a machine learning algorithm, and explain why it is such a popular concept.

What is machine learning?

Machine learning always starts with a dataset and a question. This is the essence of data science. As data scientists, we always start from the hypothesis that our question can be answered from our data. That hypothesis can only be confirmed once we mine the dataset for potential answers and have some certainty that we can extract hidden knowledge from it.

So, machine learning is the process of automatically extracting knowledge from a dataset, so that computers can learn from data and answer our questions. In our case, automation refers to developing software in a language such as Python to support our analysis.

Machine learning example

To explore machine learning, let us start with a simple example. Assume that you want to create an email spam filter, where a machine learning program makes a decision on a given input and identifies whether the input email is spam (‘bad’) or ham (‘good’).

We already have a small dataset to start our analysis. It serves as our training set, where each data point is called a training instance or sample.

The machine learning program will learn from the training dataset how to classify a new email as spam or ham, based on the ‘experience’ encoded in the training set. In practice, to identify spam, the program will learn to recognise words and phrases that are typical of spam emails, such as ‘credit card’, ‘free gift’, ‘amazing offer’, and others.

So, how does machine learning work?

An overview of machine learning techniques

The machine learning technique that we will use learns which words and phrases are good predictors of spam or ham emails, based on frequent word patterns.
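The idea of learning predictive words from frequencies can be illustrated with a minimal sketch. It is not the course's own implementation: the toy training set, the function names and the `min_count` threshold are all assumptions made up for illustration. It simply counts how often each word occurs in spam and in ham emails and lists words that are frequent in spam but never seen in ham.

```python
from collections import Counter

# Hypothetical toy training set: (email text, label) pairs invented for illustration.
TRAINING_SET = [
    ("claim your free gift now", "spam"),
    ("amazing offer on a new credit card", "spam"),
    ("free credit card amazing offer", "spam"),
    ("meeting notes for monday", "ham"),
    ("lunch on monday", "ham"),
    ("notes from the project meeting", "ham"),
]


def count_words(training_set):
    """Count how often each word appears in spam and in ham emails."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in training_set:
        counts[label].update(text.lower().split())
    return counts


def spam_indicators(counts, min_count=2):
    """Return words seen at least min_count times in spam and never in ham."""
    return sorted(
        word
        for word, n in counts["spam"].items()
        if n >= min_count and counts["ham"][word] == 0
    )


word_counts = count_words(TRAINING_SET)
print(spam_indicators(word_counts))
```

With only six emails the result is fragile. A real filter would need far more data and a way to handle words that appear in both classes.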

The process includes:

  • Studying the problem and identifying the key features to use, such as the sender's email address, the email body text, and others
  • Training a machine learning model to classify whether an email is spam or ham
  • Evaluating the solution and analysing the errors, for example, how many ham emails were recognised as spam (false positives) and how many spam emails were recognised as ham (false negatives)
  • Starting the machine learning program to flag emails without any intervention, hoping that our model will work efficiently with a low error rate
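The training and evaluation steps above can be sketched in plain Python. The article does not name a specific algorithm, so this example assumes a naive Bayes classifier with add-one smoothing as one common choice. The datasets and the function names `train`, `classify` and `evaluate` are hypothetical and exist only for this illustration.

```python
import math
from collections import Counter

# Hypothetical labelled emails, invented for illustration.
TRAINING_SET = [
    ("claim your free gift now", "spam"),
    ("amazing offer on a new credit card", "spam"),
    ("free credit card amazing offer", "spam"),
    ("meeting notes for monday", "ham"),
    ("lunch on monday", "ham"),
    ("notes from the project meeting", "ham"),
]
TEST_SET = [
    ("free gift card", "spam"),
    ("monday meeting", "ham"),
    ("amazing lunch offer", "ham"),
    ("your monday meeting notes", "spam"),
]


def train(training_set):
    """Collect per-class word counts and email counts from labelled emails."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    email_counts = Counter()
    for text, label in training_set:
        word_counts[label].update(text.lower().split())
        email_counts[label] += 1
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, email_counts, vocabulary


def classify(text, model):
    """Return the label with the higher naive Bayes log-score."""
    word_counts, email_counts, vocabulary = model
    total_emails = sum(email_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        score = math.log(email_counts[label] / total_emails)
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not produce log(0).
            seen = word_counts[label][word] + 1
            score += math.log(seen / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


def evaluate(test_set, model):
    """Count false positives (ham flagged as spam) and false negatives."""
    false_positives = false_negatives = 0
    for text, label in test_set:
        predicted = classify(text, model)
        if predicted == "spam" and label == "ham":
            false_positives += 1
        elif predicted == "ham" and label == "spam":
            false_negatives += 1
    return false_positives, false_negatives


model = train(TRAINING_SET)
fp, fn = evaluate(TEST_SET, model)
print(f"false positives: {fp}, false negatives: {fn}")
```

Keeping the test emails separate from the training emails matters: errors measured on emails the model has already seen would understate the error rate on new emails.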

Applications of machine learning

In general, the more representative training data we have, the better our machine learning model can be trained, and the fewer prediction errors we can expect. To summarise, with machine learning we gain an understanding of the data, we are able to extract knowledge from the data, and crucially we can create programs that learn from the data.

A few machine learning examples include:

  • Detecting credit card fraud by identifying outliers (data points that look anomalous for our dataset) in a huge dataset
  • Forecasting a company’s revenues for the next year based on the revenues of the past ten years
  • Detecting healthcare problems by analysing images such as X-ray images
  • Creating intelligent bots for answering questions on automated chats and games
  • Analysing text using natural language processing to classify text and understand its context, as in the example of identifying spam and ham emails
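To make the first item in the list above more concrete, here is a minimal sketch of outlier detection on transaction amounts. It assumes a modified z-score based on the median and the median absolute deviation with a conventional cut-off of 3.5. That technique is chosen for this illustration and is not prescribed by the article. The amounts and the function name `find_outliers` are hypothetical, and a real fraud system would use many more features than the amount alone.

```python
import statistics

# Hypothetical transaction amounts, invented for illustration.
AMOUNTS = [12.5, 30.0, 22.4, 18.9, 25.0, 950.0, 27.3, 15.8]


def find_outliers(amounts, threshold=3.5):
    """Return amounts whose modified z-score exceeds the threshold."""
    median = statistics.median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = statistics.median([abs(x - median) for x in amounts])
    if mad == 0:
        return []  # no spread, so nothing can be scored
    return [x for x in amounts if abs(0.6745 * (x - median) / mad) > threshold]


print(find_outliers(AMOUNTS))
```

The median is used instead of the mean because a single extreme value can shift the mean and the standard deviation enough to hide itself.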

Types of machine learning

There are four main categories of machine learning (the first two were already mentioned in Week 1):

  • Supervised learning is a method where a labelled training set is used by the machine learning algorithm to construct the model. The spam and ham email example is a typical case of classification, where the training set provides many labelled examples from which the algorithm learns how to classify a new email based on the existing ones.
  • Unsupervised learning is a method where there is no labelled training data, i.e. the dataset is unlabelled. For example, you have a dataset of people living in London, and you want to group them by salary and location, i.e. by longitude and latitude. Here, clustering the data and employing a visualisation technique can reveal multiple clusters of people per London area according to their salary. In this case, we usually say that ‘we feed’ our model with unlabelled data, and the model creates the clusters on its own.
  • Semisupervised learning is a method where the labelled training set is sparse and a combination of supervised and unsupervised learning is used to build a model. A good example is Facebook image tagging. Let’s suppose that you upload an image with a friend called Mary on Facebook, and then you tag the image with her name. The next time you upload a photo with the same friend, Facebook’s machine learning algorithm will automatically label the image with Mary’s name. In this way, semisupervised learning combines both (unsupervised and supervised) methods to perform the analysis.
  • Reinforcement learning is a method where a program observes its environment, performs actions, and gets feedback through rewards or penalties. A good example of this is DeepMind’s AlphaGo program. AlphaGo plays the board game Go; it learned by analysing a massive amount of game data and then playing against itself, using a reinforcement learning algorithm.
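The unsupervised case can be sketched with a small clustering example. It assumes k-means as the clustering algorithm, which the article does not specify, and clusters on location only. The coordinates, the function name `k_means` and the choice of the first `k` points as starting centroids are simplifications made up for this illustration.

```python
import math

# Hypothetical (longitude, latitude) points, invented for illustration.
POINTS = [
    (-0.10, 51.50), (-0.30, 51.40), (-0.12, 51.52),
    (-0.32, 51.41), (-0.11, 51.51), (-0.31, 51.42),
]


def k_means(points, k, iterations=10):
    """Group points into k clusters without using any labels."""
    # Naive deterministic start: the first k points act as initial centroids.
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids, clusters


centroids, clusters = k_means(POINTS, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

No labels are passed in at any point. The groups emerge only from the distances between the points, which is what separates this from the supervised spam example.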

If you’d like to learn more about machine learning, check out the full online course, from Coventry University, below.

This article is from the free online

Applied Data Science
