Algorithms to Live By

An article introducing algorithms and their role in data science.

To start with, there are plenty of data science algorithms to choose from, and they come from various fields such as machine learning, statistics, mathematics, and management science. Generally speaking, these algorithms fall into four categories.

four categories of algorithms

Supervised learning: The goal is to learn a model (typically a classifier) from known, labeled examples (for instance, labeled documents) and then apply it automatically to unknown, unlabeled examples. In other words, supervised algorithms learn from examples. Plenty of algorithms fall into this category, for example support vector machines (SVM), k-nearest neighbors (k-NN), the naïve Bayes classifier (NBC), random forests (RF), regression, and logistic regression.
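
To make "learning from examples" concrete, here is a minimal sketch of k-NN in plain Python. The function name `knn_predict`, the two-dimensional points, and the labels "low" and "high" are all made up for illustration; a real project would more likely rely on an established library.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest labeled points."""
    # Pair every training point's distance to the query with its label, nearest first.
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the k nearest neighbours wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labeled examples (the "known" dataset).
points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, (1.1, 1.0)))  # query close to the "low" group
print(knn_predict(points, labels, (3.9, 4.1)))  # query close to the "high" group
```

The labeled points play the role of the known examples, and each query is an unknown example that receives the label most common among its closest neighbors.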

Unsupervised learning: The dataset is not labeled at any point in the process. Unsupervised algorithms still require training data, but that data is unlabeled, so the algorithm has to find structure in it on its own. As with the previous category, plenty of algorithms exist, for example k-means clustering and the Apriori algorithm.
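
As an illustration of learning without labels, the following is a small sketch of k-means clustering in plain Python. The function name `k_means`, the sample points, and the naive choice of the first k points as starting centroids are assumptions made for this example only.

```python
import math


def k_means(points, k, iterations=10):
    """Group unlabeled points into k clusters by repeatedly moving k centroids."""
    # Naive initialisation (an assumption of this sketch): use the first k points.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters


# Hypothetical unlabeled data forming two visible groups.
data = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.8, 1.2), (5.2, 4.8), (4.8, 5.2)]
centroids, clusters = k_means(data, k=2)
print(centroids)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge only from the distances between the points.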

Semi-supervised learning: These algorithms are applied to a mix of labeled and unlabeled data. They can learn from incomplete information, such as a training set in which some labels are missing. They are also useful when a large amount of data must be categorized but only a few examples are labeled.
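
The article does not name a specific semi-supervised algorithm, so the sketch below uses one simple option, self-training with a nearest-neighbor rule, purely as an illustration. The function name `self_train` and the sample data are hypothetical.

```python
import math


def self_train(labeled, unlabeled):
    """Spread labels from a few labeled points to many unlabeled ones.

    `labeled` is a list of (point, label) pairs; `unlabeled` is a list of points.
    """
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # Pick the unlabeled point that lies closest to any already-labeled point;
        # this is treated as the most confident prediction available.
        _, point, label = min(
            ((math.dist(p, q), p, lab) for p in remaining for q, lab in labeled),
            key=lambda candidate: candidate[0],
        )
        # Accept the prediction and let it help label the remaining points.
        labeled.append((point, label))
        remaining.remove(point)
    return labeled


# Two labeled examples and five unlabeled ones (all values are made up).
seed_examples = [((0.0, 0.0), "a"), ((10.0, 10.0), "b")]
unlabeled_points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (9.0, 9.0), (8.0, 8.0)]

result = dict(self_train(seed_examples, unlabeled_points))
print(result)
```

Only two of the seven points start with a label; the rest are labeled one at a time, each newly labeled point then serving as an example for the next.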

Reinforcement learning: The algorithms have three components:

  • the agent (the algorithm itself) which makes decisions
  • the environment, meaning everything the agent deals with
  • the actions, meaning what the agent can do when interacting with the environment.

The goal is to make decisions that maximize gains over time. For example, consider a program that plays a game against an opponent. The moves that lead to victories (positive feedback) should be learned and repeated, while those that lead to losses (negative feedback) should be avoided. Techniques used in this setting include Markov chain models (in particular Markov decision processes) and artificial neural networks (ANN), which are often used to estimate how valuable each action is.
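
The agent, environment, and actions described above can be sketched with tabular Q-learning, one common reinforcement learning method that the article itself does not name. Everything in this example is an assumption made for illustration: the environment is a hypothetical corridor of five cells with a reward in the last cell, and the function name and parameter values are arbitrary.

```python
import random


def train_q_table(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Learn action values for a corridor where only the last cell gives a reward."""
    rng = random.Random(seed)
    moves = (-1, +1)  # the actions: index 0 = step left, index 1 = step right
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # The agent explores at random sometimes (and whenever it has no preference).
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            # The environment responds with a new state and a reward.
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            best_next = 0.0 if next_state == goal else max(q[next_state])
            # Q-learning update: nudge the estimate toward reward plus discounted future value.
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
            if state == goal:
                break
    return q


q_table = train_q_table()
greedy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(greedy)
```

Moves that eventually lead to the reward accumulate higher values, so the greedy choice in every cell before the goal should settle on stepping right.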

The reasons we use algorithms are countless. For example, we use algorithms to:

  • improve healthcare
  • support data-driven decisions
  • understand societal issues
  • address climate change problems
  • discover patterns and trends
  • study evolution
  • identify outliers
  • predict machine failures
  • make forecasts

We normally use several algorithms to solve a single problem. Yet it is still possible for a whole class of problems to be solved using a single algorithm.

© Luleå University of Technology
This article is from the free online course Data Science for Climate Change.

Created by
FutureLearn - Learning For Life
