
Scikit-learn

An introduction to Scikit-Learn for use in the course Machine Learning for Image Data

Scikit-learn is a popular Python package that can be used to perform a range of machine learning tasks.

It comes pre-installed with the Anaconda distribution of Python. Otherwise, it is easy to install using pip:

 pip install scikit-learn
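
To check that the installation worked, you can import the package and print its version. Note that the package is installed under the name scikit-learn but imported under the name sklearn. A minimal check might look like this:

```python
# The package is installed as "scikit-learn" but imported as "sklearn".
import sklearn

# Every release exposes its version number as a string.
version = sklearn.__version__
print(version)
```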

Scikit-learn is designed to be easy to use, with a consistent interface across all the machine learning models included in the package. This means that once you have your data in the correct format, it's easy to switch between different machine learning models appropriate for that dataset, and to perform other common tasks such as cross-validation and model evaluation.

We’ll talk about these tasks later on in the course, but for now let’s look at the basic steps common to training most machine learning models in scikit-learn.

Basic steps in the scikit-learn API

API stands for Application Programming Interface, a term used to describe the way in which two (or more) computer programs communicate with each other. In this context the two programs are the Python code you write and the scikit-learn package, so "the scikit-learn API" is really just shorthand for the way you use scikit-learn by writing Python code.

The scikit-learn API is designed to have a similar syntax regardless of the machine learning model you choose. In general, to use it you’ll need to perform the following steps in your code:

  1. obtain your data and arrange it into a features matrix (X) and target vector (y)
  2. choose a model by importing the appropriate Python class from scikit-learn
  3. choose model hyperparameters and create an instance of the model class
  4. fit the model to your data using the fit() method of your class instance
  5. use the fitted model to make predictions from new data via the predict() method.
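
The five steps above can be sketched in a few lines of code. This is only an illustration, not the course's own example: the choice of the small handwritten digits dataset bundled with scikit-learn, the K-nearest neighbours model, the value n_neighbors=3, and holding back the last 100 samples as "new data" are all assumptions made to keep the sketch short.

```python
from sklearn.datasets import load_digits
# Step 2: choose a model by importing its class.
from sklearn.neighbors import KNeighborsClassifier

# Step 1: features matrix X (one row per image, one column per pixel)
# and target vector y (one digit label per image).
X, y = load_digits(return_X_y=True)

# Hold back the last 100 images to play the role of "new data".
X_train, y_train = X[:-100], y[:-100]
X_new = X[-100:]

# Step 3: choose hyperparameters and create an instance of the class.
model = KNeighborsClassifier(n_neighbors=3)

# Step 4: fit the model to the data.
model.fit(X_train, y_train)

# Step 5: predict labels for the new data.
y_pred = model.predict(X_new)
print(y_pred[:10])
```

Here X is a two-dimensional array with shape (number of samples, number of features) and y is a one-dimensional array with one entry per sample, which is the format scikit-learn models expect.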

We’ll go through each step in more detail as the course goes on, explaining any terms you may not be familiar with. For now, the basic way scikit-learn is used can be summarised even more briefly as:

  • get some feature data in the correct format (plus possibly some attached labels)
  • pick a machine learning model and associated hyperparameter values
  • fit the model to the data (or train it)
  • use the fitted model with new data.
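
Because every model follows this same pattern, trying a different model usually means changing only the line that creates it. The sketch below fits three classifiers to the same data in a loop. The dataset, the three models, their hyperparameter values, and the size of the held-back set are illustrative assumptions rather than recommendations.

```python
from sklearn.datasets import load_digits
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Feature data and attached labels.
X, y = load_digits(return_X_y=True)
X_train, y_train = X[:-300], y[:-300]
X_new, y_new = X[-300:], y[-300:]

# Three different models, each with its own hyperparameters.
models = [
    GaussianNB(),
    DecisionTreeClassifier(random_state=0),
    KNeighborsClassifier(n_neighbors=3),
]

accuracies = {}
for model in models:
    # The same two method calls work for every model.
    model.fit(X_train, y_train)
    y_pred = model.predict(X_new)
    # Fraction of held-back samples that were labelled correctly.
    accuracies[type(model).__name__] = float((y_pred == y_new).mean())

for name, accuracy in accuracies.items():
    print(f"{name}: {accuracy:.2f}")
```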

Types of models in scikit-learn

Scikit-learn has model classes for all the major types of machine learning model we discuss in this course, including (but not limited to):

  • supervised learning
    • classification (e.g. Naïve Bayes, decision trees/random forests, K-nearest neighbours)
    • regression (e.g. linear regression, decision trees/random forests)
  • unsupervised learning
    • clustering (e.g. K-means)
    • dimensionality reduction (e.g. principal component analysis).
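
Unsupervised models follow the same interface, except that they are fitted without a target vector. The sketch below runs K-means and principal component analysis on the digits dataset bundled with scikit-learn. The dataset, the number of clusters, and the number of components are assumptions chosen purely for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Only the features are used; the labels are ignored.
X, _ = load_digits(return_X_y=True)

# Clustering: fit() takes X alone, and predict() assigns each sample to a cluster.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
kmeans.fit(X)
cluster_labels = kmeans.predict(X)

# Dimensionality reduction: transform() replaces predict(), returning
# a new features matrix with fewer columns.
pca = PCA(n_components=2)
pca.fit(X)
X_2d = pca.transform(X)

print(cluster_labels[:10])
print(X_2d.shape)
```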

Other tools in scikit-learn

Alongside its many model classes, scikit-learn has convenient tools for data preparation, such as quick and easy ways to split data into training and testing sets, and data pre-processing techniques such as normalisation and standardisation.
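
As a rough sketch of these data preparation tools, the example below splits a dataset into training and testing sets with train_test_split and standardises the features with StandardScaler. The dataset, the 25% test size, and the random_state value are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

# Randomly set aside 25% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Standardisation: rescale each feature to zero mean and unit variance.
# The scaler learns the means and variances from the training set only,
# then applies the same rescaling to the testing set.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train.shape, X_test.shape)
```

Fitting the scaler on the training set alone means no information from the testing set is used when preparing the data.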

There are also several tools for model selection and evaluation that we will look at, such as cross-validation. We will work through all these concepts and methods with examples as the course progresses.
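
To give a flavour of what cross-validation looks like in code, here is a minimal sketch using cross_val_score. The dataset, the model, and the choice of five folds are assumptions made for illustration only.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=3)

# Split the data into 5 folds; each fold takes one turn as the testing
# set while the model is fitted on the other four.
scores = cross_val_score(model, X, y, cv=5)

print(scores)         # one accuracy score per fold
print(scores.mean())  # average accuracy across the folds
```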

For more information on scikit-learn, with lots of examples, see https://scikit-learn.org/stable/

This article is from the free online course Machine Learning for Image Data, created by FutureLearn.
