
Max pooling layers

A short article describing max pooling layers in convolutional neural networks

A recap of max pooling, what it does, and why we might need it.

As we have seen in the previous videos and articles, the output of convolutional layers, especially when dealing with relatively large image datasets, can quickly become enormous. There can be hundreds of convolutional output channels corresponding to original images containing many thousands (if not millions) of pixels, and the images themselves are often processed in large batches.

Why do we need max pooling?

There are two reasons why reducing the size of the data passing through the network is useful. Firstly, and perhaps most obviously, beyond a certain point the data simply becomes too large to handle sensibly, even with GPU acceleration. The second reason relates to the motivation behind much of deep learning: extracting a small amount of information, perhaps just a single number, from input with a lot of spatial information but only a few channels (a million-pixel RGB image, for example). As convolutions increase the number of channels, and with it the information in that dimension, max pooling lets us decrease the amount of information in the spatial dimensions, making it easier to make classification decisions or produce values for a regression.

How do max pooling layers work?

The good news is that, in comparison to convolutional layers, max pooling layers are relatively simple. Essentially, a max pooling layer divides the image up into very small blocks (commonly blocks of 2 x 2 pixels), finds the pixel with the greatest (or max) value in each block, and uses just that max value in a new, compressed version of the image.

[Figure: An 8 x 6 grid of pixels with one 2 x 2 block highlighted, containing the values 1, 9, 3 and 6. This block corresponds to a single pixel in the smaller 4 x 3 grid on the right-hand side of the picture, which takes the value 9.]
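The operation described above can be sketched in plain Python. This is a minimal illustration, not the library implementation: the image is a nested list, the kernel and stride are both fixed at 2, and the block values mirror the figure (1, 9, 3 and 6 collapse to 9).

```python
def max_pool_2x2(image):
    """Max-pool a 2D grid with a 2 x 2 kernel and a stride of 2."""
    pooled = []
    for r in range(0, len(image), 2):       # step down two rows at a time
        row = []
        for c in range(0, len(image[0]), 2):  # step across two columns at a time
            block = [image[r][c], image[r][c + 1],
                     image[r + 1][c], image[r + 1][c + 1]]
            row.append(max(block))          # keep only the largest pixel value
        pooled.append(row)
    return pooled

# The highlighted block from the figure:
block = [[1, 9],
         [3, 6]]
print(max_pool_2x2(block))  # [[9]]
```

Each 2 x 2 block contributes exactly one value to the output, so the pooled grid has half the height and half the width of the input.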

The size of the blocks used is referred to as the kernel size just as in convolutional layers, and the number of pixels moved per output pixel is known as the stride as before. By default the stride is the same as the kernel size so that just one pixel value per block (the max) is translated to the new output image.

For example, let's say an input image of 28 x 28 pixels has passed through a couple of convolutional layers, so we now have a 28 x 28 image with 64 channels of information (64 x 28 x 28). If we then pass it through a max pooling layer with a kernel (and stride) of 2, just one pixel in every block of four will be output, so the output from the max pooling layer will be 64 x 14 x 14. In this way, the amount of spatial information is reduced while the number of channels is retained.
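We can check this shape arithmetic with PyTorch's built-in pooling layer. The tensor below is a stand-in for the article's example: a batch of one 64-channel, 28 x 28 feature map (the random values are arbitrary; only the shape matters here).

```python
import torch
import torch.nn as nn

# A batch of one 64-channel, 28 x 28 feature map (values are arbitrary).
x = torch.randn(1, 64, 28, 28)

# In nn.MaxPool2d the stride defaults to the kernel size, as described above.
pool = nn.MaxPool2d(kernel_size=2)
y = pool(x)

print(y.shape)  # torch.Size([1, 64, 14, 14])
```

The spatial dimensions are halved (28 to 14) while the 64 channels pass through unchanged.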

An important thing to note about max pooling layers is that, unlike convolutional layers, they have no weights and are not changed during the training process. The kernel size must be chosen before training, like any other hyperparameter.
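A quick way to see this in PyTorch is to compare the learnable parameters of a pooling layer with those of a convolutional layer (the layer sizes below are arbitrary, chosen just for illustration):

```python
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2)
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

# A max pooling layer holds no learnable parameters at all,
# whereas a convolutional layer holds a weight and a bias tensor.
print(len(list(pool.parameters())))  # 0
print(len(list(conv.parameters())))  # 2
```

Because there is nothing to learn, pooling adds no cost to backpropagation beyond routing gradients to the winning pixels.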

We’ll show you how to make max pooling layers in PyTorch later in this week’s course.

Images (c) The University of Nottingham

This article is from the free online course Deep Learning for Bioscientists, created by FutureLearn.
