## Want to keep learning?

This content is taken from the Raspberry Pi Foundation & National Centre for Computing Education's online course, Representing Data with Images and Sound: Bringing Data to Life. Join the course to learn more.
3.11

# Lossy compression

Besides lossless compression, the other type of data compression is lossy compression. Lossy compression algorithms reduce the number of bits necessary to store a file by removing unnecessary or less important data.

## Activity: compress an emoji

We will use the emoji from Week 2 as an example again to see how we can reduce its file size using this type of compression.

The emoji is a 10×10 pixel image:

bbbbyybbbb
bbyyyyyybb
byyyyyyyyb
byybyybyyb
yyyyyyyyyy
yybyyyybyy
byybbbbyyb
byyyyyyyyb
bbyyyyyybb
bbbbyybbbb


One method of reducing the size of this file is to look at the pixels in 2×2 blocks, work out which colour dominates within each block, and assign that colour to the block.

bb bb yy bb bb
bb yy yy yy bb

by yy yy yy yb
by yb yy by yb

yy yy yy yy yy
yy by yy yb yy

by yb bb by yb
by yy yy yy yb

bb yy yy yy bb
bb bb yy bb bb
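The block view above can be produced mechanically. The sketch below assumes the emoji is stored as a list of ten strings using the same `b`/`y` letters. The names `EMOJI` and `split_into_blocks` are illustrative and do not come from the course.

```python
# The 10x10 emoji from above: 'b' = black pixel, 'y' = yellow pixel
EMOJI = [
    "bbbbyybbbb",
    "bbyyyyyybb",
    "byyyyyyyyb",
    "byybyybyyb",
    "yyyyyyyyyy",
    "yybyyyybyy",
    "byybbbbyyb",
    "byyyyyyyyb",
    "bbyyyyyybb",
    "bbbbyybbbb",
]


def split_into_blocks(grid, size=2):
    """Group a grid into size x size blocks (assumes both sides divide evenly).

    Each block is returned as one string of its pixels read row by row,
    so the top-left block of the emoji becomes "bbbb".
    """
    blocks = []
    for top in range(0, len(grid), size):
        row_of_blocks = []
        for left in range(0, len(grid[0]), size):
            # Join the `size` rows that make up this block, e.g. "bb" + "yy"
            block = "".join(grid[top + r][left:left + size] for r in range(size))
            row_of_blocks.append(block)
        blocks.append(row_of_blocks)
    return blocks


for row in split_into_blocks(EMOJI):
    print(" ".join(row))
```

Reading each block into a single string makes it easy to count how many pixels of each colour it contains.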


Starting from the top left-hand corner, the first 2×2 block looks like this:

bb
bb


Black dominates here, so we can call this block B.

The next 2×2 block is:

bb
yy


Here neither black nor yellow dominates, so we pick the mid-point between the RGB values of the two colours. Black is 0, 0, 0 and yellow is 255, 255, 0, so the mid-point (rounding 127.5 down to a whole number) is 127, 127, 0. We will call this colour H.
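The rule for a single block (the most common colour wins, and a tie gives the mid-point) can be written as a small function. This is a sketch under stated assumptions: the hypothetical `block_colour` helper takes a block as a string of pixel letters, and it rounds the mid-point down, which matches the 127 used in the text.

```python
from collections import Counter

# RGB values given in the text
COLOURS = {"b": (0, 0, 0), "y": (255, 255, 0)}


def block_colour(block):
    """Return the RGB colour for a block of pixel letters such as "bbyy".

    The most common colour wins. If colours tie, their RGB values are
    averaged and rounded down, giving the mid-point used for H.
    """
    counts = Counter(block)
    best = max(counts.values())
    winners = [c for c, n in counts.items() if n == best]
    rgbs = [COLOURS[c] for c in winners]
    # zip(*rgbs) groups the red values, then the green values, then the blue
    return tuple(sum(channel) // len(rgbs) for channel in zip(*rgbs))


print(block_colour("bbbb"))  # (0, 0, 0): black dominates, block B
print(block_colour("bbyy"))  # (127, 127, 0): a tie, block H
```

With a single winner the average is just that colour's own RGB value, so one formula covers both cases.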

We do this for all the 2×2 blocks in the 10×10 pixel emoji.


So the compressed file looks like this:

BHYHB
HYYYH
YYYYY
HYHYH
BHYHB
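The whole compression step can be sketched in a few lines. This assumes the emoji is held as a list of strings, and it writes each block as a letter (B, Y, or H for a tie) exactly as in the compressed file. The `compress` function is a hypothetical helper, not course code.

```python
EMOJI = [
    "bbbbyybbbb",
    "bbyyyyyybb",
    "byyyyyyyyb",
    "byybyybyyb",
    "yyyyyyyyyy",
    "yybyyyybyy",
    "byybbbbyyb",
    "byyyyyyyyb",
    "bbyyyyyybb",
    "bbbbyybbbb",
]


def compress(grid):
    """Replace each 2x2 block with one letter: B, Y, or H (a tie)."""
    compressed = []
    for top in range(0, len(grid), 2):
        letters = ""
        for left in range(0, len(grid[0]), 2):
            block = grid[top][left:left + 2] + grid[top + 1][left:left + 2]
            blacks, yellows = block.count("b"), block.count("y")
            if blacks > yellows:
                letters += "B"
            elif yellows > blacks:
                letters += "Y"
            else:
                letters += "H"  # 2 black + 2 yellow: use the mid-point colour
        compressed.append(letters)
    return compressed


small = compress(EMOJI)
print("\n".join(small))
before = sum(len(row) for row in EMOJI)  # 100 characters
after = sum(len(row) for row in small)   # 25 characters
print(f"{before} -> {after} characters, {100 * (before - after) // before}% smaller")
```

Running it prints the same five rows as the compressed file, followed by `100 -> 25 characters, 75% smaller`.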


This file contains 25 characters compared to 100 in the original emoji — a 75% reduction in image file size. But what about the quality of the image?

Translating the compressed file back into an uncompressed image gives us this:

bbhhyyhhbb
bbhhyyhhbb
hhyyyyyyhh
hhyyyyyyhh
yyyyyyyyyy
yyyyyyyyyy
hhyyhhyyhh
hhyyhhyyhh
bbhhyyhhbb
bbhhyyhhbb
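Decompression reverses the process by copying each letter into a 2×2 block of pixels. A minimal sketch, assuming the compressed file is a list of strings and using lowercase `h` for the mid-point colour as above (`decompress` is a hypothetical name):

```python
COMPRESSED = ["BHYHB", "HYYYH", "YYYYY", "HYHYH", "BHYHB"]


def decompress(compressed):
    """Expand each letter back into a 2x2 block of lowercase pixel letters."""
    grid = []
    for row in compressed:
        # Each letter becomes two pixels across...
        wide_row = "".join(letter.lower() * 2 for letter in row)
        # ...and the row is repeated so every block is two pixels tall
        grid.extend([wide_row, wide_row])
    return grid


for row in decompress(COMPRESSED):
    print(row)
```

The original pattern inside each block cannot be recovered, because the compressed file only recorded one colour per block. That lost detail is what makes this compression lossy.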


Using the function from Week 2, we can display this decompressed emoji and compare it with the original.
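The Week 2 function is not reproduced in this step. As a stand-in, here is a sketch of a hypothetical helper, `grid_to_image`, that uses Pillow (the PIL library used later in this step) to draw a letter grid as an enlarged picture. It assumes `h` is the mid-point colour 127, 127, 0, and the output file name is illustrative.

```python
from PIL import Image

# Assumed palette: 'h' is the mid-point colour (127, 127, 0) from the text
PALETTE = {"b": (0, 0, 0), "y": (255, 255, 0), "h": (127, 127, 0)}

ORIGINAL = """
bbbbyybbbb bbyyyyyybb byyyyyyyyb byybyybyyb yyyyyyyyyy
yybyyyybyy byybbbbyyb byyyyyyyyb bbyyyyyybb bbbbyybbbb
""".split()
RESTORED = """
bbhhyyhhbb bbhhyyhhbb hhyyyyyyhh hhyyyyyyhh yyyyyyyyyy
yyyyyyyyyy hhyyhhyyhh hhyyhhyyhh bbhhyyhhbb bbhhyyhhbb
""".split()


def grid_to_image(grid, scale=20):
    """Draw a grid of pixel letters as an RGB image, enlarged `scale` times."""
    width, height = len(grid[0]), len(grid)
    im = Image.new("RGB", (width, height))
    for y, row in enumerate(grid):
        for x, letter in enumerate(row):
            im.putpixel((x, y), PALETTE[letter])
    # NEAREST resampling keeps every pixel a sharp square when enlarging
    return im.resize((width * scale, height * scale), Image.NEAREST)


# Place the original (left) and decompressed (right) emoji in one picture
comparison = Image.new("RGB", (420, 200), "white")
comparison.paste(grid_to_image(ORIGINAL), (0, 0))
comparison.paste(grid_to_image(RESTORED), (220, 0))
comparison.save("emoji_comparison.png")
```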

As you can see, in this example the lossy compression has led to a serious reduction in image quality.
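One rough way to put a number on this loss is to count how many pixels changed. The sketch below does that with a hypothetical `changed_pixels` helper. It is a crude measure that treats every change as equal, whether a pixel moved from black to the mid-point colour or from yellow to it.

```python
ORIGINAL = """
bbbbyybbbb bbyyyyyybb byyyyyyyyb byybyybyyb yyyyyyyyyy
yybyyyybyy byybbbbyyb byyyyyyyyb bbyyyyyybb bbbbyybbbb
""".split()
RESTORED = """
bbhhyyhhbb bbhhyyhhbb hhyyyyyyhh hhyyyyyyhh yyyyyyyyyy
yyyyyyyyyy hhyyhhyyhh hhyyhhyyhh bbhhyyhhbb bbhhyyhhbb
""".split()


def changed_pixels(before, after):
    """Count the positions where two equally sized letter grids differ."""
    return sum(
        a != b
        for row_before, row_after in zip(before, after)
        for a, b in zip(row_before, row_after)
    )


total = len(ORIGINAL) * len(ORIGINAL[0])
print(f"{changed_pixels(ORIGINAL, RESTORED)} of {total} pixels changed")
# 42 of 100 pixels changed
```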

## Activity: JPEG compression algorithm

One real-life compression algorithm is JPEG compression, which works on image files.

The JPEG compression algorithm is considerably more sophisticated than the example above, and as a result, at high quality settings, it causes only a minor reduction in image quality.

Let’s try it out on the image of the puppy we’ve worked with before.

• The few lines of code you need to compress an image with the JPEG algorithm are in this repl.it project. Alternatively, copy and paste the code below into a Python file.

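```python
from PIL import Image

# JPEG cannot store transparency or palette images, so convert to RGB first
im = Image.open('puppy.bmp').convert('RGB')
# Lossy save: quality runs from 0 (worst) to 95 (best); Pillow's default is 75
im.save('puppy.jpg', 'JPEG', quality=90)
```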

• If you’ve created a new Python file, make sure you have saved the image of the puppy as puppy.bmp in the same directory as your Python file.

• Run the Python script. If you’re using repl.it, download the puppy.jpg file to your computer.

• Compare the file sizes of the original puppy.bmp image and the compressed puppy.jpg image. The original should be about 2 MB, while the compressed file should be around 140 KB.

• Open both image files and compare what they look like. The compressed image should have little discernible loss in quality.

• In the Python script, the line of code that saves the file includes an option for image quality: quality=90.

• Reduce the quality value, run the script again, and see how the change affects the size and quality of the output file.
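To try several quality values in one go, a minimal sketch can encode the image in memory with Pillow at each setting and record how many bytes each version takes. The helper name `jpeg_sizes` is hypothetical, and encoding into an `io.BytesIO` buffer is simply a way to measure sizes without writing a file for every setting.

```python
import io

from PIL import Image


def jpeg_sizes(im, qualities):
    """Return {quality: size in bytes} for JPEG encodings of `im`."""
    rgb = im.convert("RGB")  # JPEG cannot store transparency or palettes
    sizes = {}
    for q in qualities:
        buffer = io.BytesIO()  # encode in memory rather than writing files
        rgb.save(buffer, "JPEG", quality=q)
        sizes[q] = len(buffer.getvalue())
    return sizes
```

For example, `jpeg_sizes(Image.open('puppy.bmp'), [90, 50, 10])` reports a size for each setting, which you can compare with `os.path.getsize('puppy.bmp')`, the size of the original bitmap in bytes.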

## The risks of lossy compression

Lossy compression algorithms, such as the JPEG and MP3 algorithms, can reduce the size of files considerably. This is crucial when files need to be transferred from one computer to another, such as when you view an image in a web browser or watch a film on Netflix. However, it is important to remember that lossy compression is a destructive process: the data it removes is lost for good. Performing repeated rounds of compression on a file can cause such severe loss of data that the output becomes unrecognisable.
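You can experiment with repeated compression yourself. The sketch below, with hypothetical helper names, uses Pillow to save and reload an image as a JPEG several times, then measures the average difference from the original per colour channel on the 0 to 255 scale. The exact numbers depend on the image and the quality setting; lower quality values make the loss easier to see.

```python
import io

from PIL import Image


def recompress(im, rounds, quality=75):
    """Save and reload `im` as a JPEG `rounds` times; return the result."""
    current = im.convert("RGB")
    for _ in range(rounds):
        buffer = io.BytesIO()
        current.save(buffer, "JPEG", quality=quality)
        buffer.seek(0)
        # convert() forces Pillow to decode the JPEG before the buffer is reused
        current = Image.open(buffer).convert("RGB")
    return current


def mean_error(a, b):
    """Average absolute difference per RGB channel (0 means identical)."""
    pairs = zip(a.tobytes(), b.tobytes())
    return sum(abs(x - y) for x, y in pairs) / (a.width * a.height * 3)
```

For example, comparing `mean_error(img, recompress(img, 1, quality=30))` with `mean_error(img, recompress(img, 10, quality=30))` shows how the damage after ten rounds compares with the damage after one.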