£199.99 £139.99 for one year of Unlimited learning. Offer ends on 28 February 2023 at 23:59 (UTC). T&Cs apply

# What is lossy compression?

Lossy compression algorithms reduce the number of bits necessary to store a file by removing unnecessary or less important data

Besides lossless compression, the other type of data compression is lossy compression. Lossy compression algorithms reduce the number of bits necessary to store a file by removing unnecessary or less important data.

## Activity: compress an emoji

Here we use an emoji as an example to see how we can reduce its file size using this type of compression.

The emoji is a 10×10 pixel image:

bbbbyybbbb
bbyyyyyybb
byyyyyyyyb
byybyybyyb
yyyyyyyyyy
yybyyyybyy
byybbbbyyb
byyyyyyyyb
bbyyyyyybb
bbbbyybbbb

One method of reducing the size of this file is to look at the pixels in 2×2 blocks, work out which colour dominates within each block, and assign that colour to the block:

bb bb yy bb bb
bb yy yy yy bb

by yy yy yy yb
by yb yy by yb

yy yy yy yy yy
yy by yy yb yy

by yb bb by yb
by yy yy yy yb

bb yy yy yy bb
bb bb yy bb bb

Starting from the top left-hand corner, the first 2×2 blocks like this:

bb
bb

Black dominates here, so we can call this block B.

The next 2×2 block is:

bb
yy

Here, neither black nor yellow dominates this block, so we will pick the mid-point between the RGB values of the two colours.
Black is 0, 0, 0, and yellow is 255, 255, 0, so the mid-point is 127, 127, 0. We will call this H.

We do this for all the 2×2 blocks in the 10×10 pixel emoji:

bb bb yy bb bb
bb yy yy yy bb

by yy yy yy yb
by yb yy by yb

yy yy yy yy yy
yy by yy yb yy

by yb bb by yb
by yy yy yy yb

bb yy yy yy bb
bb bb yy bb bb

So the compressed file looks like this:

BHYHB
HYYYH
YYYYY
HYHYH
BHYHB

## Image quality

This file contains 25 characters compared to 100 in the original emoji — a 75% reduction in image file size. But what about the quality of the image?

Translating the compressed file back into an uncompressed image gives us this:

bbhhyyhhbb
bbhhyyhhbb
hhyyyyyyhh
hhyyyyyyhh
yyyyyyyyyy
yyyyyyyyyy
hhyyhhyyhh
hhyyhhyyhh
bbhhyyhhbb
bbhhyyhhbb

We can now load this newly compressed emoji and compare it to the original:

As you can see, in this example the lossy compression has led to a serious reduction in image quality.

## Activity: JPEG compression algorithm

One real-life compression algorithm is JPEG compression, which works on image files.

The JPEG compression algorithm is a little more complicated than the above example, and as a result it only causes a minor reduction in image quality.

Let’s try it out on the following image of a puppy:

• You can find the few lines of code you need for compressing an image using the JPEG algorithm either in this repl.it project, or copy and paste the code below into a Python file.
from PIL import Image
im = Image.open('puppy.bmp')
im.save('puppy.jpg',"JPEG", quality=90)

• If you’ve created a new Python file, make sure you have saved the image of the puppy as puppy.bmp in the same directory as your Python file.
• Compare the file sizes of the original puppy.bmp image and the compressed puppy.jpg image. The orginial should be about 2MB, while the compressed file should be around 140KB.
• Open both image files and compare what they look like. The compressed image should have little discernible loss in quality.
• When you look at the Python script, you can see that in the line of code that saves the file, there is an option for image quality.
• Reduce the quality value, run the script again, and see what effect the change you made has on the size and quality of the output file.

## The risks of lossy compression

Lossy compression algorithms, such as the JPEG algorithm and the MP3 algorithm, can reduce the size of files, which is a crucial factor when files need to be transferred from one computer to another, such as when you view an image in a web browser or watch a film on Netflix.

However, it is important to remember that this type of compression is a destructive process that causes data to be lost. Performing repeated rounds of compressions on a file can cause such severe loss of data that the file output becomes unrecognisable.