This content is taken from the Raspberry Pi Foundation & National Centre for Computing Education's online course, Representing Data with Images and Sound: Bringing Data to Life. Join the course to learn more.

Everyday compression

Digital images, videos, and articles are everywhere in our daily lives, and the speed at which they’re uploaded to social media or downloaded to your mobile phone continues to increase. Improved devices and network infrastructure are partly responsible for this speed, but software tools such as data compression algorithms, which reduce the size of data files, play a vital role as well.

What is compression?

Compression is the process of representing data with fewer bits. Imagine being in a situation where your mobile phone is running out of storage space. At this point, if increasing storage capacity is not an option, you could free up space by deleting old photos. But instead of saying goodbye to some of your beautiful photos, you could also compress each photo file, reducing file size by removing unnecessary and repeating data.

There are many different compression algorithms, for many different file types. Depending on the algorithm, the decompressed output may differ slightly from the original: lossy algorithms discard some detail to save more space, while lossless algorithms restore the data exactly. Even with lossy compression, the file still functions as it should, and humans normally cannot tell the difference between the outputs.

What kind of data can be compressed?

Many types of files can be compressed. These are some of the most common:

| Data | Example | Uncompressed file type(s) | Compressed file type(s) |
| --- | --- | --- | --- |
| Images | Photographs taken by a digital camera are compressed to save storage space | RAW | JPG, PNG, TIFF |
| Audio | A Justin Bieber song is compressed for faster streaming on Spotify | WAV, AIFF | MP3, AAC, FLAC |
| Video | The video file of the latest episode of House of Cards is compressed for faster streaming on Netflix | MXF | MPEG, WMV, H.261 |

Text also compresses very well, because it often contains repeating sequences of characters; instead of each character being stored as a separate byte, words or phrases can be stored together. You will see how this works later on in this section.
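To see how strongly repetition helps, here is a minimal sketch using Python's standard `zlib` module, which implements a general-purpose lossless algorithm (DEFLATE). The sample string is a hypothetical input chosen because it is highly repetitive; it is not taken from the course, and this is not the system described later in this section.

```python
import zlib

# Hypothetical sample: one short sentence repeated 50 times.
text = "I do not like green eggs and ham. " * 50
raw = text.encode("ascii")              # 1 byte per character

compressed = zlib.compress(raw)         # lossless compression
restored = zlib.decompress(compressed)  # exact reconstruction

print(len(raw), "bytes before compression")
print(len(compressed), "bytes after compression")
print("round trip OK:", restored == raw)
```

Because the compression is lossless, decompressing gives back exactly the bytes that went in. Less repetitive text would shrink far less.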

Many different files or directories can be compressed and stored together within a ZIP file.

The benefits of compression

  • A compressed photo requires fewer bits than its uncompressed counterpart, so it is transmitted faster, and your hardware can process it more quickly; ultimately, the photo loads faster in your browser.
  • Audio and video files can be compressed by up to 90%, so they can be streamed all over the world within seconds.
  • Compressed images, videos, and audio files on mobile devices are transferred to cloud servers faster, which saves you time when you back up your devices.

Clever compression algorithms

Active compression

Some apps and web browsers actively compress images, videos, and music files before up- or downloading them, thereby directly reducing the amount of data transmission needed. With reduced data transmission comes a smaller bill for your home WiFi or mobile phone!

Video compression

The video streaming service Netflix uses AI to analyse every shot in a video file and compress it without any loss of image quality that is visible to the human eye.

The AI system, called the ‘Dynamic Optimiser’, improves the quality of video when users have a poor internet connection. To develop this system, Netflix asked users to rate hundreds of thousands of shots. The AI algorithm was then trained on this survey data so it could learn to distinguish between high- and low-quality images.

This Netflix algorithm is a smart and somewhat advanced use of compression. Now we’ll put the microscope on compression and look into how it occurs at a binary level.

Text compression example

Every compression algorithm aims to reduce data file size by removing unnecessary parts or finding and efficiently encoding patterns.

As mentioned earlier, text compresses easily because it often has lots of repeating patterns. Imagine a text file containing the following text:

I am Sam, Sam I am. That Sam-I-am! That Sam-I-am! I do not like that Sam-I-am! Do you like green eggs and ham? I do not like them, Sam-I-am. I do not like green eggs and ham.

As you’ve learned on this course, 8-bit ASCII encoding stores each character, symbol, or space in a single byte. Therefore, the text above would be stored in a file with 174 bytes. But by compressing the text, we can reduce the size of this file.
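The byte count can be checked with a short Python sketch. It assumes the text is stored exactly as printed above, with single spaces between sentences and no trailing newline.

```python
# The sample text, split across lines only for readability.
text = ("I am Sam, Sam I am. That Sam-I-am! That Sam-I-am! "
        "I do not like that Sam-I-am! Do you like green eggs and ham? "
        "I do not like them, Sam-I-am. I do not like green eggs and ham.")

encoded = text.encode("ascii")  # one byte per character, symbol, or space
first_byte = format(encoded[0], "08b")

print(len(encoded))  # → 174
print(first_byte)    # → 01001001 (the letter "I")
```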

Creating a new compression system

The uncompressed text file used 1 byte to store each character. But as you can see, the text contains repeating characters, words, and phrases:

  • am appears 14 times
  • I do not like appears 3 times
  • green eggs and h appears 2 times
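These patterns can be counted with Python's built-in `str.count`, as in the sketch below. Note that it counts every occurrence, the first one included, and that it is case-sensitive. The variable names are illustrative only.

```python
text = ("I am Sam, Sam I am. That Sam-I-am! That Sam-I-am! "
        "I do not like that Sam-I-am! Do you like green eggs and ham? "
        "I do not like them, Sam-I-am. I do not like green eggs and ham.")

phrases = ["am", "I do not like", "green eggs and h"]

# Total number of non-overlapping occurrences of each phrase.
counts = {phrase: text.count(phrase) for phrase in phrases}

for phrase, n in counts.items():
    print(f"{phrase!r} occurs {n} times")
# 'am' occurs 14 times
# 'I do not like' occurs 3 times
# 'green eggs and h' occurs 2 times
```

The count for am includes the letters inside Sam and ham, which is why it is so high.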

Our new compression system needs its own set of binary codes. Each repeating character or phrase is assigned a new 1-byte code, and a ‘data dictionary’ records which code stands for which piece of text:

| Index | Text | Binary code | Byte(s) used with compression system | Byte(s) used for uncompressed text |
| --- | --- | --- | --- | --- |
| 1 | am | 00000001 | 1 | 2 |
| 2 | I do not like | 00000010 | 1 | 13 |
| 3 | green eggs and h | 00000011 | 1 | 16 |
| 15 | . | 00001111 | 1 | 1 |

Even though the dictionary takes up some space, it allows long repeated phrases to be stored in 1 byte each, reducing the storage space needed for the entire text.
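The idea can be sketched in Python as follows. This is a simplified illustration, not the course's exact system: it uses only the three phrase entries from the table (codes 1 to 3) and, as an assumption, leaves every other character as its ordinary ASCII byte instead of giving it a new code. The names `DICTIONARY`, `compress`, and `decompress` are hypothetical.

```python
# Code -> phrase, taken from the first three rows of the table.
DICTIONARY = {1: "am", 2: "I do not like", 3: "green eggs and h"}

def compress(text):
    data = text.encode("ascii")
    # The codes must not already occur in the input, or decoding is ambiguous.
    if any(byte in DICTIONARY for byte in data):
        raise ValueError("input already contains a dictionary code")
    # Replace longer phrases first so a short entry cannot split a long one.
    for code, phrase in sorted(DICTIONARY.items(), key=lambda kv: -len(kv[1])):
        data = data.replace(phrase.encode("ascii"), bytes([code]))
    return data

def decompress(data):
    # Dictionary codes expand to phrases; other bytes are plain ASCII.
    return "".join(DICTIONARY.get(byte, chr(byte)) for byte in data)

text = ("I am Sam, Sam I am. That Sam-I-am! That Sam-I-am! "
        "I do not like that Sam-I-am! Do you like green eggs and ham? "
        "I do not like them, Sam-I-am. I do not like green eggs and ham.")

packed = compress(text)
print(len(text), "->", len(packed), "bytes")  # → 174 -> 94 bytes
print(decompress(packed) == text)             # → True
```

The 94 bytes do not include the dictionary itself. Storing the three phrases and their codes takes at least another 34 bytes, so the saving only pays off because the phrases repeat.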

Other compression systems

Established compression systems are even more effective at compressing files. For example, with Huffman encoding, a common lossless technique that gives shorter binary codes to more frequent characters, the storage space for the ‘I am Sam’ text could be reduced from 174 bytes to 92 bytes.
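For the curious, a minimal sketch of Huffman encoding in Python is shown below. It builds a code table in which frequent characters get shorter bit strings, then encodes and decodes the sample text. The function names are hypothetical, and the sketch counts only the encoded bits: the total size of a real file depends on how the code table is stored, so the figure it prints need not match the one quoted above.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a {character: bit string} table built with Huffman's algorithm."""
    if not text:
        return {}
    # Heap entries: (frequency, unique tie-breaker, {char: code so far}).
    heap = [(freq, i, {ch: ""})
            for i, (ch, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # only one distinct character
        return {ch: "0" for ch in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in t1.items()}
        merged.update({ch: "1" + code for ch, code in t2.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

def decode(bits, codes):
    """Rebuild the text by matching bit prefixes against the code table."""
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)

text = ("I am Sam, Sam I am. That Sam-I-am! That Sam-I-am! "
        "I do not like that Sam-I-am! Do you like green eggs and ham? "
        "I do not like them, Sam-I-am. I do not like green eggs and ham.")

codes = huffman_codes(text)
bits = "".join(codes[ch] for ch in text)
payload_bytes = (len(bits) + 7) // 8        # round up to whole bytes

print("uncompressed:", len(text), "bytes")
print("encoded text:", payload_bytes, "bytes (code table not included)")
print("round trip OK:", decode(bits, codes) == text)
```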

In the next step we’re going to look at another compression system: run-length encoding (RLE).
