This content is taken from Coventry University's online course, Get ready for a Masters in Data Science and AI.

# Data representation

We’ve taken the first few steps in Python programming: breaking down an algorithm into steps that the computer can understand. Now let’s look at how data is broken down and represented in the computer.

The Python code we write, along with the other applications that we use, is known as software. The physical computer is the hardware.

In computer hardware, the basic unit of storage is a choice between two states, such as high voltage vs low voltage, which we then interpret as true/false in logic or 1/0 in binary. For example, a hard drive magnetises the surface of a spinning disk to represent 1/0, and a compact disc (CD) has lands (tiny pieces of reflective surface) and pits (tiny indentations) which represent 1/0. So all data is ultimately reduced to strings of 1s and 0s.

## How is data stored?

In order to store any data in a computer, we must be able to reduce it to strings built from the two building blocks, 1 and 0. This applies to all types of digital data, such as numbers, characters, audio and images. Each of these is encoded as 0s and 1s with an appropriate interpretation. For example, the binary string 01001000 01101111 01110111 00111111 could encode one integer, a string of four characters, or a very tiny black-and-white image.
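As a rough illustration of how one bit pattern supports several readings, the Python sketch below interprets the four bytes above first as a single unsigned integer and then as four characters. The variable names are our own, and reading the bytes most-significant-first ("big-endian") is an assumption, since the text does not specify a byte order.

```python
# The 32-bit pattern from the text, written as four 8-bit groups.
bit_groups = "01001000 01101111 01110111 00111111".split()

# Turn each 8-bit group into a byte value (0-255).
raw = bytes(int(group, 2) for group in bit_groups)

# Interpretation 1: one unsigned integer (assuming big-endian byte order).
as_integer = int.from_bytes(raw, byteorder="big")

# Interpretation 2: four characters, one per byte.
as_text = raw.decode("ascii")

print(as_integer)
print(as_text)  # How?
```

The bits never change; only the rule we apply when reading them does.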

Before we look at how characters (including letters and punctuation) are encoded as binary, we need to look at how numbers are encoded and decoded. This is because a character is encoded as a number and a number is encoded as a binary string.

### Decimal number system

In the decimal number system there are 10 possible digits, namely {0,1,2,3,4,5,6,7,8,9}. This is the number system we are used to using in most of the mathematics we have learned. When we write 4583 in the decimal number system we use the place value system to decode its meaning. Starting from the rightmost digit, we see that there are 3 ones, 8 tens, 5 hundreds and 4 thousands. The place value of each column is multiplied by 10 as we move from right to left.

$4583 = (4\times1000) + (5\times 100) + (8\times 10) + (3\times 1) = 4000+500+80+3$
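The same place-value decoding can be sketched in a few lines of Python. The helper name `place_value_terms` is hypothetical, chosen only for this illustration; it pairs each digit with the place value of its column.

```python
def place_value_terms(digits, base=10):
    """Return (digit, place value) pairs, leftmost digit first."""
    n = len(digits)
    return [(int(d), base ** (n - 1 - i)) for i, d in enumerate(digits)]

terms = place_value_terms("4583")
total = sum(digit * place for digit, place in terms)

print(terms)  # [(4, 1000), (5, 100), (8, 10), (3, 1)]
print(total)  # 4583
```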

### Binary number system

In the binary number system, the only possible binary digits (called bits for short) are {0,1}. When we write 01011001 in the binary number system we also use the place value system to decode its meaning. This time the place value of each column is multiplied by 2 as we move from right to left. Starting from the rightmost digit, we see that there is 1 one, 0 twos, 0 fours, 1 eight, 1 sixteen, 0 thirty-twos, 1 sixty-four and 0 one-hundred-and-twenty-eights.

$01011001 = (0\times128) + (1\times64) + (0\times32) + (1\times16) + (1\times8) + (0\times4) + (0\times 2) + (1\times 1) = 64+16+8+1 = 89$
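To check this arithmetic in Python, one option is to decode the bit string by hand and compare the result with the built-in `int()` function, which accepts a base as its second argument. This is a minimal sketch; the variable names are our own.

```python
bits = "01011001"

# Manual decoding: each bit times its power-of-two place value,
# working from the rightmost bit (position 0) to the leftmost.
manual = sum(int(bit) * 2 ** position
             for position, bit in enumerate(reversed(bits)))

# Built-in decoding: int() accepts a base as its second argument.
built_in = int(bits, 2)

print(manual, built_in)    # 89 89
print(format(89, "08b"))   # 01011001
```

The last line goes the other way, formatting the decimal number 89 as an 8-bit binary string.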

For a practical explanation, watch this video on How to convert binary to decimal (hosted on YouTube).

Most of the time, we don’t need to know exactly how numbers are encoded and decoded into binary strings within the computer. When using Python, we generally use and manipulate numbers as if they were in the decimal number system. However, sometimes it is useful to know how numbers are stored to explain what happens, eg if we add a very tiny number to a very large number, or try to compare numbers that only differ by a tiny amount.
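Both situations can be seen in a short Python experiment. This sketch assumes Python's `float` is a 64-bit IEEE 754 double, which is the case on typical platforms.

```python
import math

big = 1e16

# Adding a tiny number to a very large one: the 1 is too small to
# register at this magnitude, so the sum is unchanged.
print(big + 1 == big)                 # True

# Comparing numbers that differ by a tiny amount: 0.1, 0.2 and 0.3
# have no exact binary representation, so the equality test fails.
print(0.1 + 0.2 == 0.3)               # False

# A tolerance-based comparison is usually the safer choice.
print(math.isclose(0.1 + 0.2, 0.3))   # True
```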

### Extensions to the binary number system

Let’s explore how whole numbers are represented in the binary number system in more detail. In particular, how many bits will be needed to store larger numbers?

Using four bits, 0000 to 1111, we can represent 16 different whole numbers. Notice that $$2^4=16$$

 4-bit binary decoding decimal 0000 $$(0\times8)+(0\times4)+(0\times2)+(0\times1)$$ 0 0001 $$(0\times8)+(0\times4)+(0\times2)+(1\times1)$$ 1 0010 $$(0\times8)+(0\times4)+(1\times2)+(0\times1)$$ 2 0011 $$(0\times8)+(0\times4)+(1\times2)+(1\times1)$$ 3 0100 $$(0\times8)+(1\times4)+(0\times2)+(0\times1)$$ 4 0101 $$(0\times8)+(1\times4)+(0\times2)+(1\times1)$$ 5 0110 $$(0\times8)+(1\times4)+(1\times2)+(0\times1)$$ 6 0111 $$(0\times8)+(1\times4)+(1\times2)+(1\times1)$$ 7 1000 $$(1\times8)+(0\times4)+(0\times2)+(0\times1)$$ 8 1001 $$(1\times8)+(0\times4)+(0\times2)+(1\times1)$$ 9 1010 $$(1\times8)+(0\times4)+(1\times2)+(0\times1)$$ 10 1011 $$(1\times8)+(0\times4)+(1\times2)+(1\times1)$$ 11 1100 $$(1\times8)+(1\times4)+(0\times2)+(0\times1)$$ 12 1101 $$(1\times8)+(1\times4)+(0\times2)+(1\times1)$$ 13 1110 $$(1\times8)+(1\times4)+(1\times2)+(0\times1)$$ 14 1111 $$(1\times8)+(1\times4)+(1\times2)+(1\times1)$$ 15

Using eight bits, referred to as one byte, from 00000000 to 11111111, we can represent 256 whole numbers. Again notice that $$2^8=256$$
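The pattern generalises: n bits give $$2^n$$ different bit strings, so the largest whole number they can hold is $$2^n-1$$. The sketch below tabulates this for a few sizes and then asks the reverse question, how many bits a given number needs, using Python's built-in `int.bit_length()` method. The helper name `values_with_bits` is hypothetical.

```python
def values_with_bits(n_bits):
    """How many different whole numbers fit in n_bits bits."""
    return 2 ** n_bits

for n_bits in (4, 8, 16):
    count = values_with_bits(n_bits)
    print(n_bits, "bits:", count, "values, largest is", count - 1)

# Going the other way: how many bits does a given whole number need?
print((255).bit_length())  # 8
print((256).bit_length())  # 9
```

So 255 just fits in one byte, while 256 needs a ninth bit.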

We’ve looked at the simplest case of encoding and decoding numbers. If you’re interested in finding out how to represent integers (positive and negative whole numbers), you can read about the two’s complement system, which makes it easy to add and subtract using negative numbers.
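For readers who want a feel for two's complement before following the reference, here is a minimal sketch for 8-bit values. The function names are hypothetical, and this illustrates the idea rather than how any particular computer implements it. A negative number is stored as the bit pattern of $$2^n$$ plus that number, so the leading bit ends up as 1.

```python
def to_twos_complement(value, n_bits=8):
    """Return the n_bits-wide two's complement bit string for value."""
    low, high = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {n_bits} bits")
    # Masking keeps the lowest n_bits bits; for a negative value this
    # gives the same pattern as 2**n_bits + value.
    return format(value & (2 ** n_bits - 1), f"0{n_bits}b")

def from_twos_complement(bits):
    """Decode a two's complement bit string back into an integer."""
    unsigned = int(bits, 2)
    # A leading 1 marks a negative number.
    return unsigned - 2 ** len(bits) if bits[0] == "1" else unsigned

print(to_twos_complement(5))             # 00000101
print(to_twos_complement(-5))            # 11111011
print(from_twos_complement("11111011"))  # -5
```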

Even more complicated is the floating point system for representing numbers with a decimal point. IEEE 754 is the international standard for how this is done on computers. You will sometimes hear of single precision 32-bit (4 byte) floating point numbers and double precision 64-bit (8 byte) floating point numbers. If you are interested, have a look at this example.
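To peek at a single precision encoding from Python, one possible approach uses the standard `struct` module to pack a number into 4 bytes and then print the bits. The helper name `float32_bits` is our own. The split into 1 sign bit, 8 exponent bits and 23 fraction bits follows the IEEE 754 single precision layout.

```python
import struct

def float32_bits(x):
    """Return the sign, exponent and fraction bit fields of x as a 32-bit float."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    (as_int,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(as_int, "032b")
    return bits[0], bits[1:9], bits[9:]

sign, exponent, fraction = float32_bits(0.15625)
print(sign, exponent, fraction)
# 0 01111100 01000000000000000000000
```

Here 0.15625 is $$1.25\times2^{-3}$$. The exponent field stores $$-3+127=124$$, and the fraction field stores the .25 that follows the implicit leading 1.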

### Representing characters

Simple text characters (capital letters, lowercase letters, punctuation, digits and special symbols) are usually represented as one byte (8 bits), which allows $$2^8=256$$ possible values.

The UTF-8 standard maps the values 0-127 to the same single-byte characters as the older ASCII standard. Characters outside that range, such as accented letters, are encoded using two to four bytes. For example, upper case ‘A’ is 65, lower case ‘a’ is 97, space is 32, and ‘=’ is 61. Online binary converters (eg RapidTables) are a useful tool for interpreting binary into characters, or vice versa.
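Python's built-in `ord()` and `chr()` functions convert between a character and its number, so the values quoted above are easy to check. The sketch also encodes one accented letter, because in UTF-8 a character outside the basic 0-127 range takes more than one byte.

```python
# Character -> number -> 8-bit binary string.
for ch in ["A", "a", " ", "="]:
    code = ord(ch)
    print(repr(ch), code, format(code, "08b"))

# Number -> character.
print(chr(65))                   # A

# Encoding text to bytes with UTF-8.
print("A".encode("utf-8"))       # b'A'
print(len("é".encode("utf-8")))  # 2
```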

We have seen that all data must be encoded into a string of bits (ones and zeros). We encode and decode by applying some interpretation to the place of each bit in the string. We have seen how this applies to whole numbers and to characters in particular.

When writing Python code, we generally have access to high-level functions and libraries that others have developed so we don’t need to worry about the details. In the same way, programming languages such as Python provide access to high-level data types such as integers, floating-point numbers and character strings so that we don’t have to perform any explicit encoding and decoding ourselves. Even so, it’s still useful to know how the underlying encoding works.

Have a go at writing a two-digit decimal whole number in 8-bit binary and share with your fellow learners in the Comments area.
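If you would like to check your answer afterwards, the sketch below converts a whole number to 8-bit binary by repeatedly dividing by 2 and collecting the remainders. The function name `to_8bit_binary` is hypothetical, and this is only one of several ways to do the conversion.

```python
def to_8bit_binary(number):
    """Convert a whole number from 0 to 255 into an 8-character bit string."""
    if not 0 <= number <= 255:
        raise ValueError("number must be between 0 and 255")
    bits = ""
    for _ in range(8):
        # The remainder after dividing by 2 is the next bit, right to left.
        bits = str(number % 2) + bits
        number //= 2
    return bits

print(to_8bit_binary(42))  # 00101010
```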

## References

WikiHow. (2018, June 19). How to convert a number from decimal to IEEE 754 floating point representation. https://www.wikihow.com/Convert-a-Number-from-Decimal-to-IEEE-754-Floating-Point-Representation

Rapid Tables. (n.d.). Text to binary. https://www.rapidtables.com/convert/number/ascii-to-binary.html (Retrieved July 24, 2020)

Tecmath. (2020, February 27). How to convert binary to decimal [Video]. YouTube. https://youtu.be/a2FpnU9Mm3E

Wikipedia. (2020, July 18). Two’s complement. https://en.wikipedia.org/wiki/Two%27s_complement

Wikipedia. (2020, July 22). UTF-8. https://en.wikipedia.org/wiki/UTF-8#Codepage_layout