Want to keep learning?

This content is taken from the Raspberry Pi Foundation & National Centre for Computing Education's online course, Introduction to Databases and SQL. Join the course to learn more.

Databases and spreadsheets

Previously, you viewed and added records to your computer sales database. You may have noticed similarities between the database and a spreadsheet. You could store the data from computer sales in a spreadsheet. In this step, you will learn about the differences between a database and a spreadsheet.

Databases

The purpose of your computer sales database is to easily store and retrieve data about computer sales. In other words, a database holds organised data.

A database can also be used by more than one person. The most recent versions of spreadsheets can be used by more than one person at a time, but having multiple users accessing the spreadsheet can cause issues. When multiple users are working on an online spreadsheet, it is possible for the same cell to be updated by more than one person; this can lead to incorrect data being used. Databases provide a stable structure in which data is protected; when one person makes a change, it is available to all users.

Data integrity

Data integrity refers to the accuracy and consistency of data. When creating a database, you need to enforce data integrity whenever possible. For example, in your computer sales database, a name cannot be added to the integer fields; in this way, the data type contributes towards the data integrity of the database.

Databases should not have any duplicate data (data that appears more than once). For example, if the fields in the model table were included in the manufacturer table, the database might look like this:

An image of the computer sales records in one table with repeating data highlighted The computer sales records in one table with repeating data highlighted. Expand this image

  • Data associated with Raspberry Pi is repeated for all the Raspberry Pi models (shown in the red box). The data entered in the url fields are all https://raspberrypi.org, the year_founded fields are all 2008, and the trading fields are all 1
  • All the Apple models share the same data in the url, year_founded, and trading fields (shown in the green box)
  • All the Commodore models share the same data in the url, year_founded, and trading fields (shown in the purple box)

If you had to type all the data in, you would have to type the same data over and over, repeating the data.

To prevent duplicate data, these are removed and placed in a different table called manufacturer.

This makes the data easier to maintain and more accurate, because there is only one version of the data in one table. If a manufacturer changes their website address, for example, the data in the url field will need to be updated once, in the manufacturer table, rather than on the repeated records in the model table.

Spreadsheets

Spreadsheets display data as a grid or matrix of cells consisting of columns (A,B,C,D) and rows (1,2,3,4) rather than as records and fields.

Cells in spreadsheet can be based on formulae such as =A2*B2, which are automatically updated when the value within a reference cell changes.

An animated gif of the two cells being multiplied together

Spreadsheets are used for financial information, and are useful because the user does not need to manually update all the cells that depend on a change to the data.

Spreadsheets have relatively low limits on the amount of data that can be processed. For example, Excel can support a maximum of 1,048,576 rows within a worksheet, whereas many modern relational databases are limited only by the amount of data storage available.

Data representation

Spreadsheets can be used to present complex data in different formats, such as graphs or charts. This is very useful when you have to present computer sales information, as shown below:

An animated gif of a graph being created in a spreadsheet

Spreadsheets were created to interpret data and sort items into a list. They should be used to process figures and store items from a single list.

Choosing appropriate tools

Spreadsheets are generally appropriate for:

  • Calculations
  • Creating charts for presentations
  • Statistical comparisons

Databases are used for:

  • Multiple users
  • To select, filter, and sort data efficiently
  • Creating complex search queries with multiple conditions applied across multiple tables
  • Managing large amounts of data

The two most significant advantages of using a database over a spreadsheet are improved data integrity and the efficiency with which you can search the data.

Next step

In the next step, you’ll reinforce what you have learnt this week by taking a short quiz.

Share this article:

This article is from the free online course:

Introduction to Databases and SQL

Raspberry Pi Foundation