Want to keep learning?

This content is taken from the Raspberry Pi Foundation & National Centre for Computing Education's online course, Introduction to Databases and SQL. Join the course to learn more.

Skip to 0 minutes and 2 seconds So far, you’ve learned how to select and filter data from single tables in a database using SELECT, FROM, and WHERE statements. But what if the data you need is split across multiple tables? To solve this problem, you can use relationships across tables to join them together. This is SQL statement joins the tables together where the model ID field from the model table is equal to the model ID field in the sales table. Let’s execute the statement and look what happens from the output.

Skip to 0 minutes and 45 seconds So it returns three records. Let’s have a look in the sales table.

Skip to 0 minutes and 53 seconds And we have three records. So it’s worked. It’s looked at the model ID and matched it to the model ID in the model table. It’s given us the model names. By breaking down this query, you can see how the data is joined together. Assume the model and sales table hold the following data. The first entry in the model and sales tables are compared. Since the model IDs match, the data is returned. Then the model ID in the second row of the sales table is compared to the first row of the model table. As the model IDs don’t match, no data is returned. The first record in the model table is then compared to the remaining records in the sales table.

Skip to 1 minute and 55 seconds But no more results are returned. Next, the second record in the model table is then compared against all the records in the sales table. As the model ID matches for rows 2 and 3 in the sales table, the data is returned. The query continues to compare all the records in the model table until all the data has been processed.

Skip to 2 minutes and 30 seconds Time for another challenge. Change the query so additional information about the computer is also returned, such as number of calls or whether it has Wi-Fi. Add a WHERE clause to the SQL statement to only show sales made by a specific employee. Share your SQL statements in the comments below.

Selecting data in multiple tables

In the previous weeks of this course, you have learnt how to select and filter data from single tables using SELECT FROM WHERE statements.

What if the data you needed was split across multiple tables, for example if you wanted to see a list of all the computers that have been sold and when they were sold? The name of the computer is in the model table, but the sale_date is in the sales table.

To solve this problem, we can use the relationships across tables to join tables together.

Take a look at this SELECT SQL statement:

SELECT model.name, sales.sale_date
FROM model
INNER JOIN sales on model.model_id = sales.model_id;

There are two significant differences to SELECT statements you have used before:

  1. The addition of the line INNER JOIN sales on model.model_id = sales.model_id joins the sales table into the query and specifies that this should be done where the model_id field in the model table is the same as the model_id field in the sales table.
  2. The table name is included when specifying which fields should be returned by including a full stop between the table and field, for example model.name, sales.sale_date. This is good practice when selecting data from multiple tables and is essential if the same field name exists in multiple tables.

Execute the statement and have a look at the output:

db browser showing the result of the above query, 2 rows returned, 2 models of computer each with a sale date

You will see a row output for each record in the sales table, because every sale corresponds to a model. However, where there are no sales for a model of computer, the model is not returned.

By breaking down how this query runs, you can see how the data is joined together.

  1. Assume the model and sales tables hold the following data:

    model

    model_id name
    1 Raspberry Pi Model A
    2 Raspberry Pi Model B
    3 Apple MacBook Air
    4 Apple iMac
    5 Commodore 64

    sales

    sale_id model_id sale_date
    1 1 2020-01-01
    2 2 2020-01-02
    3 2 2020-01-03
    4 3 2020-01-04
    5 5 2020-01-05
  2. The model_id of the first row in the model table is compared against the model_id of the first row in the sales table.

    A still from the video showing the `model` and `sales` tables with the first record of each higlighted.

  3. As the model_ids are the same, the model.name and sales.sales_date fields are returned.

    model.name sales.sales_data
    Raspberry Pi Model A 2020-01-01

    A still from the video showing the `model` and `sales` tables, along with an unnamed table in the middle. The first record of each of the `model` and `sales` tables are highlighted in green. The first record in the middle table has been filled in with the model_name `Raspberry Pi Model A` and sales.sale_date `2020-01-01` - the former has come from the record highlighted on the `model` table and the latter from the record highlighted on the `sales` table.

  4. The model_id in the second row of the sales table is then compared against the first row of the model table.

    A still from the animation as previously, but now with the first record of the `model` table and the second record of the `sales` table higlighted

    As the model_ids do not match, the data is not returned.

  5. The first record in the model table is then compared to the remaining records in the sales table.

  6. The second record in the model table is then compared to all the records in the sales table.

    A still from the animation which shows the model_id in the second record of the `model` table being compared to the model_id in the first record of the `sales` table. These do not match.

    As the model_id matches for rows 2 and 3 in the sales table, the data is returned.

    A still from the animation which shows the model_id in the second record of the `model` table being compared to the model_id in the third record of the `sales` table. These match, and the corresponding model_name and sales_date have been added to the middle table (as the third record)

    model.name sales.sales_data
    Raspberry Pi Model A 2020-01-01
    Raspberry Pi Model B 2020-01-02
    Raspberry Pi Model B 2020-01-03
  7. The query continues to compare all the records in the model until all the data has been processed and the following data returned:

    model.name sales.sales_data
    Raspberry Pi Model A 2020-01-01
    Raspberry Pi Model B 2020-01-02
    Raspberry Pi Model B 2020-01-03
    Apple iMac 2020-01-04
    Apple MacBook Air 2020-01-05
    Apple MacBook Air 2020-01-06

    An animation showing all 7 steps of the join above.

Challenges

  1. Change the query so additional information about the computer, such as the number of cores, or whether it has WiFi, is also returned.
  2. Add a WHERE clause to the SQL statement to show only sales made by a specific employee.

Share your SQL statements in the comments section.

Share this video:

This video is from the free online course:

Introduction to Databases and SQL

Raspberry Pi Foundation