New offer! Get 30% off your first 2 months of Unlimited Monthly. Start your subscription for just £29.99 £19.99. New subscribers only. T&Cs apply

# Grouping data

How SQL can be used to summarise and group data.
2.4
Let’s explore how you can use SQL to perform some analytics on a computer sales database. If you want to see the total number of sales, we use the count function, like in this SQL statement. So it can account all the records from the sales table. Let’s execute this. So it returns the total of 7,069. So that’s the 7,069 sales have taken place. But what if you want to put a date range in? Let’s put the WHERE clause in. So we’ve got WHERE sales date is after the 1 of January, 2019, and before the 31 of January 2019. Let’s execute this. So the sales have gone down because it’s only one month, to 514.
58.7
You could also find the total value of sales using the SUM function. So here we’ve used the SUM function, and we’re asking it for the sum of price from the sales table. Let’s run this.
75.5
So the total number of price is 251,835. Or the average income per sale. Let’s have a look at that. So we use the AVG function of price from the sales table.
97.7
Let’s execute this. So the average income for each sale is in the middle of around 35 pounds. You can also specify how data should be grouped together, such as sales grouped by employee. So in this statement, we’re counting. So we’re using the count function from the sales table, but we’re going to group it by the employee ID.
127.5
OK, so that’s given us the total sales for each employee, but we don’t know what the employee ID is. We haven’t asked for a return of the employee ID. In this statement, we’ve asked for the employee ID to be returned, and also order in descending order, the total sales. So if we run this one, we can see that hpatel has a total sales of 2,405, the most. And at the bottom with 1,282 is mhogan. How could you use the aggregate function, SUM and GROUP BY, to create a league table of the computers which generate the most income from sales? Share your thoughts in the comments below.

In the previous step, you learnt about data analytics; now you are going to use SQL to perform some, specifically analysing data of computer sales.

Download and open this version of the computer sales database, which contains the 2019 sales records for a fictitious company that sells Raspberry Pi computers.

To recap, the sales table holds data about each sale made by the company. There is a record for each sale, containing the model_id of the computer sold, the date it was sold, the employee_id of the person who made the sale, and the price it sold for.

Browse the data in the sales table and you will see that there are sales record for every day between 2019-01-01 and 2019-12-31.

### Count, sum, and average

The SQL COUNT function can be used in a SELECT statement to return the number of records contained in a query.

Run the following query to return the total number of sales:

SELECT COUNT(*)FROM sales

This SQL query returns the total number of computer sales in 2019.

Currently, the field returned is named COUNT(*). You can change the field name using the AS statement:

SELECT COUNT(*) AS total_salesFROM SALES

As with any other SELECT query, a WHERE clause can used to filter the results. By filtering the results between two dates, you can modify the query to show total sales for January 2019.

SELECT COUNT(*) as january_salesFROM salesWHERE sale_date >= '2019-01-01' AND sale_date <= '2019-01-31'

Two other useful SQL functions for analysing data in databases are SUM and AVG (average).

SUM could be used to find the total value of sales by adding up the price of all sales:

SELECT SUM(price) as total_valueFROM sales

Conversely, AVG can be used to return the average income per sale:

SELECT AVG(price) as average_incomeFROM sales

### Grouping

COUNT, SUM, and AVG are examples of aggregate functions. Aggregate functions bring data together and provide a summary. In the examples above, the data within the sales data is aggregated to provide a single count, sum, or average.

You can specify how data should be aggregated, or grouped together, by using a GROUP BY statement.

The SQL statement below returns total sales, but includes a GROUP BY employee_id statement.

SELECT COUNT(*) AS total_salesFROM salesGROUP BY employee_id

What do you think the four rows that are returned represent?

total_sales
1 2405
2 1506
3 1282
4 1876

The four records returned are the total sales made by each employee.

The GROUP BY employee_id statement specified that the COUNT of sales should be aggregated, or split, by the employee_id. As there are four different instances of employee_id in the sales table, four counts were returned.

To create a league table of sales by employee, modify the query to add the employee_id to the SELECT statement and order the results by the total_sales:

SELECT COUNT(*) AS total_sales, employee_idFROM salesGROUP BY employee_idORDER BY total_sales DESC

How could you use the aggregate functions SUM and GROUP BY to create a league table of the computers that generated the most income from sales? Discuss your thoughts in the comments section.

The SQLite documentation contains information about other available functions that you can use to manipulate data, for example by formatting dates or performing mathematical functions.