We use cookies to give you a better experience, if that’s ok you can close this message and carry on browsing. For more info read our cookies policy.
3.3

# Aggregate data

This step introduces one more operation to use with SELECT queries, before we move on to CONSTRUCT queries.

If we want to report the total duration of an album, we need to aggregate the duration of each track on that album first.

Suppose that the dataset contains triples specifying the tracks on each album, and associating a duration in milliseconds with each track, represented by an integer literal. (In fact the MusicBrainz dataset associates records with albums, and tracks with records, which complicates the query slightly – see below.)

Fully listed this is a lot of data, and you might wish instead to report, for each album, the total duration obtained by summing the durations of the tracks.

This is an example of aggregate data, and it requires two operations: first, we must segment the variable bindings into groups, corresponding in this case to all bindings relating to a given album; second, for each group, we must submit the values of a specified variable to an aggegation function – in this case, sum the track durations.

For instance, suppose the dataset has just two albums matching the query, namely Revolver and Abbey Road; and suppose that for Revolver just three tracks are included, namely “Eleanor Rigby”, “I’m only sleeping”, and “Doctor Robert”, with durations respectively of 200000, 240000, and 160000 milliseconds. (These are just round numbers made up for the example.)

This will mean that for Revolver we have three bindings of the variables ?album, ?track, and ?track_duration; and if Abbey Road has four specified tracks, it will correspondingly have four bindings.

We now separate the bindings into groups – three for Revolver, four for Abbey Road – and within each group we want to sum the track durations, and return only a table that gives albums along with their total durations.

This will mean that the first row of the table will specify Revolver in the ?album column, and 600000 in the second column for which we need a new name – perhaps ?album_duration.

Here is a query that achieves this result. In addition it imposes a condition on the total duration of the album, reporting only albums with duration exceeding 3600000 milliseconds (i.e., one hour), selecting (and hence grouping by) album title rather than album.

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>

SELECT ?album_title (SUM(?track_duration) AS ?album_duration)
WHERE { ?album mo:record ?record .
?album dc:title ?album_title .
?record mo:track ?track .
?track mo:duration ?track_duration .
}
GROUP BY ?album_title
HAVING (?album_duration > 3600000)


Note two new keywords here: AS introducing a variable name for the sum of durations (this becomes the heading of the second column of the output table); and HAVING introducing a filter over the group of variable bindings that has just been specified by the GROUP BY component.

## Get a taste of this course

Find out what this course is like by previewing some of the course steps before you join:

• ##### Welcome to the course
video

Watch Dr Elena Simperl & Dr Barry Norton explain how this short course on linked data & the semantic web can help you use this technology in your work

• ##### Developing real world applications
video

Watch Dr Barry Norton describing some real world applications that have Linked Data as their underlying technology.

• ##### Welcome to Week 2
video

Watch Dr Barry Norton explain what you will learn about SPARQL (the query language) on this course and what you will be practicing.