This content is taken from the University of Southampton's online course, Introduction to Linked Data and the Semantic Web. Join the course to learn more.

Avoiding duplicates

For some queries you may find that the output table has duplicated rows. The reason for this is usually that the selected (tabulated) variables are a strict subset of the variables in the graph pattern.

Consider for instance the table below, which you will see if you submit the previous query and scroll down a little:

Figure 2.4 Table with duplicates

In the centre of this figure we find two rows with track title ‘Within You Without You’ and duration 305000; and there are many more examples further down the table.

This happens because there might be multiple resources instantiating ?track having the same values for ?title and ?duration. Here, for example, the track ‘Within You Without You’ is present in two different albums, so it shows up twice.

If all three variables were tabulated, the rows would differ in the ?track column, but since this column is not requested, we obtain rows that appear identical. To avoid this you can include the keyword DISTINCT in the SELECT clause, as follows:

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT DISTINCT ?title ?duration
WHERE { dbpedia:The_Beatles foaf:made ?track .
        ?track a mo:Track .
        ?track dc:title ?title .
        ?track mo:duration ?duration .
        FILTER (?duration>300000 && ?duration<400000) }
ORDER BY ?duration

Scrolling through the output table, you should now find only one row pairing ‘Within You Without You’ with 305000.
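
To see where the duplicate rows came from, one can tabulate ?track as well. The sketch below reuses the prefixes and graph pattern of the query above and narrows the results to a single title. The extra FILTER on STR(?title) is an illustrative assumption rather than part of the course query, and it presumes the title is stored with exactly this spelling.

```sparql
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

# Tabulate ?track too, so that rows sharing a title and duration
# can be told apart by the resource they come from.
SELECT ?track ?title ?duration
WHERE { dbpedia:The_Beatles foaf:made ?track .
        ?track a mo:Track .
        ?track dc:title ?title .
        ?track mo:duration ?duration .
        # Assumed exact title; STR() compares the plain string,
        # ignoring any language tag on the literal.
        FILTER (STR(?title) = "Within You Without You") }
ORDER BY ?duration
```

If the explanation above holds, each returned row should show a different resource in the ?track column, even where ?title and ?duration are the same.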

DISTINCT can be computationally expensive. SPARQL therefore offers a cheaper alternative, REDUCED, which allows the query engine to eliminate some duplicates but does not guarantee that all of them are removed (e.g., it fails to eliminate the duplication of ‘Within You Without You’ mentioned above). DISTINCT is nevertheless more widely used, and it should add little extra cost when the results are being ordered anyway.
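
For comparison, the same query with REDUCED in place of DISTINCT might be written as follows. This is a minimal sketch that keeps the prefixes and graph pattern of the DISTINCT query unchanged; how many duplicates are actually removed is left to the query engine, so the output may vary from one endpoint to another.

```sparql
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

# REDUCED permits, but does not require, the removal of duplicate rows.
SELECT REDUCED ?title ?duration
WHERE { dbpedia:The_Beatles foaf:made ?track .
        ?track a mo:Track .
        ?track dc:title ?title .
        ?track mo:duration ?duration .
        FILTER (?duration>300000 && ?duration<400000) }
ORDER BY ?duration
```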

This work is a derivative of ‘Using Linked Data Effectively’ by The Open University (2014), licensed under a CC BY 4.0 International Licence, adapted and used by the University of Southampton. http://www.euclid-project.eu/
