[0:03] BARRY NORTON: Just a quick word about SELECT because it's the familiar form. It's the one that looks like SQL. And like SQL, we could say SELECT and have a subset of our variables or an asterisk for all of them. We can, if we want, say FROM. In SQL, you always have to say FROM which tables you want to join. In SPARQL you don't have to, because the default thing to do is just to put all your graphs, so all of the graphs that you get from different documents, together in one big graph and query across the whole thing. So we have an optional FROM clause. We have a WHERE clause, which in SPARQL is where the graph pattern goes.

[0:48] And then finally, you have modifiers like, for instance, ORDER BY, just like in SQL. So a really quick example: if we want to find out about albums that were made by the Beatles, we have this identifier that we've seen throughout for the Beatles. So in the WHERE part of our query, we use that as the subject of a statement.

[1:15] The predicate is made, from foaf:. That's one way to relate things that people have made, products, to them. So we said the Beatles made, but instead of specifying the object, we use a variable. We say, you tell me what the Beatles have made. But using that same variable, we say in a second triple, whatever you tell me must also have a relationship defined by this predicate, which is title. So it should have a title. And again, we use a variable to say you tell me what that title is. Then after a semicolon, we say this same thing that the Beatles made has to have tracks on it according to the music ontology.

[2:01] So the predicate is track, but we use a variable for the object. Again, you tell me. And then using a final triple after a full stop, we say that track, it also has to have a title. And then in the head, in the SELECT part, we say, well, actually we're not interested in the URIs for the album and the track, respectively, album and track. We're interested in the titles that are attached to them by these two triples. So please just give me back these literals. And you get back a table, with one column for each variable and literals in each one: the album name and a track on it.

[2:45] An album name and a track on it, according to this constraint: the album has to have been made by the Beatles, and the album has to have the track on it according to that track relationship.

Introducing SPARQL: The standard query language for the Semantic Web

In order to query RDF data, a query language is required. Similar to SQL for databases, SPARQL allows us to carry out queries on RDF data.

We suggest that you read the content on this step first for an introduction to SPARQL, and then watch the video with Dr Barry Norton talking through the example to illustrate the purpose of SPARQL.

SPARQL

The SPARQL Protocol and RDF Query Language (a recursive acronym, since it contains itself) is a language for formulating queries over RDF data. It is the Semantic Web’s counterpart to SQL (Structured Query Language), which has been a standard language for querying relational databases since the 1980s.

SPARQL is a recent addition to the Semantic Web stack of languages, having been recommended as a W3C standard in 2008 [1].

Since Weeks 2 and 3 of this course are dedicated to SPARQL, we limit ourselves here to an example that illustrates its purpose.

The key difference between SPARQL and SQL is that SPARQL is designed for retrieving information from sets of triples, rather than from data organised into relations (i.e., tables). Queries are therefore formulated using lists of RDF triples in which some URIs or literals are replaced by variables, as in the following:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX music-ont: <http://purl.org/ontology/mo/>

SELECT ?album_name ?track_title 
WHERE {
  dbpedia:The_Beatles foaf:made ?album .
  ?album dc:title ?album_name . 
  ?album music-ont:track ?track .
  ?track dc:title ?track_title }

Translated into English, the meaning of this query is as follows:

Retrieve a list of all album names AN and track titles TT in the data for which the following conditions hold:

1 There is an album A made by the Beatles.

2 Album A has the title AN.

3 There is a track T on album A.

4 Track T has the title TT.

Or more colloquially: retrieve the titles of all tracks on albums by the Beatles, along with the corresponding album titles. The response should be a list of pairs, each containing an album name and a track title.

This example shows the simplest kind of query, in which the WHERE clause is simply a list of triple patterns (triples containing variables).
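
To make the matching process concrete, the following Python sketch evaluates the same four triple patterns over a small in-memory set of triples. It is a toy illustration under stated assumptions, not how a real SPARQL engine is implemented: the `ex:` identifiers are invented for this example, prefixed names are treated as plain strings, and the rows are sorted only so that the output is predictable (SPARQL itself guarantees no order without ORDER BY).

```python
# Toy data: (subject, predicate, object) triples.
# All "ex:" identifiers are hypothetical and exist only for this sketch.
TRIPLES = {
    ("dbpedia:The_Beatles", "foaf:made", "ex:Abbey_Road"),
    ("dbpedia:The_Beatles", "foaf:made", "ex:Help"),
    ("ex:Abbey_Road", "dc:title", "Abbey Road"),
    ("ex:Help", "dc:title", "Help!"),
    ("ex:Abbey_Road", "music-ont:track", "ex:Come_Together"),
    ("ex:Abbey_Road", "music-ont:track", "ex:Something"),
    ("ex:Help", "music-ont:track", "ex:Ticket_to_Ride"),
    ("ex:Help", "music-ont:track", "ex:Yesterday"),
    ("ex:Come_Together", "dc:title", "Come Together"),
    ("ex:Something", "dc:title", "Something"),
    ("ex:Ticket_to_Ride", "dc:title", "Ticket to Ride"),
    ("ex:Yesterday", "dc:title", "Yesterday"),
    # A made-up album by a made-up artist, which the query must not return.
    ("ex:Another_Band", "foaf:made", "ex:Another_Album"),
    ("ex:Another_Album", "dc:title", "Another Album"),
    ("ex:Another_Album", "music-ont:track", "ex:Another_Track"),
    ("ex:Another_Track", "dc:title", "Another Track"),
}

# The WHERE clause of the query above, one tuple per triple pattern.
PATTERNS = [
    ("dbpedia:The_Beatles", "foaf:made", "?album"),
    ("?album", "dc:title", "?album_name"),
    ("?album", "music-ont:track", "?track"),
    ("?track", "dc:title", "?track_title"),
]


def is_variable(term):
    """In this sketch, any term starting with '?' is a variable."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for pattern_term, triple_term in zip(pattern, triple):
        if is_variable(pattern_term):
            # Bind the variable, or check it against its existing value.
            if extended.setdefault(pattern_term, triple_term) != triple_term:
                return None
        elif pattern_term != triple_term:
            return None
    return extended


def solve(patterns, triples):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions


def select(variables, patterns, triples):
    """Project each solution onto the selected variables (sorted for display)."""
    return sorted(tuple(s[v] for v in variables) for s in solve(patterns, triples))


rows = select(["?album_name", "?track_title"], PATTERNS, TRIPLES)
for album_name, track_title in rows:
    print(album_name, "|", track_title)
```

Each pattern narrows or extends the bindings found so far, which is why reusing ?album and ?track across patterns ties the four conditions together.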

SPARQL also provides some more sophisticated constructs: these include FILTER, which allows conditions on the values of variables (e.g., that a number should be between 1990 and 2000); also OPTIONAL, which specifies data that should be retrieved if available, while allowing the query to succeed even when they are unavailable. For more information on these more complex constructs, see Weeks 2 and 3.
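
As a rough illustration of what these two constructs do, the Python sketch below treats FILTER as a test applied to each solution and OPTIONAL as a join that keeps a solution even when no extra data matches. It is a simplification under stated assumptions: the predicates `ex:year` and `ex:producer`, the `ex:` identifiers, and the deliberately incomplete producer data are all invented for the example, and real SPARQL semantics cover more cases (such as errors inside filters).

```python
# A query of the kind this sketch imitates (predicates are hypothetical):
#
#   SELECT ?album ?year ?producer
#   WHERE {
#     ?album ex:year ?year .
#     OPTIONAL { ?album ex:producer ?producer }
#     FILTER (?year >= 1965 && ?year <= 1969)
#   }

# Solutions of the required pattern: one dict of bindings per match.
albums = [
    {"?album": "ex:Help", "?year": 1965},
    {"?album": "ex:Abbey_Road", "?year": 1969},
    {"?album": "ex:Let_It_Be", "?year": 1970},
]

# Solutions of the optional pattern. The toy data deliberately has no
# producer entry for ex:Help, to show what OPTIONAL does with missing data.
producers = [
    {"?album": "ex:Abbey_Road", "?producer": "George Martin"},
    {"?album": "ex:Let_It_Be", "?producer": "Phil Spector"},
]


def compatible(left, right):
    """Two solutions are compatible if their shared variables agree."""
    return all(left[v] == right[v] for v in left.keys() & right.keys())


def optional(required, extra):
    """Extend each required solution with matching extra bindings if any
    exist; otherwise keep the required solution unchanged."""
    result = []
    for solution in required:
        merged = [{**solution, **e} for e in extra if compatible(solution, e)]
        result.extend(merged or [solution])
    return result


def filter_solutions(solutions, condition):
    """Keep only the solutions for which the condition holds."""
    return [s for s in solutions if condition(s)]


result = filter_solutions(
    optional(albums, producers),
    lambda s: 1965 <= s["?year"] <= 1969,
)
for solution in result:
    print(solution)
```

Here ex:Help survives without a ?producer binding because that part of the pattern is optional, while ex:Let_It_Be is dropped because its year falls outside the filter range.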

Practically, to pose a query to a dataset you need to use a program or website that serves as a SPARQL endpoint. For a list of endpoints see the W3C site at http://www.w3.org/wiki/SparqlEndpoints.

Typically, an endpoint interface provides text fields where you can type the URL of the dataset you wish to query, and the query itself (e.g., the SELECT query in the example above). On hitting the ‘Submit’ button, you obtain a dynamically generated webpage listing the values of the query variables in a table.

There are also libraries allowing you to incorporate SPARQL queries into your programs, such as the Java library Jena at http://jena.apache.org/.
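
For readers who want to see roughly what such a library does under the hood, here is a minimal Python sketch using only the standard library (it does not use Jena). It builds an HTTP GET request in the style of the SPARQL Protocol, with the query passed in a `query` parameter, and parses a response in the SPARQL JSON results format. The endpoint URL `http://example.org/sparql` is a placeholder, and the sample response is hand-written for illustration rather than taken from a real endpoint; nothing is sent over the network.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint: substitute the address of a real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

QUERY = """PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX music-ont: <http://purl.org/ontology/mo/>

SELECT ?album_name ?track_title
WHERE {
  dbpedia:The_Beatles foaf:made ?album .
  ?album dc:title ?album_name .
  ?album music-ont:track ?track .
  ?track dc:title ?track_title }"""


def build_request(endpoint, query):
    """Build (but do not send) a GET request carrying the query."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def parse_results(json_text):
    """Turn a SPARQL JSON results document into a list of plain dicts."""
    document = json.loads(json_text)
    variables = document["head"]["vars"]
    return [
        # A variable may be unbound in a row, so skip missing keys.
        {v: row[v]["value"] for v in variables if v in row}
        for row in document["results"]["bindings"]
    ]


# Hand-written sample response, for illustration only.
SAMPLE_RESPONSE = """{
  "head": {"vars": ["album_name", "track_title"]},
  "results": {"bindings": [
    {"album_name": {"type": "literal", "value": "Help!"},
     "track_title": {"type": "literal", "value": "Yesterday"}}
  ]}
}"""

request = build_request(ENDPOINT, QUERY)
table = parse_results(SAMPLE_RESPONSE)
print(request.full_url)
print(table)
```

To run the query for real, the request could be passed to `urllib.request.urlopen` and the response body handed to `parse_results`, assuming the chosen endpoint supports JSON results.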

We’ve now covered all the essential technologies and standards that provide the foundations of Linked Data and the Semantic Web.

In the next step, there is an exercise in which you will have an opportunity to formulate your first SPARQL queries. There will be more SPARQL practice examples in the subsequent weeks.


References

  1. E. Prud’hommeaux and A. Seaborne (2008) ‘SPARQL Query Language for RDF’. Published on-line at http://www.w3.org/TR/rdf-sparql-query/. 

This video is from the free online course:

Introduction to Linked Data and the Semantic Web

University of Southampton
