What’s coming this week?
Throughout this week, we will describe a set of technologies that allow datasets to be published over the web, and queried effectively by applications.
Compared with search engines such as Google and Yahoo, which are based on text-string matching, these technologies are ‘semantic’. This means that information is represented not in a natural language like English or Spanish, but in a graph-based data model that facilitates extension, integration, inference and uniform querying.
As a realistic application of semantic technologies, we will be using a portal through which learners can retrieve resources and information in the world of music. Consider for example the following tasks:
- Retrieve the Chinese orchestras that have interpreted a piece by Beethoven
- Retrieve a photograph of the conductor of this orchestra
- List male British rock musicians married to Scandinavians
Attempts to answer such queries through text-based search are unreliable: we might equally retrieve a performance in which the soloist was Chinese, or a rock musician who plays Scandinavian music.
Using semantic technologies, resources such as the audio file of the performance, or the photograph of the conductor, can be annotated using the Resource Description Framework (RDF).
In this framework, formal names can be assigned to what are called resources, which would include Beethoven, his violin concerto, the orchestra, and the conductor.
Names can also be assigned to types (or classes) of resource (composers, concertos, etc.), and to relationships (or properties) that link resources (e.g., the ‘composed-by’ relationship between composition and composer).
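As a rough sketch of this naming scheme, each fact can be written as a subject-predicate-object triple. The Python below uses plain tuples rather than an RDF library, and every name under `http://example.org/music/` is a hypothetical identifier invented for illustration; only the `rdf:type` URI comes from the RDF standard.

```python
# Hypothetical namespace, assumed for illustration only.
EX = "http://example.org/music/"
# Standard RDF property linking a resource to its type (class).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each fact is a (subject, predicate, object) triple of URIs.
triples = {
    (EX + "Beethoven", RDF_TYPE, EX + "Composer"),
    (EX + "ViolinConcerto", RDF_TYPE, EX + "Concerto"),
    (EX + "ViolinConcerto", EX + "composedBy", EX + "Beethoven"),
}


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is a known triple."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


# Who composed the violin concerto?
composers = objects(EX + "ViolinConcerto", EX + "composedBy")
```

Here resources, types and the ‘composed-by’ relationship are all named in the same way, which is what lets a single lookup function serve every kind of question.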
By reasoning over facts encoded in this way, a query system can confirm that a performance was given by the Beijing Symphony Orchestra, that this orchestra is based in Beijing, that Beijing is located in China, and so forth – thus combining geographical and musical knowledge in order to retrieve an answer.
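The chain of reasoning described above might be sketched as follows. This is an illustrative toy, not how a real query system is implemented: the facts, the `http://example.org/music/` namespace and the property names (`performedBy`, `performanceOf`, `basedIn`, `locatedIn`) are all assumptions made for the example.

```python
# Hypothetical namespace and property names, assumed for illustration only.
EX = "http://example.org/music/"

facts = {
    (EX + "Performance1", EX + "performedBy", EX + "BeijingSymphonyOrchestra"),
    (EX + "Performance1", EX + "performanceOf", EX + "ViolinConcerto"),
    (EX + "ViolinConcerto", EX + "composedBy", EX + "Beethoven"),
    (EX + "BeijingSymphonyOrchestra", EX + "basedIn", EX + "Beijing"),
    (EX + "Beijing", EX + "locatedIn", EX + "China"),
}


def objects(subject, predicate):
    """Return every object linked to subject by predicate."""
    return {o for (s, p, o) in facts if s == subject and p == predicate}


def is_within(place, region):
    """True if place is region, or reaches it by following locatedIn links."""
    seen, frontier = set(), [place]
    while frontier:
        current = frontier.pop()
        if current == region:
            return True
        if current not in seen:
            seen.add(current)
            frontier.extend(objects(current, EX + "locatedIn"))
    return False


def orchestras_in_region_playing(composer, region):
    """Orchestras based in region that performed a piece by composer."""
    results = set()
    for (performance, predicate, orchestra) in facts:
        if predicate != EX + "performedBy":
            continue
        pieces = objects(performance, EX + "performanceOf")
        by_composer = any(composer in objects(piece, EX + "composedBy") for piece in pieces)
        in_region = any(is_within(base, region) for base in objects(orchestra, EX + "basedIn"))
        if by_composer and in_region:
            results.add(orchestra)
    return results


answer = orchestras_in_region_playing(EX + "Beethoven", EX + "China")
```

The musical facts (who performed what, who composed it) and the geographical facts (where Beijing is) are stored in the same form, so one query can follow links across both.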
A key decision in designing these semantic technologies was to leave open the naming of resources and properties, provided that names conform to the format for web resource names – that is, provided they are Uniform Resource Identifiers or URIs.
All four of the URIs below could be names for Beethoven, illustrating that the URI need not be human-readable (e.g., it might be an arbitrary string of letters and numbers), although identifiers should be resolvable to RDF representations that include human-readable labels, as explained later.
- http://rdf.freebase.com/ns/en.ludwig_van_beethoven
- http://dbpedia.org/resource/Ludwig_van_Beethoven
- http://musicbrainz.org/artist/1f9df192-a621-4f54-8850-2c5373b7eac9#_
- http://data.nytimes.com/N30866506154608358173
Note: we are aware the data.nytimes.com URI above does not currently work. We have left it there to serve the example and show an additional name for Beethoven.
If data from different sources are to be combined, it is therefore important to establish links, for instance through statements indicating that the above four URIs are synonymous. These statements, which can also be expressed in RDF, provide a means by which data published by many people or organisations can be combined into linked data.
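One way such links might be used is sketched below. It assumes the synonymy statements are expressed with the `owl:sameAs` property, which is a common choice but only one option, and the particular links between the four URIs are made up for the example. The code groups linked URIs so that facts published under any of the names can be filed under a single representative.

```python
# owl:sameAs is a standard OWL property; using it here is an assumption of this sketch.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

FREEBASE = "http://rdf.freebase.com/ns/en.ludwig_van_beethoven"
DBPEDIA = "http://dbpedia.org/resource/Ludwig_van_Beethoven"
MUSICBRAINZ = "http://musicbrainz.org/artist/1f9df192-a621-4f54-8850-2c5373b7eac9#_"
NYTIMES = "http://data.nytimes.com/N30866506154608358173"

# Illustrative link statements; which source asserts which link is invented.
links = [
    (DBPEDIA, OWL_SAME_AS, FREEBASE),
    (DBPEDIA, OWL_SAME_AS, MUSICBRAINZ),
    (MUSICBRAINZ, OWL_SAME_AS, NYTIMES),
]


def canonical_names(statements):
    """Map each linked URI to one representative of its sameAs group (union-find)."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # shorten the path as we go
            uri = parent[uri]
        return uri

    for subject, predicate, obj in statements:
        if predicate != OWL_SAME_AS:
            continue
        root_s, root_o = find(subject), find(obj)
        if root_s != root_o:
            # Keep the alphabetically smaller URI as the representative.
            parent[max(root_s, root_o)] = min(root_s, root_o)
    return {uri: find(uri) for uri in parent}


canon = canonical_names(links)
```

After this step all four URIs map to the same representative, even though no single statement links every pair directly.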
We will be using the case scenario of a music portal to illustrate topics about describing resources in RDF, and querying these using SPARQL.
For examples of existing music portals, you can look at the BBC music reviews site and the Internet Archive’s Live Music Archive (also sometimes known as the ‘etree’ site).
These applications make use of a music ontology and of MusicBrainz, a large dataset of musical information that we will be using later on in the course.
Linked data results from a coming together of earlier ideas and technologies. These include hypertext, databases, ontologies, markup languages, the Internet, and the World Wide Web. We will begin this week by looking at the technologies that underpin linked data.
This work is a derivative of ‘Using Linked Data Effectively’ by The Open University (2014), licensed under a CC BY 4.0 International Licence, adapted and used by the University of Southampton. http://www.euclid-project.eu/