BARRY NORTON: The scenario that we've chosen to go through this whole set of webinars is based on the music domain. There are several kinds of content that have a data component, and we're going to show how we can bring them together using semantic technologies and a linked data-based approach. So obviously, the first part is the musical content itself. On the Web now, or on the internet in general, there's a lot of music available by legal means, both in streaming form and download form: for instance, downloading from iTunes, or listening online to Last.fm or Spotify. Underlying this kind of content, there's a lot of data which we would term metadata because it disambiguates the musical content.
So we have information on the music itself, on the artist that produced it and, by extension, on the labels and producers that were associated with particular releases. OK, there are data sets that pick this stuff up and already publish some of that as linked data. What you'll also find out on the Web, which we'll concentrate on later when we talk about mining, for instance, is content like review content, where people go to a new single or a new album and give their reviews. Now, some of this exists somewhat within the linked data domain. So some of it is interlinked with the metadata. Some of it's not. Some of it's just web content at the moment, primarily web content.
And as I said, we'll discuss when it comes to mining how we can aggregate this stuff together. The fourth kind of content, just to mention it, is visual content associated with these things: album covers, artist pictures, these kinds of things. If we want to build a music-based portal, and if we want to use linked data technologies to help us do that, OK, it's the linked data, it's the identifiers, it's the interlinkage that's going to help us pull this stuff together in a low-cost way to make such a portal.
A linked data music portal
As a motivating scenario for this course we consider the provision of a music-based portal, and the challenges and benefits of using Linked Data in creating it.
In order to provide a useful portal, the developer in this scenario would like to bring together a number of disparate components of data-oriented content:
- Musical content
Content exists in the third-party commercial setting (links into download and streaming providers, e.g. Amazon/iTunes and Spotify/Last.fm), the license-free setting (e.g. the Live Music Archive ‘etree’), and the grey market setting (e.g. YouTube).
- Music and artist metadata
While the MusicBrainz dataset is the primary source, it is weak on biographical and genre information, for which alternative sources will be discussed (including DBpedia).
- Review content
Reviews exist that are already Linked-Data-oriented (e.g. BBC Music Reviews), that are semi-structured but unlinked (e.g. Pitchfork) and that are largely unstructured (the Web in general).
- Visual content
Photographic depictions, album covers and videos exist on the Web, but are loosely coupled in terms of semantic interlinking.
The portal developer will use common identifiers to bring together this disparate content. The same interlinkage reaches further data in the Linking Open Data Cloud, so the developer can also offer interesting mash-ups, e.g. geographical and biographical exploration, and build engaging visualisations over the combined data.
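To illustrate how common identifiers bring disparate content together, here is a minimal sketch in Python. It assumes that each source already refers to an artist by the same URI. The record layouts, the field names (`id`, `about`, `depicts`) and the `example.org` URIs are hypothetical placeholders rather than the structure of any real dataset.

```python
from collections import defaultdict

# Placeholder identifier shared by all three sources (not a real URI).
ARTIST = "http://example.org/artist/1"

# Hypothetical records from three separate sources.
metadata = [{"id": ARTIST, "name": "Example Artist"}]
reviews = [{"about": ARTIST, "text": "A strong debut."}]
images = [{"depicts": ARTIST, "url": "http://example.org/img/cover.jpg"}]


def merge_by_identifier(metadata, reviews, images):
    """Group records from each source under the identifier they share."""
    merged = defaultdict(lambda: {"name": None, "reviews": [], "images": []})
    for record in metadata:
        merged[record["id"]]["name"] = record["name"]
    for record in reviews:
        merged[record["about"]]["reviews"].append(record["text"])
    for record in images:
        merged[record["depicts"]]["images"].append(record["url"])
    return dict(merged)


portal_view = merge_by_identifier(metadata, reviews, images)
print(portal_view[ARTIST])
```

Because every source uses the same identifier, no name matching or guesswork is needed. In practice, establishing those shared identifiers across sources is the harder part of the work.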
The developer will also seek to improve the quality of the semantic interlinking of the content they aggregate, and to contribute these improvements back to the Linking Open Data Cloud.
In particular they will improve the linking of artists and works to visual content and to reviews, in the latter case crawling review content and publishing external annotations.
They will also seek to improve classification within the metadata, encoding genre information – at least with respect to the emphasis of their portal – and along the way demonstrating the use of the Google Refine technology.
Finally, prototypical examples from the portal will demonstrate the use of RDFa annotation of human-readable content, and show the link to emerging Web technologies that inherit from semantics, such as Google Rich Snippets, Facebook Open Graph and schema.org annotation.
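As a rough illustration of RDFa annotation of human-readable content, the sketch below renders an album as HTML carrying RDFa Lite attributes (`vocab`, `typeof`, `property`, `resource`) with schema.org terms (`MusicAlbum`, `MusicGroup`, `byArtist`, `name`). The function name `album_rdfa`, the markup layout and the example values are assumptions made for this illustration; they are not taken from the portal itself.

```python
from html import escape


def album_rdfa(title, artist_name, artist_uri):
    """Render an album as HTML annotated with RDFa Lite and schema.org terms."""
    return (
        '<div vocab="http://schema.org/" typeof="MusicAlbum">\n'
        f'  <span property="name">{escape(title)}</span> by\n'
        f'  <span property="byArtist" typeof="MusicGroup" resource="{escape(artist_uri)}">\n'
        f'    <span property="name">{escape(artist_name)}</span>\n'
        '  </span>\n'
        '</div>'
    )


# Placeholder values, not real portal data.
snippet = album_rdfa("Rock & Roll Demo", "Example Artist", "http://example.org/artist/1")
print(snippet)
```

The same text remains readable in a browser, while the attributes let a crawler extract the album, its name and its artist as structured data.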
Figure 2.1 Architecture of a music portal
You can watch a screencast that explains the MusicBrainz site to find out more about linked data music applications.
© This work is a derivative of ‘Using Linked Data Effectively’ by The Open University (2014), licensed under a CC BY 4.0 International Licence, adapted and used by the University of Southampton. http://www.euclid-project.eu/