5-star linked open data

Berners-Lee proposed a star-based rating system for datasets. Watch Dr Barry Norton explain what's needed to make your data get the five-star rating.
BARRY NORTON: Once we have the four principles in mind for how to use those technologies in the best possible way, we also have a star rating scheme to look at the data that people expose on the web and ask: how well does it conform to those principles and best practice? Five stars is where following the four principles aims to get you, but there are star ratings for people who are part of the way there, who are not necessarily using all of the linked data technologies, or maybe are not using them all properly. So we get one star for just putting data on the web.
So if we have somehow a web page about the Beatles, that will give us at least one star because we’ve put something online about the Beatles, if it is resolvable over HTTP, which means it’s on the web. We’ll get two stars for, at the technical level, the data being machine readable in some way. Now, what you’ll find when people have legacy data is that the crudest thing they can do, if the legacy data’s on paper, is to scan it all and put the images online, or put those images into PDFs and publish their company handbook or their company product catalogue just as images within a PDF or as images on the web. That would only get you one star.
To get two stars, it has to be machine readable.
In that case you need reliable optical character recognition over the scans, so the data is machine accessible. Or better, the normal case, you put a database online. So your database, perhaps your relational database, has been transformed into HTML pages, and the data’s there, machine readable within the HTML. For that, you get yourself two stars. You get yourself three stars by using some non-proprietary format for that data. So no PDFs, no Excel documents, but instead, something more open. So plain text is as open as it can be. A CSV file instead of an Excel spreadsheet will get you three stars.
If you want to make it up to four stars, you need to identify the things that you’re talking about, using an open standard. Ideally, linked data best principles would tell you to use HTTP URIs, but other identifiers are available, even within the URI scheme. So if you use, say, stock book IDs, and you’ve consistently used those IDs when you’re referring to things within your data, you get yourself four stars. To get five stars, you have to link those IDs, link the things within your data to other people’s data as well. So the best way to get five stars is to follow the four linked data principles.
So four stars says: consistently use IDs. Linked data best principles say: consistently use HTTP URIs as those IDs. Five stars says: link to other people. Following on that argument, the linked data best principles say the easiest way to do that is that you use HTTP URIs, other people use HTTP URIs, and you reuse other people’s URIs in your data. How do you do that? The easiest way is by using RDF. I relate two things that have URIs under a particular relationship by asserting an RDF triple to relate the two things.
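As a rough illustration of that last point (not part of the talk), the sketch below builds a single RDF triple in N-Triples syntax that links a local identifier for the Beatles to another publisher's identifier for the same band. The `http://example.org/` URI and the helper name `to_ntriple` are hypothetical, and choosing `owl:sameAs` as the linking relationship is an assumption made for the example; the DBpedia URI stands in for "other people's data".

```python
# Hypothetical local URI for the band; example.org is a placeholder domain.
LOCAL_BEATLES = "http://example.org/band/the-beatles"
# Another publisher's URI for the same thing, reused in our own data.
DBPEDIA_BEATLES = "http://dbpedia.org/resource/The_Beatles"
# Assumed linking predicate for this example: owl:sameAs.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def to_ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialise one triple of HTTP URIs as a single N-Triples line."""
    for uri in (subject, predicate, obj):
        # This sketch only accepts HTTP(S) URIs, following the
        # linked data advice to use HTTP URIs as identifiers.
        if not uri.startswith(("http://", "https://")):
            raise ValueError(f"not an HTTP URI: {uri!r}")
    return f"<{subject}> <{predicate}> <{obj}> ."


triple = to_ntriple(LOCAL_BEATLES, OWL_SAME_AS, DBPEDIA_BEATLES)
print(triple)
```

The printed line is one subject, one relationship and one object, each named by an HTTP URI. Reusing the other publisher's URI as the object is what creates the link between the two datasets.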
In 2010 Berners-Lee extended the note referenced earlier to propose a system for rating datasets, based on the five-star rating system used for hotels.
Closely related to the principles from the previous step, the 5 Star Linked Open Data system is as follows:
  • One-star (*): The data is available on the web with an open license.
  • Two-star (**): The data is structured and machine-readable.
  • Three-star (***): The data does not use a proprietary format.
  • Four-star (****): The data uses open standards from W3C (RDF, SPARQL) to identify things.
  • Five-star (*****): The data is linked to that of other data providers.
Note that every level here includes the previous levels: thus for instance three-star data must also be available on the web in machine-readable form.
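The cumulative nature of the levels can be sketched in a few lines of Python. This is an illustrative sketch only: the `Dataset` class, its field names and the `star_rating` function are hypothetical and simply mirror the five criteria listed above.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustrative fields)."""
    on_web_open_licence: bool     # one star
    machine_readable: bool        # two stars
    non_proprietary_format: bool  # three stars
    uses_w3c_standards: bool      # four stars
    linked_to_other_data: bool    # five stars


def star_rating(ds: Dataset) -> int:
    """Count stars in order, stopping at the first level that is not met."""
    levels = [
        ds.on_web_open_licence,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_w3c_standards,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in levels:
        if not met:
            break  # a missed level blocks every level above it
        stars += 1
    return stars


scanned_pdf = Dataset(True, False, False, False, False)
csv_file = Dataset(True, True, True, False, False)
print(star_rating(scanned_pdf))  # 1
print(star_rating(csv_file))     # 3
```

Because the loop stops at the first unmet criterion, a dataset that links to other providers but is not machine-readable would still score only one star in this sketch.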
Watch Dr Barry Norton illustrate the different types of data rating.
This work is a derivative of ‘Using Linked Data Effectively’ by The Open University (2014), licensed under the CC BY 4.0 International Licence, adapted and used by the University of Southampton.
This article is from the free online course Introduction to Linked Data and the Semantic Web.
