Skip to 0 minutes and 8 seconds If you’re an organisation or a startup, and you’re going to be using someone else’s open data, or even publishing your own open data for other people to use, there is a potential, I guess, for some kind of risk. So when you open data, you need to quite carefully decide what exactly you’re going to open. So you perhaps wouldn’t want to open the data on all of your employees, kind of how much they’re paid and where they live. So what other kind of key risks do organisations need to consider? You’ve obviously got things like breaking the law, right? Data protection has laws around it.
Skip to 0 minutes and 44 seconds And in Europe actually, they’ve just got a lot harsher such that you can be fined a percentage of your organisation’s value. At the other end of the scale, there’s kind of the business risks. What happens if we open this data and somebody somehow manages to replicate our business model overnight? You’re hoping that you’ve got the skilled employees inside your organisation that guarantee that you’re still on the edge, you’re still the best at providing it. But that might not potentially be the case.
Skip to 1 minute and 11 seconds So you might want to think about, what other data is there that we have that we can start exploring or maybe we’re already publishing stuff we can make open that is then of mutual benefit, so other people use it and then potentially provide benefit back to us and then there’s a growth around a relationship. So you can actually use open data to grow partnerships or relationships with your customers or other businesses, which is another key thing that I would say to consider for organisations before they publish data, is why? What’s the end goal? So for Duedil, is it a case of just bringing in and importing open data from other sources?
Skip to 1 minute and 44 seconds Or are you actively involved in publishing open data yourself through APIs and other sources? Yes. We’re active participants in the whole open data community and open data ecosystem. And so we’re big consumers of open data, but we also are big advocates of some of the initiatives that are going on within the open data community, such as LEIs, which are unique identifiers for companies that are sort of universally understandable. We also open up our product in the form of our APIs.
Skip to 2 minutes and 19 seconds So we are just in the process now of releasing an update to our API, which makes the information that we provide in Duedil much more accessible to third party developers so that they can take what we’ve developed and integrate it into other sources, and come up with new applications that might benefit people. In terms of publishing open data then, coming back to that, are there any kind of risks in publishing specific data sets? And how do you kind of consider sort of mitigating those? Yeah, so one of the risks that you face, even when publishing open data, is that by making something more visible, the subjects of that data may not like that, even though it is in fact, public.
Skip to 3 minutes and 4 seconds And we receive some queries, not as many as you’d think, but a few queries every month from people who weren’t aware that their information is available on something such as Companies House. But through a Google search or something like that, then they’ll find their information on a Duedil profile. And so there’s always the risk that somebody finds out about that, they don’t like it for whatever reason, and you hear about it. I think in those cases, we’ve tried to take a very pragmatic view. And our policy is that if an authority such as Companies House sees fit to redact information on a particular data subject for whatever reason, we will always follow suit with the authority.
Skip to 3 minutes and 55 seconds If it’s a case where the person just doesn’t like the fact that that information is out there, then we try and educate the person that this is public information, talk to them about why that information exists and is accessible to the public. And we’ll work with them to try and find a resolution, if we can, around making that data visible to users. So Rikesh, from a TFL point of view, what was the kind of key sort of tipping point or moment when the organisation decided to start publishing open data? Yeah, there are different types of data that we make available.
Skip to 4 minutes and 35 seconds And I think it was probably in 2007 where we recognised that people were scraping our data from our websites and developing apps. But quite often, there were errors with that information. So we started to make our data available, essentially the start of open data. And then over time, we developed a new website. And that website’s foundations were plugging in APIs and data sets. And what we did in turn is also make those APIs available. So Rikesh, once you started to get momentum and you started to publish more data in TFL, that must have been quite a complex process, because such a large organisation, you must have crazy amounts of different data sets.
Skip to 5 minutes and 17 seconds How do you decide on exactly which data sets to publish and in what order? I think you’re right. There’s thousands of different data sets in the organisation. And we started with timetables and schedules. It’s a statutory requirement. So rather than just making them available in PDFs, we started to make them available in much more usable formats, particularly ones that developers could use. That was our starting point. It was what I call static information. We’ve over time now started to progress much more into dynamic data. And that is, where is your next bus? When will the next bus arrive at your bus stop? What’s the level of congestion on London’s roads? Where are the incidents currently taking place?
Skip to 5 minutes and 59 seconds So you can preplan your journey. So starting between 2007 and up to 2010, the focus was very much static. In more recent years, it’s been dynamic. And it has been a structured process. So I guess Rikesh, sort of faced with almost an audit of all of the data that you have, how do you decide sort of which data sets to begin with? Do you start off with the bus network or the underground? How do you decide on the sequencing of what data to release and in what order? Initially we started very much with static data. And that was timetables and schedules.
Skip to 6 minutes and 38 seconds And more recently we’ve started to look at dynamic data, which tells you what’s the level of congestion or disruption of the network at a given time. I think the way we started is you’ve got to assess what value that data brings to the organisation by making it available. And it can be an expensive game. You could invest a lot in sensors and a lot in the technology. So you’ve got to really think about by making that data available, what value can you bring back to the organisation? It’s a cost-benefit analysis for every single data set?
Skip to 7 minutes and 14 seconds So Rikesh, once you’ve decided or identified open data that you want to publish, do you have to have sort of consultations or meetings to discuss the legal implications or risks associated with that publication? Yes. Within Transport for London, we have a very strong governance process around open data. We have a transparency board, a transparency group that gets together regularly, that involves representation from our legal team. So before making any data available, there’s a thorough due diligence process.
Case studies: Publishing Open Data
We’ve heard from our experts on how they began to use inbound Open Data, but publishing data can also bring benefits to their organisations, the wider economy and society. Let’s hear some stories from our experts about how they came to publish Open Data, the issues they had to consider, what sources they publish and how it helps developers…
© Royal Holloway, University of London ¦ Attribution CC BY