I'm David Wallom. I'm an Associate Professor here in the University of Oxford's e-Research Centre, and lead two different research groups: Energy and Environmental Informatics, and Advanced e-Infrastructure and Cloud Computing. Open data itself is actually more a philosophy. It's this idea that data should be freely, easily accessible without any restrictions on its reuse, on its sharing, or basically anything that you want to do to it. Open data should be findable, accessible, interoperable, and reusable. And in some ways, the need to adhere to those principles is one of the things that many people say will actually mean that open data will have true value. One person's value is always the foundation of another person's knowledge.
So what you can do in many cases is get the primary value out of a piece of information, so energy consumption, metering, billing. But what can you then do with that data as the next step? Can you actually think about how you connect together energy consumption with behaviour? And from that point of view, how do different people actually class energy consumption? What do they count as important in their own behaviour within either the home, the business, or elsewhere? Whenever you have a piece of open data, there is undoubtedly a transformation that you're going to need to be able to do on it.
As part of that, ensuring that you have a good understanding of the tools, the processes, that you're going to use on that data is incredibly important. And you should be utilising software management best practices to ensure that those tools are captured, be that putting them in a repository, be that making sure that they're actually coded in a manner that means that somebody else is going to be able to use them. Discovery is still a great problem. There are a number of well-known places: the Open Data Institute, the CEDA Archive for Natural Environment Research Council funded information. But actually, when starting to work on a broader scale, it becomes more difficult.
There is no standard around where data repositories are on an international scale. So from that point of view, an earlier project of ours was looking actually at the impact of deforestation and whether it could be traced to end consumer products, and there we actually had to dig in to try and find open data on an international scale. So can you actually say, when you buy a piece of pork from a supermarket in France, how much deforestation that pork has caused? So you track back. You have your pig. The pig is fed soy husk. That soy husk has come from somewhere. That soy husk has mostly come, in shares, from the United States, from Argentina, and Brazil.
So you then track back through all of the international trade data, international statistics, ship movement information to say that, well, that particular animal feed producer buys all their animal feed from this particular port. This particular port, we know, receives soy in tonnage of 'x' per year. And you track all the way back through to actually a satellite image of an area of the Amazonian jungle where you can see that deforestation is happening. All of those bits of information were open to some extent. Some of them needed quite a lot of data massaging to become a meaningful source. But there's no going to a set of repositories and going over what's there.
So for me that was a really exciting use of information from both open sources and, in a couple of cases where we had to effectively bridge gaps, from closed sources, so getting customs information. So having customs information free and easily available internationally would be incredibly useful. A great problem around the idea of data more generally, and in particular open data in research, is this: if you spend or invest an awful lot of time in creating an experiment, do you really want to actually open it out and let other people who haven't invested in it then have easy access?
There we've got now the idea that actually data can be published, can be attributed, in exactly the same way that a scientific publication is. And that comes down in many ways to this idea of licensing data. I mentioned earlier what makes open data. And from that point of view, the most important part is not, "Well, I shoved it up on a website and then someone could go and look at it." It's, "I put it somewhere where someone could find it. It had the metadata with it, but it also had licensing information and attribution requirements." But the most important thing is actually when people go to start using that information, do they have the provenance?
Are they able to say, actually, that this data is exactly the data that was delivered, that it hasn't been altered in any way?
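One common way to check that a dataset "hasn't been altered in any way" is to compare a cryptographic checksum of the copy you hold against a checksum published alongside the data. The sketch below is only an illustration of that idea, not a method described in the talk: the sample CSV bytes, the function names `sha256_of_bytes` and `is_unaltered`, and the assumption that the publisher supplies a SHA-256 checksum are all hypothetical.

```python
import hashlib


def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 checksum of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, published_checksum: str) -> bool:
    """Compare the data's checksum with the one the publisher provided."""
    return sha256_of_bytes(data) == published_checksum.strip().lower()


# Hypothetical dataset contents, used only for illustration.
original = b"meter_id,kwh\nA1,3.2\n"
# Assumption: the publisher lists this checksum next to the download.
published = sha256_of_bytes(original)

# A copy in which one reading has been changed.
tampered = b"meter_id,kwh\nA1,9.9\n"

print(is_unaltered(original, published))
print(is_unaltered(tampered, published))
```

A matching checksum only shows that the copy is identical to what the checksum was computed from. It says nothing about who produced the data or how, so it complements the metadata, licensing, and attribution information mentioned above and does not replace it.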
A research scientist's perspective
Watch David Wallom, Associate Professor at Oxford e-Research Centre, explain the philosophy behind open data and how it can be applied to research, using the example of a project to identify if deforestation can be traced to end-user products.
Are you aware of any other projects which use open data? Share these in the discussion below.
© University of Reading and Institute for Environmental Analytics