Skip to 0 minutes and 3 seconds Hi, Suzy. Hi, Tobias. So what did you think about this week? I thought it was great to have a chance to actually talk about the role of big data in disasters, because quite a lot has been said about the possibilities of using big data to understand rare events and possibly to mitigate the consequences of crises by better predicting how humans are going to react to them. So it's been really good to have a chance to address this on the course. And not only disasters. We also had a lot of questions, very good questions, about privacy. I mean, obviously, this is one of the key topics in big data. OK. Great. As usual, I've got some questions for you.

Skip to 0 minutes and 42 seconds So should we get going? Yeah. That sounds great. Let's go. OK. So a lot of interesting questions came up with Federico's results. So the learners are wondering, basically, would these results hold for other types of crowds? For example, what about crowds in a theatre, or what about a football match at the end of the season? I understand that the study was done in the middle of the season. And what about other countries? Oh, a lot of questions. Not only one question. But very good questions, of course. So what we have done, together with Federico, in Milano was a very well calibrated example.

Skip to 1 minute and 22 seconds So we looked at areas of the city of Milano where we had precise information on how many people were there. So, for example, recorded by how many tickets had been sold, how many people actually turned up to these matches, or, for example, how many people were visiting the airport for whatever reason. So we quantified it by looking at how many flights were going in and out in a certain time window. So our hope, of course, is that this provides evidence which can be applied in other circumstances, possibly in other countries. But we have to be a little bit careful with this, because these questions came up for good reasons.

Skip to 2 minutes and 5 seconds Human behaviour, as we have seen throughout the course, is something which is not really expected to be constant over time, or constant against a lot of different possible changes and biases we see in the world. And so if we go from one country to the next, maybe we find a slightly different situation. We might also find a slightly different situation if we wanted to look at a scenario or a type of event for which we haven't trained our algorithms before.

Skip to 2 minutes and 36 seconds So we expect that these results work best if we have examples where we can calibrate the data source, so basically a series of events like in our study, where we have a football stadium and we can look at early matches and try to, for example, predict the attendance figures for the last match. So you pointed out that this was basically for a very specific time window; it was roughly an eight-week period towards the end of the year. So it's not the end of the football season, and that's also a good point. However, we have seen in other examples in this course that we can find adaptive algorithms which change or try to adapt if there are only tiny variations.

Skip to 3 minutes and 24 seconds So for example, if we go from one match to the next match, and maybe you can imagine the degree of interaction going up slowly towards the end of the season, then this is something an algorithm would be able to capture, at least to some extent, unless there's a really sudden and dramatic change in the behaviour of how people interact. So that's something we can probably work with very well. So there are other problems, or other possibilities, around. Basically, we've mentioned one of these. For example, places like theatres, where you may be forced to switch off your phone or at least not encouraged to use your phone in other places.

Skip to 4 minutes and 5 seconds So maybe something which is worth pointing out is that we didn't look only at active interactions, so to say. So we used data on how often people tweet, for example, but we also used information on internet package transfer, so basically how often or how frequently your phone is talking to the internet. And this is actually happening without you needing to interact with your phone. For example, if you have your phone in your pocket, in the background it is checking emails.

Skip to 4 minutes and 41 seconds And so we still see an interaction with the mobile phone network, and this provides us with a signal, and in our case in this Milano example, in the Stadium San Siro, we find that this is a very strong signal, so we have an R-squared value of 0.95. So a very, very strong relationship between the internet activity data and the number of people attending the stadium. So based on this, we think that this is a very good basis in order to try and look at other cases.

Skip to 5 minutes and 15 seconds But obviously, the best case would be that we actually also have some additional examples of real world attendance figures in other places, maybe in other countries, in order to further refine our algorithm and to make it more robust against changes which might happen. OK. So a related question is why use data from smart phones to estimate crowd size when you can use all the CCTV footage out there? That's another good question. Yes, I mean CCTV data is also a very fascinating data source. We mainly get visual content, videos, which we might need to analyse, and maybe that's already part of the answer, part maybe of a twofold answer.

Skip to 6 minutes and 1 second So first, actually, it would be a little bit more complex, given the simplicity in terms of data we get from mobile phone usage and basically already geocoded activity. So this means in different areas, for example, of the city, we would easily get a signal. For CCTV cameras, obviously these cameras are located in different parts of the city, maybe different roads, entrances, exits of subway stations and so on and so forth, so you would be able to regionally assign cameras to places, of course, because they are installed there.

Skip to 6 minutes and 38 seconds But in order to get a more quantitative idea about the flow of pedestrians, obviously you would need to apply some algorithms, some image recognition algorithms, in order to count heads, to count people on the footage which you get streamed from all these different cameras. So there's a certain additional level of complexity, but it's definitely possible. The second answer to this might be - and I may also be coming back to your first question - that we have regional or geographic differences, because in the UK, you are right. There's a lot of cameras available recording what is going on, for example, in London.

Skip to 7 minutes and 23 seconds But there are other places in the world where the density of CCTV cameras is much lower, if they are even there. So it's also a question of universality, in order to maybe use the methods in different countries around the globe to get an idea about crowd sizes. OK. So as we become more savvy that we are getting tracked from our mobile phones, will it be increasingly difficult to find out people's location? I mean, what happens when people realise, well, actually, I've got valuable information and I don't know if I want to just share it for free. I think that's a really good question.

Skip to 8 minutes and 1 second I think my hope, and perhaps our general hope in our research lab here, is that in the future, people might be able to make more informed decisions. So we see this often, that when we publish results of analyses that we've done of aggregate data, where you can't see individuals' behaviour, that this serves to make people realise that they're actually leaving this data behind as an individual. And so I think it's not very good to have people giving up this data without realising that they're doing that, but we do see many cases where people are willing to give up data in full realisation of the fact that this is what they're doing.

Skip to 8 minutes and 47 seconds And so I think a good example of this is perhaps George MacKerron's mappiness app that we saw last week. If you install that, it makes it very clear to you that you're going to be giving your happiness ratings that you type in to somebody else, and yet we see that people do still sign up to that app because they're excited about the fact that they can get data on their own happiness as a result. And I suppose another, more extreme example, is if you just consider Google, the search engine. So I think sometimes when people realise that Google is keeping track of everything that you're searching for, this can be quite alarming.

Skip to 9 minutes and 22 seconds But most people, if you say to them, well, there's one way to avoid that, you could just not use Google, people often don't want to not use any search engine at all because the internet is opening up all of these possibilities for us to get better access to information. And so again, the reward that people are getting from that, often they feel is enough to compensate for the fact that Google knows what they're looking for.

Skip to 9 minutes and 47 seconds But if you're really aware of the fact that that data is being tracked, then perhaps you're going to think a bit more carefully about what exactly you look for and what you type into Google, rather than essentially sharing all your inner thoughts and concerns without realising that a computer somewhere is keeping careful track of it. Thanks. Another important question that came up was, will it ever be possible for decision makers to rely on these big data dashboards without needing a statistician to ensure that these figures are interpreted correctly? I think that's a really important question and it's really good to see that the learners have brought this up.

Skip to 10 minutes and 25 seconds I think there's an issue you have working in big data in general. So often we talk, as we have done in this course, about how we can use big data to possibly better predict how people are going to behave in the future. And sometimes when you say that, people think it means that we're going to be able to say, tomorrow everyone's going to do exactly this. And, as I know the learners will have realised going through this course, that's not the case. We're not talking about making exact predictions. What we're trying to do is improve our understanding of the probabilities that certain things might happen in the future.

Skip to 11 minutes and 7 seconds I think communicating these probabilities, and often communicating the risk of a bad thing happening, given that we're talking about disasters, is a really important topic and it's something that, as researchers, we need to make more progress on. But we know it is possible to convey probabilities in some circumstances. If you think of weather forecasts, for example, we know that they're not always right. We know that the weather forecast is telling us that this is likely to be what's going to happen, but it's not guaranteed. And so I suppose that is a case where we've seen that a science has developed predictions.

Skip to 11 minutes and 45 seconds They're pretty good predictions, but when they're delivered to the public, the public does have a good understanding of the fact that those aren't predictions which are guaranteed. They're a representation of the fact that this is probably what's going to happen, but not exactly what's going to happen. For people who are interested in this - how you communicate risk and also how well do we deal with notions of risk and notions of probability - there's an excellent TEDx Talk by Gerd Gigerenzer that you can look up on the TEDx site, where he talks about problems that we have here and maybe steps that we need to take to better educate people about basic ideas in probability so that not only policymakers, but the public, can deal with these concepts in a better fashion.

Skip to 12 minutes and 35 seconds Great. Thank you. So it's the last week next week. Yes. Which is going to make us very sad. We've been really enjoying interacting with the learners and having people to talk to about something that we're really excited about. And so next week we're going to get back to some topics that we looked at at the beginning of the course, to try and give the learners a chance to see how their thinking has changed as a result of the material that they've been working through here. Absolutely. And by the end of next week you will also be able to calculate the future orientation index which we introduced at the very beginning of the course. Great. Looking forward to it.

Skip to 13 minutes and 16 seconds See you next week. See you. Bye. See you then. Bye-bye. Bye.

Week 8 round-up

In Week 8, we began to explore how big data might help reduce the impact of disasters, and how new data sources can help us understand how people move around. Here’s a brief summary to help you prepare for the final week of the course.

You heard from Federico Botta about a paper we just published with him on determining crowd sizes using data from mobile phones and Twitter. Mirco Musolesi also talked to us about using smartphone data to understand people’s mobility patterns. You also heard how delivery of aid in Haiti following an earthquake was supported by rapid crowdsourced updates to OpenStreetMap, and considered other examples of how new data sources might help avoid and mitigate disasters.
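The calibration idea discussed in the video (fit the relationship between internet activity and known attendance on earlier matches, check how strong it is with R-squared, then estimate attendance for a later match) can be sketched roughly as follows. This is only an illustrative sketch, not the method or code from the paper: the numbers are invented, a simple straight-line fit is assumed, and the function names `fit_line` and `r_squared` are hypothetical.

```python
# Invented values for illustration only; these are NOT data from the study.
# Each pair is (internet activity in the stadium area, recorded attendance).
internet_activity = [120.0, 150.0, 90.0, 200.0, 170.0, 110.0]
attendance = [40000.0, 52000.0, 31000.0, 69000.0, 58000.0, 37000.0]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Calibrate on the earlier matches, then estimate attendance at the last one.
train_x, train_y = internet_activity[:-1], attendance[:-1]
slope, intercept = fit_line(train_x, train_y)
r2 = r_squared(train_x, train_y, slope, intercept)
predicted_last = slope * internet_activity[-1] + intercept

print(f"R-squared on calibration matches: {r2:.3f}")
print(f"Estimated attendance at last match: {predicted_last:.0f}")
print(f"Recorded attendance at last match: {attendance[-1]:.0f}")
```

An R-squared close to 1 on the calibration matches suggests that activity tracks attendance closely in that setting; as the video stresses, it does not guarantee that the same calibration holds for other venues, event types or countries.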

By the end of last week, you’d also used R to analyse Google Trends data and calculate the Future Orientation Index for 45 different countries around the world. Keep up the great work!
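As a reminder of what that calculation involves, here is a minimal sketch. It assumes the published definition of the Future Orientation Index as we understand it: for a given year, the ratio of Google search volume for the following year (for example "2013" during 2012) to the search volume for the previous year ("2011"). That definition is not spelled out on this page, the course exercise itself used R rather than Python, and the country names and volumes below are invented placeholders.

```python
def future_orientation_index(volume_next_year, volume_previous_year):
    """Ratio of searches for the coming year to searches for the past year."""
    if volume_previous_year <= 0:
        raise ValueError("search volume for the previous year must be positive")
    return volume_next_year / volume_previous_year


# Invented relative search volumes during 2012, for illustration only:
# (volume of searches for "2013", volume of searches for "2011").
search_volumes = {
    "Country A": (60.0, 50.0),
    "Country B": (30.0, 40.0),
}

foi = {
    country: future_orientation_index(next_year, previous_year)
    for country, (next_year, previous_year) in search_volumes.items()
}

# Countries sorted from most to least future-oriented by this measure.
ranked = sorted(foi, key=foi.get, reverse=True)

for country in ranked:
    print(f"{country}: {foi[country]:.2f}")
```

A value above 1 means more searches for the coming year than for the past year; a value below 1 means the opposite.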

As usual, you continued to ask many insightful and thought-provoking questions. One interesting question was whether policy-makers and the general public will ever be able to understand the results of a big data analysis without help from a statistics expert. It’s very true that conveying ideas about risk and probability can be a particular stumbling block in this respect, and so we hope you enjoy watching this TEDx talk by Gerd Gigerenzer. He believes that we need to change our education systems to ensure that society as a whole gets a lot better at grappling with these concepts.

We hope you enjoy your final week!

This video is from the free online course:

Big Data: Measuring and Predicting Human Behaviour

The University of Warwick
