Skip to 0 minutes and 4 seconds
Hi, Suzy. Hi, Tobias. So how did you enjoy this week? It was great. I mean, I think this is a really important topic, so it's excellent to see the learners engage quite so much. Hi, Chanuki. Another wonderful week. Do you have questions for us? Yes, I do have some questions, so are you ready to get started? Yeah, let's go. Sounds wonderful. Great. So why should we use data from smartphones or apps to estimate crowd sizes when we have other sources such as ticket scans? That's a very good question. I can see that you might look at the football match example we were talking about and say, well, what's the point of that?

Skip to 0 minutes and 43 seconds
We already knew how many people were at the football match. But I think what we were really excited about with that study was the fact that we'd found an example where we did have this data on how many people were in a confined location, and so we can use that data to try and train models to understand the relationship between mobile phone activity and how many people are in a particular location. So yeah, it's true.

Skip to 1 minute and 12 seconds
In the few cases where we do have an excellent source of data on how many people are there, such as ticket scans, then the mobile phone data might not have quite the same value, but you can think of so many cases where we don't have that data, we don't otherwise know how many people are there. And that's where this mobile phone data really opens up some new opportunities. So why would tracking people through an app be better than just using cell phone data, for example? That is another good point. So we were talking about apps and we were also talking about cell phone data.

Skip to 1 minute and 48 seconds
Now, the big advantage of the cell phone data or mobile phone data is that obviously so many people have mobile phones these days, and the data we're talking about tends to be generated without people needing to do anything. It's just the result of their mobile phones interacting with the mobile phone network. Now, the one advantage you could get, however, if people have installed an app for the event that you're interested in, is the fact that then you can actually get GPS information from the app. So mobile phone cells can be quite big.

Skip to 2 minutes and 29 seconds
And so if you've got very precise GPS information about where people are, then that might actually give you slightly better granularity, so more detail, on exactly where people are. And the other interesting opportunity you have with apps is that, say, you've got an event like a festival, for example, and the app that you're talking about is offering people information on where they can get food, where they can get drinks, when they can go and see particular acts. Then you can also tell from your app usage data when people have been looking for that information.

Skip to 3 minutes and 4 seconds
And this is relevant to questions about mobility because it might help you better predict where people are going to go, as well as measuring where they are at the moment. So sometimes there are these really large events and there's an overload on a local cell tower. So does that impact on the estimates of crowd sizes? So you think you actually have a smaller crowd size than there is in reality? You have a very good set of questions today. That's really fantastic, and that's a really good one. So, Suzy just mentioned that basically it's an advantage that all of us, basically, now have a smartphone or at least something which connects in one way or another to the cellular network.

Skip to 3 minutes and 44 seconds
And sometimes this can become a problem, as you just mentioned. If there are too many people in too limited a space, then basically this is going over the capacity of this cell phone tower. And then, basically, the short answer is we have a problem. So what can we do about that? Obviously what you would see is the build-up of people actually arriving in this restricted space. So for example, if you look in detail at these maps which Federico Botta produced for the stadium in Milan, then you will see that basically there is this build-up: people arriving in front of the stadium. And basically, over time, more and more people are crowding around that space.

Skip to 4 minutes and 27 seconds
And so that's something you will definitely see. At some point, obviously, you would lose track in terms of precise numbers of people being connected just due to the problem that you have only a limited capacity. And from that point of view, on top of seeing this lead-up phase, it's important to look into complementary data streams and data sets. And so from that point of view it's very important to basically find complementary signals and other social media platforms.

Skip to 4 minutes and 57 seconds
But obviously, also these platforms are not reachable if you have no network, so then, actually, solutions like what Suzy mentioned, so apps coming into this, open up the possibility to basically record over time where these phones are, so where we are moving about. And this doesn't necessarily mean that we need to be connected to the network all the time. I mean, this can be shared with the network later on. And so, basically, the summary is, it's a problem, but we can find complementary signals in other types of online activity and other ways to measure it. So does there need to be a known event to estimate crowd size?

Skip to 5 minutes and 42 seconds
I mean, could it also work if people were gathering for an unknown reason? For example, a secretly planned protest? Yeah, basically that's the crucial question. So we were quite lucky because we identified this place in Milan, the stadium, where we know afterwards how many people went based on the number of tickets sold. And so this was for us the calibration example. So from that point of view, a calibration example would always be a very good idea because in other countries, in other scenarios involving, maybe, other subsets of society, the fraction of people carrying smartphones around might be different compared to what we have measured.

Skip to 6 minutes and 30 seconds
However, you would still see the actual number of mobile phone connections, for example, opened, maintained, and closed to the internet, using a certain cell tower. And so from that point of view, you know that at least a number of people are gathering there. You don't precisely know how many people in the real world this translates to, but you can see, actually, the crowding. So different people attend different types of events. So if you build a model for one type of event, how do you know it's going to work for another event? So, I mean, do you have to have a family of different models? For example, the sports events crowd model or the student protests crowd model?

Skip to 7 minutes and 11 seconds
That is another excellent point. So I think it goes without doubt that the more data you have to train a model, the better the model is going to be. And so certainly, if there's a specific kind of event that you're interested in, if there are a few examples where it is possible to get another measurement of how many people are there so that you can use that to calibrate your model, then that is likely to improve the quality of your estimates in the future. However, what we were really excited about when we looked at this data was exactly how strong the relationship was between the mobile phone data and the ticket data we had, for example, in the football stadiums.

Skip to 7 minutes and 58 seconds
We saw that if we trained only on nine matches, then we could make excellent estimates of the number of people at a tenth match. And so I think that does underline what potential there is in this data, even if we have only a small number of cases on which we can actually calibrate the model. So will it become increasingly difficult to find out people's location? So people are becoming a lot more savvy about all this information they're sharing, so privacy issues come to the forefront. That's a very good one. So there are two answers to that, Chanuki, I think, and basically it is in our hands. We can decide what we do, with whom we share data.

Skip to 8 minutes and 42 seconds
The question is, are we happy to pay with this data, basically, right? As soon as we sign up for a service online, if we use an email system and it doesn't cost us anything, then basically we are paying with data. If we are prepared to pay with our locations, then it might not be very complicated to get hold of this information because people might be happy to share it. However, obviously, there is a downside. I mean, there are all sorts of problems which might emerge.

Skip to 9 minutes and 12 seconds
I mean, we don't necessarily want to let the world know where we are and where we are going about, not only because of maybe revealing that we are not at home at the moment, for example. And the second answer to this is basically looking at some historical examples where we can anecdotally very interestingly see that this happened already. So if we look back at the early studies where people used geolocated Tweets and wrote papers about that, and these papers got media attention.

Skip to 9 minutes and 44 seconds
And based on this, or at least it coincided, I mean, obviously, we can't claim causation, but coinciding with this publication and this first outcry in the media, basically the number of geolocalised Tweets, which share or contain GPS information, this fraction went down over time. So people became increasingly aware of this. And this all together basically is the solution, right? I mean, first of all to create the awareness that everybody can make an informed decision whether to share it or not, and then basically maybe to use this information really actively, basically, as a kind of currency. So it's the last week next week. It is. We're really sad about that.

Skip to 10 minutes and 28 seconds
It's been so much fun running this course and hearing what the learners have to say. But we really hope that next week will be a great opportunity for the learners to reflect on what they've learned over the period of the course, and to exchange their thoughts and the ways that their opinions have changed with the other learners. Still one week to come, enjoy. And all the best. Great. Looking forward to it.

Week 8 round-up

In Week 8, we began to explore how big data might help reduce the impact of disasters, and how new data sources can help us understand how people move around.

Here’s a brief summary to help you prepare for the final week of the course.

You heard from Federico Botta about a paper we published with him last year on determining crowd sizes using data from mobile phones and Twitter. Mirco Musolesi also talked to us about using smartphone data to understand people’s mobility patterns. You also heard how delivery of aid in Haiti following an earthquake was supported by rapid crowdsourced updates to OpenStreetMap, and considered other examples of how new data sources might help avoid and mitigate disasters.
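The calibration idea discussed in the video (train on nine matches, then estimate attendance at a tenth) can be illustrated with a small leave-one-out sketch. This is not the method or the data from the paper: it assumes a simple straight-line relationship between mobile phone activity and attendance, and every number below is made up purely for illustration. The function names `fit_line` and `leave_one_out_errors` are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def leave_one_out_errors(activity, attendance):
    """Hold out each event in turn, calibrate on the rest, and return
    the relative error of the estimate for the held-out event."""
    errors = []
    for i in range(len(activity)):
        train_x = activity[:i] + activity[i + 1:]
        train_y = attendance[:i] + attendance[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        estimate = slope * activity[i] + intercept
        errors.append(abs(estimate - attendance[i]) / attendance[i])
    return errors


# Synthetic values for ten hypothetical matches (NOT the study's data):
# a mobile phone activity count and the ticket-based attendance for each.
activity = [1200, 1850, 2400, 3100, 3600, 4150, 4800, 5300, 6100, 6900]
attendance = [14000, 21500, 27000, 35500, 40000,
              47500, 54000, 60500, 69000, 78500]

errors = leave_one_out_errors(activity, attendance)
print(f"Mean relative error over held-out matches: {mean(errors):.1%}")
```

Each pass through the loop mirrors the "nine matches to predict the tenth" setup: the held-out match plays the role of an event where no ticket data would be available.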

By the end of last week, you’d also used R to analyse Google Trends data and calculate the Future Orientation Index for 45 different countries around the world. Keep up the great work!
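For reference, here is a minimal sketch of the index calculation, written in Python rather than the R used in the course exercise. It assumes the index for a country is the ratio of Google search volume for the coming year to search volume for the previous year; the function name and all search volumes below are hypothetical and chosen only for illustration.

```python
def future_orientation_index(next_year_volume, previous_year_volume):
    """Ratio of searches for the coming year to searches for the previous year.

    Assumption: both volumes are measured over the same period and on the
    same scale, for example relative search volumes for one country.
    """
    if previous_year_volume <= 0:
        raise ValueError("previous_year_volume must be positive")
    return next_year_volume / previous_year_volume


# Hypothetical (next year, previous year) search volumes per country.
volumes = {
    "Country A": (62.0, 50.0),
    "Country B": (45.0, 50.0),
    "Country C": (55.0, 50.0),
}

# Rank countries from most to least future-oriented under this definition.
ranking = sorted(
    volumes,
    key=lambda country: future_orientation_index(*volumes[country]),
    reverse=True,
)
print(ranking)
```

A value above 1 means that, under this assumed definition, people in that country searched more for the coming year than for the previous one.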

We hope you enjoy your final week!

This video is from the free online course:

Big Data: Measuring and Predicting Human Behaviour

The University of Warwick