[0:04] Hi, Suzy. Hi, Tobias. So how did you enjoy this week?

It was brilliant. I really love this topic, so it was fantastic to see the learners get involved too.

Hi, Chanuki. Another great week, with lots of activity. Really good to see.

So great. As usual, I have some questions for you, so should we get going?

Yep, let's go.

Let's go.

Great. So a couple of weeks ago, we ran a game where the learners were rating the scenicness of Rio. And this week, we've been talking about measuring happiness. So both of these variables appear to be incredibly subjective. I mean, people discussed that they had different opinions of what's scenic. And that's very much the same for being happy.

[0:44] So how can we cope with this variance between how people think about these different variables? So how can big data help us measure and understand these subjective variables?

That is a very good question. So of course, it's very true. We know that people are going to have different opinions about how scenic a photograph is, or what being happy at a particular moment in time means. But despite the fact that there's this difference between people, we can see that these measurements might also be important. So I think if you ask most people, does it matter to you how happy you are, many people would say yes. And of course, we know that we worry about other things, such as our health.

[1:31] And so the reason we want to try and measure these quantities is to see if we can identify at scale things which might affect these subjective measurements, like our happiness. Or indeed, work out whether we can identify a relationship between some sorts of subjective measurements of the environment we live in, like scenicness, something you know an awful lot about, and other things that matter to us, like our health. So the huge opportunity we have with these new data sources is to actually measure these things at scale. So we're trying to measure people's opinions on how scenic an area is at scale, or measure people's opinions on how happy they are at scale.

[2:18] Now, there is going to be an awful lot of noise in that data because of the differences that people have in what these ratings really mean. But what your research has shown, and what other studies have shown that we've described this week, is that there are patterns that emerge if you do have enough measurements. So you start seeing, for example, relationships between the environment that we live in and how happy we are, or, indeed, the scenicness of the environment that we live in and how healthy people report themselves to be. So I think these are, without a doubt, difficult, challenging aspects of our world to measure, but there are many, many reasons for which we should be trying.

[3:04] And these new approaches to crowdsourcing measurements, or to gaining new perspectives on these measurements through things that people write online, might give us other indications of things like happiness. And that's a huge opportunity for us to finally grapple with these difficult but important questions.

So another related question I have is, how can this self-reported data be reliable? I mean, are people likely to report what other people want to hear? And how do you deal with this in your analysis?

Yeah, that's a very good question, Chanuki. I mean, there are several points to make.

[3:42] Basically, if we look at the range of different topics we have covered so far, and also the range of data sources, then, I mean, possibly point number zero is that it's always better to have data than to have no data. I mean, that's possibly where everything starts, where we can actually start to work with something. Secondly, it would possibly be a good idea to compare this with other types of data, or other types of approaches and methods, which can produce similar data sets, right?

[4:16] I mean, if we look at traditional laboratory-based experiments, and how these experiments are conducted, and who is actually invited to join these experiments and answer the questions, then obviously there are also certain aspects which are influencing the results, the data points. I mean, the usual simplification is that mostly undergraduate students, of very particular degrees, are taking part in these studies, or at least predominantly, and this might influence to a certain extent what the outcome might be. And so what we have seen so far is that the advantage of these new forms of data, these new big data sources, is that they allow us to study the behaviour of humans, or what humans are interested in, in a natural setting.

[5:12] We are not always asking them to, basically, reply to a question, like we have seen in the Mappiness app, for example. We are basically recording what they are doing as they go about their normal activities on a daily basis. And so the question is, what would be their advantage in changing their self-reported data in a way which would influence our study, so to say, or the conclusions drawn from the outcomes of the analysis? And so we always need to ask the question, what's their incentive to, for example, Google which place to visit, or which restaurant to go to, and how might this be affected by a self-reporting bias?

[6:01] Possibly, maybe, we have seen in the stock market week that obviously there would be, for a group of people, a certain advantage in generating a lot of Google searches in order to manipulate signals which we can download or retrieve from Google Trends. But at the same time, if we think about incentives and why people are doing certain things, then the Mappiness example is a very good one. Because we are asking people how happy they feel, and certain other things, what activities they do, and so on and so forth, as we have seen.

[6:31] And basically their incentive is that they get a very detailed analysis, or overview, of the places and activities in which they actually feel better than in other states and circumstances. And so if it is really helpful to them, then they are actually happy to commit time to do these activities. And so possibly the risk that there's a certain reporting bias is reduced by the fact that they are willing to commit time out of their busy lives, and this basically just allows us to study humans in their natural settings. So obviously, I mean, we can't actually make any claims about an individual.

[7:18] I mean, obviously, maybe there's one or two people, or maybe x people, who actually use this in order to just submit random results. But we are, in most of the cases, interested in collective phenomena, and in actually studying people across space and across time. And so these differences will actually be flagged up, basically, during the analysis. And so we would also be able to specifically analyse which biases in these particular data sources, or in the ways in which we have collected human behaviour, for example through an app, would actually be influencing our analysis.

So another issue that people were concerned with is that a lot of these studies are using either Facebook or Twitter or an app, like Mappiness.

[8:07] So isn't there an issue that we're coming up with these, like, claims, based on only a subset of the population?

Yeah, I mean, that's another great question, to some extent relating to your previous question. I mean, again, it's better to have data than to have no data. And obviously all of us know that these platforms are heavily biased towards certain demographics. And obviously certain social media platforms are more popular amongst, for example, younger people than amongst others. And obviously, also, over time there are changes in the popularity of these services. And all of this is not really helping us, so to say, to draw reliable conclusions over time.

[8:59] And so the main aspect we can draw on in order to turn this into an advantage is basically to use real world data, against which we can actually calibrate what we are seeing on one of these social media platforms, or across a number of these social media platforms. And an example of this is actually the Google Flu Trends case, where we have seen that over time, for example, behaviour is changing, and there is a certain type of recalibration needed, in terms of the adaptiveness we discussed last week.

[9:41] And so the availability of real world data allows us, then, to calibrate what we are seeing in the social media sphere, and this, then, gives us a certain reliability over time to basically look at the differences, and also to specifically address and analyse which biases we are seeing on these platforms. And that's something which makes this, then, really interesting and fascinating. We are no longer limited by the biases as soon as we compare them with what is going on in the real world right now. And we have seen a number of cases and problem areas where this is really important.

[10:23] For example, when real world measurements are delayed, social media data, which is much quicker and more easily accessible than other data sources, might actually be able to fill in the gap.

OK. So people were really concerned about the ethics regarding some of this research, particularly the Facebook paper. So do you think Facebook should be allowed to experiment like they did? I mean, what's your take? I mean, where do you draw the line between what's a good ethical use of big data, and one that's harmful?

I think, again, this is an incredibly important, but an incredibly difficult, question.

[11:04] So I think, in line with the discussions you've been having this week, something that has to be taken into account with the Facebook study is that we all know that online companies are frequently changing what we see on their websites with the intention of changing our behaviour. So for example, changing the layout of products that are shown to us with the hope that we'll spend more, or simply hoping that we'll spend more time on their websites, because that will give them options to show us more adverts, for example, and a number of other advantages. Now, so the question is, what's OK, and what's not?

[11:48] Now, we saw there was this huge reaction to the Facebook study that was done in the name of science, whereas we're not generally seeing the same sort of outrage at what we know companies are doing on an everyday basis. And I think this brings up some interesting points. So, in my experience at least, if you talk to people about what they'd be happy for their data to be used for, then people tend to be much more sympathetic towards uses of data which will support society, for example. So there'll be some result of social good. Or uses of data that will advance our understanding of the world around us, so scientific purposes. If you ask them, OK, what's OK, what's not?

[12:41] If you put in businesses making money from their data, then generally people will react less positively to that. Now, however, if we look at what actually happens when people are using websites, obviously the flip side to businesses making money from data is that businesses are offering services that are incredibly useful to us. So for example, Google making search results available to us, or Facebook making it easier to interact with our friends. And so it appears, from a purely anecdotal observation of how people are behaving at the moment, that that very concrete benefit, very, very quick and concrete benefit that you get from working with these websites, is very important to people.

[13:27] Now, we know you could just decide not to use Google, not to use Facebook, but the cost that you'd pay is very obvious. You wouldn't have access to the search results from Google. You wouldn't be able to liaise with your friends. So it seems that people really value those immediate benefits. When people see benefits which are further away, so, perhaps, for society or for research, their attention is drawn more to the fact that their data is being used in a way that they, perhaps, didn't originally expect. And generally we see a bad reaction to that.

[14:05] So I suppose that is an attempt at summarising, in a very anecdotal fashion, my observations from talking to people around these subjects. And I suppose the reason I'm bringing all of this up is that I think my answer to this is that we as a society need to make some decisions about what we think is an ethically acceptable use of data, and what is not.

[14:31] But I can see that establishing what society is really happy with, and what society is not happy with, is going to be a tricky process because certainly, in my experience to date, there seems to be a discrepancy between what people say they'd be happy with, and then how people actually proceed in their usage of these services. So that on its own already is, and certainly in the future is going to continue to be, a very important topic of research, because we do need some answers to these questions, so that we can make decisions about how to proceed in this new world of large data sources.

Great. So thank you for all your answers. So it's been another good week.

[15:13] So what's coming up next week?

OK, so next week, we're going to be looking at how we can use new forms of data to help us understand, and perhaps react better to, disasters such as earthquakes. We'll be talking about one example of an earthquake. But we'll also be examining how the data we leave behind can give us a better idea of where people are and how they move around. And that will also be linking into the use case of disasters. So we'll be looking at how that sort of information on where people are and how they move around might help us in emergency situations.

Great. So looking forward to it, and I'll see you next week, then. Thank you.

Brilliant.

[16:06] Thanks, Chanuki. See you next week.

Thank you. See you next week.

Week 7 round-up

In Week 7, we began to explore how big data might help us measure and improve our happiness.

Here’s a brief summary to help you prepare for Week 8.

You learned how George MacKerron created a smartphone app, Mappiness, to find out where and when people are happy all around the UK. You heard about a Facebook study which investigated whether emotions are contagious, by manipulating what Facebook users saw on their news feed. Thore Graepel also demonstrated that what we “like” on Facebook might give away all sorts of information about our personality, from how intelligent we are, to how satisfied we are with our lives.

These studies again raised important issues about privacy and the ethics of big data. It was great to read all your comments on where you thought the line should be drawn between what is acceptable for government and businesses to do and what you thought may be a step too far.

Finally, you started analysing Google Trends data in R and RStudio. You calculated the Future Orientation Index for the UK in 2012 yourselves. Well done!
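As a reminder of what that calculation involves, here is a minimal sketch. The course exercise used R with real Google Trends data; this version is in Python, and it assumes the index is the ratio of search volume for the coming year ("2013") to search volume for the previous year ("2011"), both measured during 2012. The function name and the weekly figures are hypothetical and purely illustrative, and the sketch assumes both series come from the same Google Trends comparison so that they share a scale.

```python
def future_orientation_index(next_year_volumes, previous_year_volumes):
    """Ratio of total search volume for next year to that for the previous year.

    A value above 1 means more searches about the future than about the past.
    """
    if len(next_year_volumes) != len(previous_year_volumes):
        raise ValueError("both series must cover the same weeks")
    previous_total = sum(previous_year_volumes)
    if previous_total == 0:
        raise ValueError("no searches recorded for the previous year")
    return sum(next_year_volumes) / previous_total


# Hypothetical weekly relative search volumes during 2012 (not real data).
searches_for_2013 = [20, 22, 25, 30]
searches_for_2011 = [40, 38, 35, 30]

foi = future_orientation_index(searches_for_2013, searches_for_2011)
print(f"Future Orientation Index: {foi:.3f}")
```

With real data you would pass in a full year of weekly values for each search term rather than four made-up numbers.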

This week, we move on to understanding how big data can help us measure where people are. Have a great week!

This video is from the free online course:

Big Data: Measuring and Predicting Human Behaviour

The University of Warwick
