## Purdue University

To be successful in this class, you need to understand statistical analysis. Statistical analysis can be quite complex, but in this class you'll only need the basics: you only need to understand how variables or values differ across groups, or how they could be associated within groups. Let's talk about correlation, the statistical procedure that tells you whether two characteristics of a group of individuals are correlated, or associated. This can help you better understand why people do the kinds of things they do. I have here a hypothetical example of a group of 26 individuals of various ages, all observed on Twitter.

So basically, we looked at people who liked a specific Tweet with cat pictures in it. These individuals were of various ages, and they liked such Tweets repeatedly: once, twice, three times, and so on. What we observed here is that as individuals get older, they like cat pictures more. Now, how strong is the association between these two characteristics, between age and the propensity to like cat pictures? Well, it's the strongest possible association, because the relationship between the two is perfectly linear and uniform: 14, 1, 14.2, 2, and so on. The statistical procedure we can use to evaluate the strength of this association is called the Pearson r correlation.
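For reference, the standard definition of the Pearson r for $n$ paired observations $(x_i, y_i)$, here each person's age and number of likes, is shown below, where $\bar{x}$ and $\bar{y}$ are the means of the two variables. The numerator grows when above-average ages go with above-average likes; the denominator scales the result so that it always falls between -1 and 1.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$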

And it's easy to compute in a spreadsheet such as Google Sheets. All you have to do is type =CORREL, which stands for correlation, and then give the ranges of the two variables that you want to compare; for example, =CORREL(A2:A27, B2:B27) if the 26 ages were in column A and the like counts in column B (a hypothetical layout). If the association is perfect, meaning that as one value increases the other increases at a constant rate, you'll get a value of one. The weaker the association between the two variables, the closer the value gets to zero. If the association is inverse, meaning that one value decreases as the other increases, the value moves toward minus one. But let's go back to this example: we have one scenario in which, as people age, they like Tweets with cat pictures more and more.
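CORREL in Google Sheets (and in Excel) returns the Pearson correlation coefficient. The sketch below computes the same quantity by hand so you can see what the spreadsheet is doing. The function name `pearson_r` and the small age/like lists are hypothetical, made up for illustration rather than taken from the video's data. Note the second case: the likes grow twice as fast as age, yet r is still exactly one, because r measures how consistently linear the relationship is, not how steep it is.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    A minimal sketch of the quantity a spreadsheet CORREL formula reports.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (the numerator).
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable (used in the denominator).
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical ages and like counts (not the data from the video).
ages = [14, 15, 16, 17, 18]
likes_same_pace = [1, 2, 3, 4, 5]   # one more like per year of age
likes_faster = [3, 5, 7, 9, 11]     # two more likes per year of age
likes_inverse = [5, 4, 3, 2, 1]     # fewer likes as age increases

print(pearson_r(ages, likes_same_pace))  # 1.0
print(pearson_r(ages, likes_faster))     # 1.0
print(pearson_r(ages, likes_inverse))    # -1.0
```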

Now let's take another scenario, also hypothetical, and imagine that the same individuals liked Tweets with cat pictures at random rates. People who were 14 liked such Tweets 24 times, while those who were 21 liked them only 6 times, and in between there's a mix, a hodgepodge, of behaviors. As you can see in the second chart, we no longer have a nice linear association; instead, the points are scattered. The correlation coefficient captures this scattered relationship very nicely: it is now a measly 0.1, ten times lower than the value of one we had the first time.

Not only that, but it's actually very close to zero, which means there is little to no linear association between age and liking cat pictures. With this, I hope I've shown you that it's quite simple to use a powerful and efficient method of statistical analysis, and to become a very successful member of our class.

# Understanding Correlation

Correlation is one of the most basic and common statistical procedures. A correlation is a single number that describes the degree of relationship between two variables. Correlations are statistical measures that determine whether two characteristics are associated across the members of a specific group or class. They are meant to detect regular patterns in the data and to provide the background for asking larger, causal questions; a correlation by itself does not show that one characteristic causes the other. The logic of correlation helps you figure out in which contexts the analysis is applicable and to what effect.

For example, in marketing a specific product, it might be useful to know if there is a predictable relationship between sales and factors such as age, income, or geographic location.

For a walkthrough of this type of analysis, read Correlation.

• How will you use the correlation function in your social media analysis? Let us know what you will compare and what results you anticipate.