Skip to 0 minutes and 0 seconds So, what does a regression output look like? We saw in a chart what a regression line looks like. We saw that equation up there and then we were able to make some sense as to how promotions affect sales. Now, what does a regression output look like? Because in practice, you’re going to take this data about promotions and sales and plug it into software that runs regression and spits out some output. Now how do you understand that output? So, fair warning, we’re going to eat some squash, we’re going to look at some numbers, a lot of numbers. But the key here is to keep the intuition always with us.
Skip to 0 minutes and 53 seconds Regression is a lot more about intuition and a lot less about numbers. You’ll have to work with numbers, but it really is about intuition, OK. So let’s look at this regression output that we got here. So the key thing is, the software spits out a lot of numbers at you. What I have done is make it easy enough for you to really home in on the things that are very important in a practical sense. So let’s look at only the numbers that are shaded in green. So that would be these, these and these. So let’s look at the ones that we are familiar with. So in that graph, we looked at 9.9, which was our intercept. That’s over here, right.
Skip to 1 minute and 44 seconds So the intercept is the value of Y when X is zero. In our example, the intercept is sales when the number of promotions is zero. Now what happens if you increase promotions by one unit? Sales increase by 1.42; that is your coefficient right here, right. Now another thing that you need to look at is what is called R-squared. Now what is R-squared? So, let me go to the next slide and we’ll see what R-squared is and come back here. So I just drew some stuff up here for you to have a sense of what R-squared is.
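Before moving on to R-squared, the intercept and coefficient read off the output can be put back into the equation of the line. The following is a minimal sketch, assuming the fitted line is sales = 9.9 + 1.42 × promotions as quoted in the lecture; the function name `predicted_sales` is made up for illustration, and the units of sales are whatever the original data used.

```python
# Values quoted in the lecture's regression output.
INTERCEPT = 9.9      # predicted sales when the number of promotions is zero
COEFFICIENT = 1.42   # change in predicted sales per additional promotion


def predicted_sales(promotions: float) -> float:
    """Return the sales the fitted line predicts for a number of promotions."""
    return INTERCEPT + COEFFICIENT * promotions


# With zero promotions the prediction is just the intercept; each extra
# promotion adds the coefficient to the prediction.
for n in (0, 1, 2):
    print(n, round(predicted_sales(n), 2))
```

Reading the output this way makes the two numbers concrete: the intercept is where the line starts, and the coefficient is how much the prediction moves per promotion.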
Skip to 2 minutes and 31 seconds So you have promotions on the X-axis, and sales on the Y-axis here, and this chart up on the right here is similar to what we saw earlier. Now, you have promotions, and all these dots are different data points. And then you have a regression line that almost seems like a 45-degree line here. So, what this means is this regression line is able to cover a lot of the data points, it’s able to capture a lot of the data points and explain a lot of the data points. So the R-squared is high. How do we know it is high? Because R-squared is a measure that ranges from 0 to 100 percent.
Skip to 3 minutes and 17 seconds So think of it as the percent of the variation in this data, the variation in the red dots, that the regression line is able to explain. So what does an R-squared of zero look like? What does a low R-squared look like? That’s what we have up here, right. So this is a low R-squared scenario. What does that look like? You have dots all over the place. The regression line is flat. It’s flat like this, so R-squared is close to zero percent. So what this says is, for example, if I were to plot rain on the X-axis and sales on the Y-axis, it’s hard to imagine that rain is going to influence toothpaste purchase, is it?
Skip to 4 minutes and 8 seconds So you’re going to look at data like this, but it makes no real pattern whatsoever. And you’re going to get a regression line that’s just flat, which just says, “Well, regression is not able to explain anything,” which tells you that R-squared is zero. Which also serves a very good practical purpose here, right, because if R-squared is high, you know promotion affects sales. If R-squared is low, then it means this X variable, such as rain, is not affecting sales. So what we need to know is how much this R-squared is, right. So in regression output, it is good to see what the R-squared is.
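The high and low R-squared scenarios described above can be reproduced with a few lines of code. This is a rough sketch, not the lecture's own calculation: the two small data sets are invented purely for illustration (one where sales rise steadily with promotions, one where sales bounce around regardless of rain), and the helper names `fit_line` and `r_squared` are hypothetical.

```python
def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y):
    """Share of the variation in y that the fitted line explains (0 to 1)."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    unexplained = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - unexplained / total


# Invented illustration data, NOT the lecture's data set.
promotions = [1, 2, 3, 4, 5, 6, 7, 8]
sales_vs_promotions = [11.0, 13.1, 13.9, 16.0, 16.8, 18.9, 19.6, 21.5]

rain = [1, 2, 3, 4, 5, 6, 7, 8]
sales_vs_rain = [15, 12, 18, 13, 17, 12, 16, 15]

print("promotions:", round(100 * r_squared(promotions, sales_vs_promotions), 1), "percent")
print("rain:", round(100 * r_squared(rain, sales_vs_rain), 1), "percent")
```

In the first data set the dots hug the line, so nearly all of the variation is explained; in the second the fitted line is almost flat and explains close to nothing.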
Skip to 4 minutes and 50 seconds And in marketing, this is actually a good R-squared. 60 percent is actually a great R-squared, because marketing has a lot of factors that influence our consumers to go and buy products in the store, so you’re not really going to see a high R-squared. Somewhere like maybe operations or total quality management or engineering, you will see a high R-squared. But in marketing, typically, you’re going to see a low R-squared because you’re trying to explain human behavior. Humans are pretty complex. And so it’s not easy to explain what they do, for the most part, right. And so R-squareds are typically pretty low. Now the thing that you need to understand next is this concept of what the P-value is.
Skip to 5 minutes and 35 seconds So for this, you need to understand the difference between sample and population. So the regression that we did is based on a sample of data. It’s based on, say, data from that store we visited, consumers in only that store for, like, a month. And we said let’s use that sample to analyze the relationship between promotion and sales. What if we went to another Kroger down the road? What if we went to another store up in Washington DC, or went to San Francisco in California, and did this analysis there, and instead of collecting the data in June, we collected it in December? Would we still see the same results? How do we know that?
Skip to 6 minutes and 28 seconds Because in marketing, you are looking at a sample to make inferences about the whole nation and running campaigns for the whole nation. It’s going to be very expensive to collect data about the whole nation, about everyone who shops for toothpaste, and run the analysis. Even if you do that, what is the guarantee that the relationship you observe today is going to hold tomorrow? You don’t know. So you need some kind of confidence that the relationship you observed in the data is going to hold in the future, or if you go and take another sample or another set of data. That confidence is what this P-value gives you.
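The sample-versus-population question can be explored with a small simulation. The sketch below is only an illustration under loud assumptions: it invents a population in which the true relationship is sales = 9.9 + 1.42 × promotions plus random noise, then pretends to visit 200 different stores, each supplying 30 observations. The noise level, sample sizes, and the names `simulate_store` and `fit_slope` are all made up and do not come from the lecture.

```python
import random


def fit_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def simulate_store(rng, n_obs=30, true_intercept=9.9, true_slope=1.42, noise_sd=3.0):
    """Draw one hypothetical store's sample from the assumed population."""
    promotions = [rng.randint(0, 10) for _ in range(n_obs)]
    sales = [true_intercept + true_slope * p + rng.gauss(0, noise_sd) for p in promotions]
    return promotions, sales


rng = random.Random(42)  # fixed seed so the run is repeatable
slopes = [fit_slope(*simulate_store(rng)) for _ in range(200)]

print("smallest slope:", round(min(slopes), 2))
print("largest slope: ", round(max(slopes), 2))
print("average slope: ", round(sum(slopes) / len(slopes), 2))
```

Each store gives a slightly different coefficient, but when the underlying relationship is strong relative to the noise, the estimates cluster around the same value. That sample-to-sample stability is what the P-value is meant to speak to.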
Skip to 7 minutes and 13 seconds So think of P-value as confidence in the regression, OK. So what does this 0.00 number mean? What that means is it is very unlikely that the regression will change, right. So there’s a little bit of, you know, funkiness going on here, right. The P-value is not directly the confidence that the regression is stable. Rather, a low P-value, that is, something below, say, 10 percent, implies that the regression is going to hold. So you want a low P-value to tell you that you have high confidence in the regression, OK. So that’s the thing you need to be very careful about. Low P-value means high confidence in the regression. Now how low is good?
Skip to 8 minutes and 32 seconds Statisticians have figured this out and said that, in most cases, if the P-value is less than 10 percent, then you can say the intercept is correct and the coefficient on number of promotions is correct. So, if you went to the R-squared equals zero example and said, “I put the rain here as another variable,” the P-value will probably be greater than 10 percent because rain is not expected to affect sales. Here, we see that because the P-value is 0 or less than 0.001, you are able to say that, for number of promotions, you’re highly confident that if you went to another store in another month and collected the same data, you would find the same relationship again.
Skip to 9 minutes and 30 seconds This gives marketers confidence to take all this information and then say, “OK, if I run a campaign now, my P-value is low, the number of promotions seems to have a stable relationship with sales, I can make some decisions on this information now.”
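One way to get a feel for where a P-value comes from is a shuffle (permutation) test. Note that this is a swapped-in technique chosen because it needs no statistics library: regression software normally reports a P-value from a t-test on the coefficient, not from shuffling. The sketch asks how often a slope at least as steep as the observed one would show up if sales had nothing to do with the X variable. The data are invented for illustration, and the names `fit_slope` and `permutation_p_value` are hypothetical.

```python
import random


def fit_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def permutation_p_value(x, y, n_shuffles=2000, seed=0):
    """Approximate P-value for the slope by shuffling y against x."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = abs(fit_slope(x, y))
    shuffled = list(y)
    at_least_as_steep = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs(fit_slope(x, shuffled)) >= observed:
            at_least_as_steep += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (at_least_as_steep + 1) / (n_shuffles + 1)


# Invented illustration data, NOT the lecture's data set.
promotions = [1, 2, 3, 4, 5, 6, 7, 8]
sales_vs_promotions = [11.0, 13.1, 13.9, 16.0, 16.8, 18.9, 19.6, 21.5]
rain = [1, 2, 3, 4, 5, 6, 7, 8]
sales_vs_rain = [15, 12, 18, 13, 17, 12, 16, 15]

p_promotions = permutation_p_value(promotions, sales_vs_promotions)
p_rain = permutation_p_value(rain, sales_vs_rain)
print("promotions P-value below 10 percent:", p_promotions < 0.10)
print("rain P-value below 10 percent:", p_rain < 0.10)
```

With the promotions data, almost no shuffle produces a slope as steep as the real one, so the P-value is small. With the rain data, shuffled slopes are routinely as steep as the observed one, so the P-value is large, in line with the 10 percent rule of thumb described in the lecture.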
Interpreting Regression Outputs
Learn how to interpret regression outputs, from R-squared to P-values!
© Copyright Rector and Visitors of the University of Virginia