5.6

## Darden School of Business, University of Virginia

That was a lot. The first time you see a regression output, you're going to have sticker shock. It's going to take you by surprise how much there is. But as you keep seeing it, you're going to get used to it. So go back and watch the video on the regression output multiple times. You'll get better at it. But let me remind you, the squash is not going away this module. It's staying right here, and maybe I'll bring another squash and put it up here. But this is where you really work hard and you gain a lot, because regression is used a lot. And a good understanding of it is going to pay a lot of dividends.

Now, we looked at one variable, promotion. You may be thinking, wait a minute, this guy must be kidding. Only promotion affects sales? There must be a lot of things that affect sales. How can it be only promotion? How will I know all of those variables? How can I put them all together? The good news is, yes, you can put all the variables that you think will affect sales into a regression function. That's the beauty of a regression function. But when you do that, it is important to know what you put into the regression function, and also what you did not put into it, right?

That is the key here, and we're going to see what it means. The way to understand it is to look at some simulated shopper card data.

What I mean by simulated is that this is data that I made up, right? So I am making up this data on units purchased, but I'm making it up methodically, right? I'm using an equation like this: a + b1 times price paid + b2 times feature + b3 times display. Feature and display are variables that are each either 1, if the product was on feature or on display, or 0 otherwise. Price is, well, you all know what price is, and we have units purchased here. The units purchased could be anything: three, two, one, zero. So in this first row, for customer 1, the price was \$1.50, with no feature and no display.

We observed unit sales of 3, and you get that by plugging those values into this equation. I'm not going to give you the values I used for a, b1, b2, b3 yet. That's what a regression function will tell you. Since I made up this data, if I run a regression on it, I should be able to get a, b1, b2, b3 back, all right? So think of this as the god equation, okay? I played God. I said, here we go. Units purchased are from now on going to be a + b1 times price paid + b2 times feature + b3 times display. There's a little bit of randomness, too. I want to keep people on their toes.
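To make the idea of "playing God" concrete, here is a minimal sketch of how such a data set could be generated. The coefficient values, the price range, the noise level and the function name `simulate_shoppers` are all hypothetical choices for illustration, since the values actually used in the lecture have not been revealed at this point. The sketch also leaves units as continuous numbers instead of rounding them to whole units.

```python
import numpy as np

def simulate_shoppers(n=1000, seed=0):
    """Generate made-up shopper data from a known linear equation."""
    rng = np.random.default_rng(seed)

    # Marketing variables: price in dollars, feature/display as 0/1 flags.
    price = rng.uniform(1.0, 2.0, size=n)    # assumed price range
    feature = rng.integers(0, 2, size=n)     # 1 if on feature, else 0
    display = rng.integers(0, 2, size=n)     # 1 if on display, else 0

    # Hypothetical "god equation" coefficients (not the lecture's values).
    a, b1, b2, b3 = 6.0, -2.0, 0.5, 0.8

    # A little bit of randomness on top of the equation.
    noise = rng.normal(0.0, 0.3, size=n)
    units = a + b1 * price + b2 * feature + b3 * display + noise
    return price, feature, display, units

price, feature, display, units = simulate_shoppers()
print(price[:3], feature[:3], display[:3], units[:3])
```

Each row of the result plays the role of one customer: a price paid, a feature flag, a display flag, and the units that the equation produced.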

That's it. I played God, and I said, this is a, b1, b2, b3. Now regression is going to find out what those values were. That's the beauty of regression. In the real world, this is what is happening, right? Marketers set feature, display and price, and then they go and see how people react. And then they have to put this equation back together and say, on average, this is how people react. This is the weight they give to price paid. This is the weight they give to feature. This is the weight they give to display. So that's what we're going to do here, okay? So let's see what happened here.

So we take all this data, and by some freak of nature, you're so smart that you said, the only things that change units are three variables: price paid, feature and display. You knew that, and you ran the regression. What happens? That's the case here, where you are the smart guy who found the true model, and you were able to find the god equation.

So, you ran this regression, you found these coefficients, and my God! They're all the same as the ones I put in there. The intercept was 6.28, price was -2.31, feature, display, they're all good. Wow, you are awesome. You got an R-squared of 93%, that's unimaginable! How did you do that? Well, you were just smart, you found out all of those things. Now assume you are not a genius. You are not a Nobel Prize winner, or someone like that. So you said, well, I'll make a guess. And you said units sold is a function of feature and display. I see these nice things in the store. That's what is driving sales. And I'm going to put in only those two.
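The "true model" case can be sketched in a few lines: simulate units from a known equation, regress units on all three variables, and check that the estimates land close to what was put in. The intercept (6.28) and price coefficient (-2.31) below come from the lecture; the feature and display coefficients, the price range, the noise level and the sample size are assumptions, so the R-squared will not match the 93% quoted above.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Simulated marketing variables (ranges are assumptions).
price = rng.uniform(1.0, 2.0, size=n)
feature = rng.integers(0, 2, size=n)
display = rng.integers(0, 2, size=n)

# Intercept and price coefficient are from the lecture;
# the feature (0.5) and display (0.8) values are hypothetical.
true_coefs = np.array([6.28, -2.31, 0.5, 0.8])

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), price, feature, display])
units = X @ true_coefs + rng.normal(0.0, 0.3, size=n)

# Ordinary least squares: recover a, b1, b2, b3 from the data alone.
est_coefs, *_ = np.linalg.lstsq(X, units, rcond=None)

# R-squared: share of the variation in units explained by the model.
residuals = units - X @ est_coefs
r_squared = 1.0 - residuals.var() / units.var()

print("true:     ", true_coefs)
print("estimated:", est_coefs.round(2))
print("R-squared:", round(float(r_squared), 3))
```

Because the regression includes every variable that went into generating the data, the estimated coefficients differ from the true ones only by the small amount of randomness that was added.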

I don't know if price belongs in the model, so I'm not going to put it in. You left it out and you ran this regression, and that is the estimated model. What's the difference? Look at the intercept, it's lower, whoa! Feature is higher. Display is higher. R-squared is only 18%, which is okay, because price paid is not there. But why are the coefficients of feature and display different between the true model and the estimated model? You didn't include price, that's fine. But shouldn't you get the same coefficients for feature and display whether or not you included price? Why are the coefficients different?
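One way to explore that question is to rerun the experiment with and without price and compare the two sets of coefficients side by side. The sketch below makes one extra assumption that is not stated in the lecture: products on feature or display tend to carry a lower price. The feature and display coefficients, the pricing rule and the helper `fit_ols` are likewise hypothetical, so the numbers will not match the lecture's output.

```python
import numpy as np

def fit_ols(columns, y):
    """Least-squares fit with an intercept; returns (coefficients, R-squared)."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    return coefs, 1.0 - resid.var() / y.var()

rng = np.random.default_rng(7)
n = 5000

feature = rng.integers(0, 2, size=n)
display = rng.integers(0, 2, size=n)

# Assumption: promoted products are priced lower on average.
price = (1.75 - 0.25 * feature - 0.25 * display
         + rng.uniform(-0.25, 0.25, size=n))

# Intercept and price coefficient from the lecture; 0.5 and 0.8 are hypothetical.
units = (6.28 - 2.31 * price + 0.5 * feature + 0.8 * display
         + rng.normal(0.0, 0.3, size=n))

# "True model": price, feature and display all included.
full_coefs, full_r2 = fit_ols([price, feature, display], units)

# "Estimated model": price left out.
short_coefs, short_r2 = fit_ols([feature, display], units)

print("with price:    intercept, price, feature, display =", full_coefs.round(2))
print("without price: intercept, feature, display =", short_coefs.round(2))
print("R-squared with / without price:",
      round(float(full_r2), 3), round(float(short_r2), 3))
```

Under this assumption the model without price shows a lower intercept, higher feature and display coefficients, and a lower R-squared, which is the same direction of change described in the comparison above.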