Regression analysis

Learn about regression analysis

Regression analysis can sound scary, as it is typically associated with lots of statistical jargon and complex-looking formulas; however, it really does not need to be. It is an extremely useful tool for financial and business analysis.

Regression analysis is a quantitative tool that allows us to assess a relationship between a dependent variable (y) and one or more independent variables (x). It allows us to assess and clarify how the dependent variable’s value might change if one of the independent variables is varied.

Some commonly used examples:

  • Trying to determine whether the amount of time spent studying has an effect on the test score you’ll receive (hint, it does). In this case, the test score would be the dependent variable, and the independent variable would be time spent studying.
  • Trying to determine what happens to the price of a stock when the interest rate changes. In this case, the stock price is the dependent variable, and the interest rate is the independent variable.

Remember that the dependent variable is the one that is affected by a change in the independent variable. Another way to think of it is that the dependent variable’s value depends, at least in part, on the value of the independent variable.

Types of linear regression:

Linear regression models the linear relationship between the dependent variable (y) and one or more independent (or explanatory) variables (x).

There are two types of linear regression:

  • Simple linear regression
  • Multiple linear regression

For the purpose of our learning, we will focus on simple linear regression. If you’d like to know more about multiple linear regression, watch this refresher from Harvard Business Review.

Watch: (Optional) The refresher: Regression analysis (2:10) [1]

The equation is as follows:

y = mx + b + error term

Where m is the estimate of the slope coefficient, and b is the y-intercept. This tells you that if x = 0, then y = b, and for every increase of one in x, y will go up by m. The error term exists because we are typically using a sample, and independent variables are never perfect predictors. The regression line (as seen below) is simply an estimate based on the available data; thus, the size of the errors gives us an indication of how certain we can be about our regression model. The larger the errors, the more cautiously we need to proceed.

The graphic shows the “Method of Least Squares”, which chooses the line that minimises the sum of squared errors: $\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2$, where $Y_i$ is the observed value and $\hat{Y}_i$ is the predicted value for the same $X_i$. The scatter plot shows an upward-sloping regression line through the data points. For a given $X_i$, the predicted value $\hat{Y}_i$ sits on the line, and the error $e_i$ is the vertical gap between it and the observed value $Y_i$.
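As a rough illustration of the least-squares idea, the Python sketch below finds the slope and intercept that minimise the sum of squared errors for a simple linear regression. The function names and the hours/scores figures are hypothetical and chosen only for illustration; they are not data from this course.

```python
def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of (observed y - predicted y)^2 for a given line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: hours spent studying and the test score received.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

m, b = least_squares_fit(hours, scores)
best = sum_squared_errors(hours, scores, m, b)

print("slope:", round(m, 2), "intercept:", round(b, 2))
print("sum of squared errors:", round(best, 2))
```

Any other line through the same points, for example one with a slightly steeper slope, gives a larger sum of squared errors, which is what makes the fitted line the “least squares” line.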

The following video is worth watching, as it will give an overview of the basics of regression analysis:

Watch: (Optional) An introduction to linear regression analysis (5:17) [2]

Now you might ask, how often would I ever use this? Or do I need to apply this formula every time? The answer to the former is a lot; the answer to the latter is no (luckily, we have Excel to do this for us; however, it is important to understand the concepts). Regression analysis is used often in business, and the following article gives a brief topline overview of when a business might use regression analysis.

Read: Application of regression analysis in business [3]

Further, the Harvard Business Review article below gives a very good overview of regression analysis, with an example, some of its pitfalls, and how businesses might use it.

Read: (Optional) A refresher on regression analysis [4]

Regression model analysis

You’ve now got an understanding of the regression model, and what it sets out to achieve. You can begin to think about how we might use a regression formula in a model, and how to interpret model outputs to understand business situations.

The image shows a screenshot of an Excel spreadsheet. The sheet shows values for X (1, 2, 12, 7, 4, 12, 4) and Y (8, 3.5, 8, 9, 12, 7, 4), followed by the Summary Output.

Regression Statistics:

  • Multiple R = 0.164233977 (this is the correlation coefficient)
  • R Square = 0.026972799 (this is the coefficient of determination, known as R^2)
  • Adjusted R Square = -0.167632641 (adjusted R squared)
  • Standard Error = 3.161388839 (standard error)
  • Observations = 7 (our sample size)

ANOVA:

  • Regression: df = 1, SS = 1.3852459, MS = 1.3852459, F = 0.13860249, Significance F = 0.72493265
  • Residual: df = 5, SS = 49.971897, MS = 9.99437939
  • Total: df = 6, SS = 51.3571429

Coefficients:

  • Intercept = 6.717798595
  • X Variable 1 = 0.106557377

This is what output from Excel will look like; we’ll see more on that in the next section, but it’s important for us to interpret this first.

Let’s look at the coefficients for the y-intercept (F25) and the x-variable (F26). The intercept is b from our equation above, and the x-variable coefficient is m, the slope.

Recall that y = mx + b + error term, thus, with the values above:

y = 0.107x + 6.72

This means that for every increase of 1 in x, the predicted value of y goes up by about 0.107 (the slope, m). If x = 0, then y = 6.72 (the intercept, b).
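To see where these coefficients come from, here is a minimal Python sketch that refits the line using the X and Y values listed in the spreadsheet screenshot. The variable names and the `predict` helper are assumptions made for this illustration; in the course itself, Excel does this work for us.

```python
from statistics import mean

# X and Y values as listed in the spreadsheet screenshot.
x = [1, 2, 12, 7, 4, 12, 4]
y = [8, 3.5, 8, 9, 12, 7, 4]

x_bar = mean(x)
y_bar = mean(y)

# Least-squares slope (m) and intercept (b) for y = m*x + b.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar


def predict(x_value):
    """Predicted y for a given x, using the fitted line (hypothetical helper)."""
    return slope * x_value + intercept


# Excel reports X Variable 1 = 0.106557377 and Intercept = 6.717798595.
print("slope:", round(slope, 9))
print("intercept:", round(intercept, 9))

# When x = 0 the prediction is just the intercept;
# each extra unit of x adds one slope's worth to the prediction.
print("predicted y at x = 0:", round(predict(0), 4))
print("predicted y at x = 10:", round(predict(10), 4))
```

The slope and intercept computed this way agree with the Coefficients column in the Excel output, so the fitted equation can be read straight from cells F25 and F26.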

The other values of focus are:

  • Multiple R – this is called the correlation coefficient. It tells you how strong the linear relationship is. Our value of 0.16 indicates a weak relationship (1 being the strongest possible and 0 meaning no linear relationship).
  • R squared – this is called the coefficient of determination. It is the square of Multiple R (thus the square root of R squared gives you Multiple R). It tells you what proportion of the variation of the y-values around their mean is explained by the x-values. In this case, it is about 3%, which means the model explains very little of the variation in y. This is not high and indicates a poor fit.
  • Adjusted R squared – this adjusts R squared for the number of independent variables in the model. It is mainly useful when we have more than one independent variable. It can be negative, as it is here, when the model fits poorly.
  • Standard error – this is the standard error of the regression: the typical distance between the observed y-values and the regression line, measured in the units of y. The smaller it is, the more precise the model’s predictions.
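The statistics above can be reproduced by hand from the X and Y values in the spreadsheet screenshot. The Python sketch below is one way to do it, assuming the standard formulas for simple linear regression with one independent variable; the variable names are chosen for this illustration and are not part of Excel’s output.

```python
from math import sqrt

# X and Y values as listed in the spreadsheet screenshot.
x = [1, 2, 12, 7, 4, 12, 4]
y = [8, 3.5, 8, 9, 12, 7, 4]

n = len(x)   # Observations
k = 1        # number of independent variables

x_bar = sum(x) / n
y_bar = sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
predicted = [slope * xi + intercept for xi in x]

# Sums of squares, as in the SS column of the ANOVA table.
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
ss_regression = ss_total - ss_residual

# Share of the variation in y that the line explains.
r_squared = ss_regression / ss_total
multiple_r = sqrt(r_squared)

# Penalise R squared for the number of independent variables.
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Typical size of a residual, in the units of y.
standard_error = sqrt(ss_residual / (n - k - 1))

# F = mean square of the regression / mean square of the residuals.
f_statistic = (ss_regression / k) / (ss_residual / (n - k - 1))

print("Multiple R:", multiple_r)                   # Excel: 0.164233977
print("R Square:", r_squared)                      # Excel: 0.026972799
print("Adjusted R Square:", adjusted_r_squared)    # Excel: -0.167632641
print("Standard Error:", standard_error)           # Excel: 3.161388839
print("F:", f_statistic)                           # Excel: 0.13860249
```

Significance F is left out of this sketch because it needs the F-distribution, which is not available in Python’s standard library.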

Note: These inputs were ‘made up’, which is the most likely reason for our extremely weak correlation; however, the example is about showing what a regression output looks like and how to interpret it, not the ‘validity’ of our model.

Access to Excel

Moving ahead, you will be asked to work on the Excel file that we have provided; it contains dummy data.

References:

  1. The refresher: Regression analysis [Video]. Harvard Business Review; 2017 Jan 27. Available from: https://hbr.org/video/5299994733001/the-refresher-regression-analysis
  2. An introduction to linear regression analysis [Video]. Statisticfun; 2012 Feb 5. Available from: https://www.youtube.com/watch?v=zPG4NjIkCjc
  3. Ozyasar, H. Application of regression analysis in business [Internet]. Chron; 2019 Mar 5. Available from: https://smallbusiness.chron.com/application-regression-analysis-business-77200.html
  4. Gallo, A. A refresher on regression analysis [Internet]. Harvard Business Review; 2015 Nov 4. Available from: https://hbr.org/2015/11/a-refresher-on-regression-analysis

This article is from the free online course Financial Analysis for Business Decisions: Introduction to Data Analysis Tools and Capital Projects, created by FutureLearn.