
Application of Linear Regression

When can we use linear regression? In this article, we explain the basic concepts of linear regression.
[Figure: a scatter plot of data points with a fitted straight line; the regression line attempts to minimize the errors between the line and the points.]

Linear regression is probably the simplest algorithm one can run. At least on the face of the equation, it is nothing but a combination of additions and multiplications. Yet linear regression is one of the most useful and fundamental predictive models.

Linear regression is a specific type of regression in which you believe the data follow a roughly linear relationship: a relationship between the independent variable (sometimes several independent variables) and the dependent variable that can be described by a straight line.

We cannot know the true relationship between the dependent and independent variables; we can only estimate it, because all we have to explore is the data we collect. We obtain the estimated relationship by building a regression model from that data. The process of finding this estimated relationship is called fitting the regression model, or fitting the regression line.

Therefore, a good regression model is one whose estimate is close enough to the real, unknown relationship.

So why is linear regression called linear "regression"? We are regressing an estimated line toward the mean by minimizing the errors of the estimates. In the figure above, the red points are the data we gathered, while the blue line is the regression line we drew, hoping it lies close enough to the real relationship. Let's go into a little more detail.

I am sure you are not intimidated by the linear regression equation. As mentioned before, on the face of it, it is nothing but a combination of additions and multiplications: y = a + bx + e

Here x is the input data and y is the output data: x is the independent variable and y is the dependent variable. The term e denotes the error, which is the quantity we aim to minimize.
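As a quick illustration, here is a minimal sketch in Python of data generated from this equation, using made-up coefficients and random errors rather than anything from a real dataset:

import numpy as np

# Hypothetical coefficients, chosen purely for illustration.
a, b = 2.0, 0.5

rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 100, size=50)   # independent variable (input)
e = rng.normal(0, 5, size=50)      # random errors around the line
y = a + b * x + e                  # dependent variable (output)

Each y value is the straight-line value a + bx plus a random error e, which is exactly the structure the equation describes.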

a and b are the coefficients; they are the result of fitting the regression model. In other words, finding the values of a and b is the goal of the regression model. As mentioned before, the model will not give you the perfect relationship between x and y, only a general one, so there will be a difference between the observed y in the sample and the value calculated from a + bx. That difference is e.

This difference depends on the values of a and b. For instance, when a is 1, b is 1 and x is 100, you get an estimate of 1 + 1 * 100 = 101; but when a is 2, b is 0.5 and x is 100, you get an estimate of 2 + 0.5 * 100 = 52. Now assume the observed y value was 80. You can see that the error depends on a and b: in the first case it is 21, and in the second case it is 28. Looking at this one data point alone, the first case is the better fit. In practice, however, we have to compare all the data points we have across many candidate values of a and b.
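The same comparison can be written out in a few lines of Python. This is just a sketch of the arithmetic above for a single data point, not a fitting routine:

# One observed data point from the example above: x = 100, y = 80.
x, y = 100, 80

for a, b in [(1, 1), (2, 0.5)]:
    estimate = a + b * x        # a + bx
    error = y - estimate        # e = y - (a + bx)
    print(f"a={a}, b={b}: estimate={estimate}, error={error}")

# a=1, b=1:   estimate=101, error=-21  -> size of error is 21
# a=2, b=0.5: estimate=52,  error=28   -> size of error is 28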

Finally, the fitted regression model settles on the values of a and b that minimize the sum of the squared errors.
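In practice these minimizing values can be computed directly. Here is a minimal sketch using NumPy's least-squares polynomial fit on made-up sample data; the data values are assumptions for illustration only:

import numpy as np

# Made-up sample data, for illustration only.
x = np.array([10, 20, 30, 40, 50], dtype=float)
y = np.array([12, 18, 35, 38, 52], dtype=float)

# Fit y = a + b*x by least squares; np.polyfit returns the
# coefficients from the highest degree down, i.e. [b, a].
b, a = np.polyfit(x, y, deg=1)

residuals = y - (a + b * x)
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"sum of squared errors = {np.sum(residuals ** 2):.3f}")

Among all possible choices of a and b, the fitted pair is the one for which this sum of squared errors is smallest.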
