# What is Linear Regression?

Linear regression is the most basic and most commonly used regression method.

Ordinary Least Squares (OLS) is the simplest and most widely applied linear regression method. OLS assumes a linear relationship between the input and the output and takes the mean squared error as the loss function. It admits an analytic formula for the optimal linear coefficient estimator.

## Model assumption

Suppose that we have a collection of input-output pairs, i.e. \(\mathcal{D} = \{(x_{i}, y_{i})\}_{i = 1}^{N}\), where \(x_i \in \mathbb{R}^d\) and \(y_i \in \mathbb{R}\). OLS assumes that \(\mathcal{D}\) satisfies a linear model as follows:

\[y_i = x_{i}\theta + \varepsilon_i,\]

where \(x_i\) is a \(d\)-dimensional row vector, \(\theta\) is a \(d\)-dimensional column vector, and \(\varepsilon_i\) is a scalar noise term, which is independently and identically distributed with zero mean. \(\theta\) represents the fixed but unknown linear coefficients of the OLS model.

For the case where the linear model includes a non-zero intercept, i.e. \(y = \theta_{0} + x\theta + \varepsilon\), it can be reduced to a linear model without an intercept by lifting the input to \(\tilde{x} = (1, x)\), i.e. appending a constant \(1\) as an extra coordinate of the input variable \(x\). The intercept \(\theta_0\) then becomes the first entry of the lifted coefficient vector.
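The lifting trick above can be sketched in a few lines of NumPy (the array shapes and data here are illustrative assumptions, not from the text):

```python
import numpy as np

# Hypothetical synthetic inputs: N = 5 samples with d = 2 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))

# Lift each input x to (1, x) by prepending a constant-1 column, so a model
# with an intercept becomes an intercept-free model on the lifted input.
X_lifted = np.hstack([np.ones((X.shape[0], 1)), X])

print(X_lifted.shape)  # (5, 3)
```

After lifting, the first coefficient of the fitted vector plays the role of the intercept \(\theta_0\).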

## Loss function

As the name of OLS suggests, the loss function of OLS, denoted by \(L(\theta \vert X, Y)\), is the sum of squared residuals; in formula, \(L(\theta \vert X, Y) = \sum_{i = 1}^{N}(y_{i} - x_{i}\theta)^{2} = (Y - X\theta)^{T}(Y - X\theta),\) where \(X \in \mathbb{R}^{N \times d}\) is the design matrix whose \(i\)-th row is \(x_i\) and \(Y \in \mathbb{R}^{N}\) is the column vector whose \(i\)-th entry is \(y_i\).

## Optimization

The optimal \(\hat{\theta}\) that attains the minimum of \(L(\theta \vert X, Y)\) admits the analytic formula \(\hat{\theta} = (X^{T}X)^{-1}X^{T}Y\), provided that \(X^{T}X\) is invertible.
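The closed-form solution can be computed directly; a sketch on synthetic data generated from a known coefficient vector (all names and values here are illustrative assumptions):

```python
import numpy as np

# Hypothetical data generated from a known theta so the recovery can be checked.
rng = np.random.default_rng(42)
N, d = 100, 3
X = rng.normal(size=(N, d))
theta_true = np.array([2.0, -1.0, 0.5])
Y = X @ theta_true + 0.01 * rng.normal(size=N)  # small zero-mean noise

# Closed-form estimator: theta_hat = (X^T X)^{-1} X^T Y.
# Solving the normal equations X^T X theta = X^T Y is numerically preferable
# to forming the explicit inverse.
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

print(np.round(theta_hat, 2))  # close to theta_true
```

In practice, `np.linalg.lstsq` (or a QR/SVD-based solver) is more robust when \(X^{T}X\) is ill-conditioned or singular.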

## Prediction

Once the optimal linear coefficient \(\hat{\theta}\) is computed, for any given input \(x\) (a row vector), the estimator for the corresponding output is \(\hat{y} = x\hat{\theta}\).
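The prediction step is a single inner product; a sketch with hypothetical fitted coefficients (the values are assumptions, not computed in the text):

```python
import numpy as np

# Hypothetical fitted coefficients and a new input, row-vector convention.
theta_hat = np.array([2.0, -1.0, 0.5])
x_new = np.array([1.0, 1.0, 2.0])

# Prediction: y_hat = x theta_hat.
y_hat = x_new @ theta_hat
print(y_hat)  # 2.0*1 - 1.0*1 + 0.5*2 = 2.0
```

For a batch of inputs stacked into a matrix `X_new`, the same expression `X_new @ theta_hat` returns one prediction per row.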