Derivation of linear regression

In this article, we walk through the derivation of the ordinary least squares (OLS) estimator of the model parameters of linear regression.

One great advantage of OLS is that it yields an analytic formula for the optimal model parameters. Let \(\hat{\theta}\) denote the estimator of the linear coefficients \(\theta\) of the linear regression model. A short matrix computation yields the formula for \(\hat{\theta}\) stated in the following lemma.

Lemma 1 (OLS estimator) Let \(\mathcal{D} = (X, Y)\) denote the collection of input-output pairs. Assume that the standard setting of the OLS model holds and that \(X^{T}X\) is invertible. Then the estimator of the optimal parameter \(\hat{\theta}\) is given as follows:

\[\hat{\theta} = (X^{T}X)^{-1}X^{T}Y.\]
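
As a quick illustration of the lemma, here is a minimal numerical sketch, assuming NumPy and a small synthetic data set (neither of which appears in the original article):

```python
import numpy as np

# Hypothetical synthetic data: n observations, p features.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
theta_true = np.array([1.5, -2.0, 0.5])
Y = X @ theta_true + 0.1 * rng.normal(size=n)

# OLS estimator from Lemma 1: theta_hat = (X^T X)^{-1} X^T Y.
# Solving the normal equations is numerically preferable to forming the inverse explicitly.
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(theta_hat)  # close to theta_true when the noise is small
```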

Proof: Recall that the loss function of OLS can be expanded as

\[L(\theta \vert X, Y) = (Y - X\theta)^{T}(Y - X\theta) = Y^{T}Y - 2\theta^{T}X^{T}Y + \theta^{T}X^{T}X\theta.\]

Here we use the fact that \(Y^{T}X\theta = (X\theta)^{T}Y\), as both sides are scalars and the transpose of a scalar is unchanged. Note that the loss function of OLS is a quadratic function of the parameter \(\theta\); since \(X^{T}X\) is positive definite when it is invertible, the loss is strictly convex, which ensures the existence and uniqueness of the global minimum. Moreover, \(L(\theta \vert X, Y)\) is differentiable with respect to \(\theta\), so the optimal parameter \(\hat{\theta}\) must satisfy that the derivative of \(L(\theta \vert X, Y)\) evaluated at \(\theta = \hat{\theta}\) is equal to zero, i.e.

\[\frac{\partial L(\theta \vert X, Y)}{\partial \theta}\Big\vert_{\theta = \hat{\theta}} = 0.\]

Differentiating the expanded loss with respect to \(\theta\) gives

\[\frac{\partial L(\theta \vert X, Y)}{\partial \theta} = -2X^{T}Y + 2X^{T}X\theta.\]
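
For reference, this computation uses the standard matrix-calculus identities, for a constant vector \(a\) and a symmetric matrix \(A\),

\[\frac{\partial}{\partial \theta}\left(a^{T}\theta\right) = a, \qquad \frac{\partial}{\partial \theta}\left(\theta^{T}A\theta\right) = 2A\theta,\]

applied with \(a = X^{T}Y\) and \(A = X^{T}X\), which is symmetric.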

By setting \(\frac{\partial L(\theta \vert X, Y)}{\partial \theta}\) to zero, we obtain a linear system of equations for \(\hat{\theta}\) (the normal equations), i.e.

\[-2X^{T}Y + 2X^{T}X\hat{\theta} = 0.\]

Since \(X^{T}X\) is invertible by assumption, the above equation implies that

\(\hat{\theta} = (X^{T}X)^{-1}X^{T}Y.\) \(\square\)
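
As a quick sanity check, the following sketch (again assuming NumPy and synthetic data of the same kind as above, not part of the original article) verifies numerically that the gradient vanishes at \(\hat{\theta}\) and that the closed-form estimate agrees with a generic least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

# Closed-form estimator from Lemma 1 and the gradient from the proof.
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
grad = -2 * X.T @ Y + 2 * X.T @ X @ theta_hat
print(np.allclose(grad, 0.0))  # True: the gradient vanishes at theta_hat

# Agreement with NumPy's least-squares solver, which minimises ||Y - X theta||^2 directly.
theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(theta_hat, theta_lstsq))  # True
```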

This article is from the free online course An Introduction to Machine Learning in Quantitative Finance.
