Derivation of linear regression

In this article, we walk through the derivation of the ordinary least squares (OLS) estimator of the model parameters of linear regression.

One great advantage of OLS is that it yields an analytic formula for the optimal model parameters. Let \(\hat{\theta}\) denote the estimator of the linear coefficients \(\theta\) of the linear regression model. A short matrix computation yields the formula for \(\hat{\theta}\) stated in the following lemma.

Lemma 1 (OLS estimator) Let \(\mathcal{D} = (X, Y)\) denote the collection of input-output pairs. Assume that the standard setting of the OLS model holds and that \(X^{T}X\) is invertible. Then the estimator of the optimal parameter \(\hat{\theta}\) is given as follows:

\[\hat{\theta} = (X^{T}X)^{-1}X^{T}Y.\]
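
As a quick illustration of the lemma, here is a minimal numerical sketch, assuming NumPy and a small synthetic data set (neither of which appears in the original article):

```python
import numpy as np

# Hypothetical synthetic data: n observations, p features.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
theta_true = np.array([1.5, -2.0, 0.5])
Y = X @ theta_true + 0.1 * rng.normal(size=n)

# OLS estimator from Lemma 1: theta_hat = (X^T X)^{-1} X^T Y.
# Solving the normal equations is numerically preferable to forming the inverse explicitly.
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(theta_hat)  # close to theta_true when the noise is small
```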

Proof: Recall that the loss function of OLS can be expanded as

\[L(\theta \vert X, Y) = (Y - X\theta)^{T}(Y - X\theta) = Y^{T}Y - 2\theta^{T}X^{T}Y + \theta^{T}X^{T}X\theta.\]

Here we use the fact that \(Y^{T}X\theta = (X\theta)^{T}Y\), as both sides are scalars and the transpose of a scalar is unchanged. Note that the loss function of OLS is a quadratic function of the parameter \(\theta\); since \(X^{T}X\) is positive definite when it is invertible, the loss is strictly convex, which ensures the existence and uniqueness of the global minimum. Moreover, \(L(\theta \vert X, Y)\) is differentiable with respect to \(\theta\), so the optimal parameter \(\hat{\theta}\) must satisfy that the derivative of \(L(\theta \vert X, Y)\) evaluated at \(\theta = \hat{\theta}\) is equal to zero, i.e.

\[\frac{\partial L(\theta \vert X, Y)}{\partial \theta}\Big\vert_{\theta = \hat{\theta}} = 0.\]

Differentiating the expanded loss with respect to \(\theta\) gives

\[\frac{\partial L(\theta \vert X, Y)}{\partial \theta} = -2X^{T}Y + 2X^{T}X\theta.\]
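
For reference, this computation uses the standard matrix-calculus identities, for a constant vector \(a\) and a symmetric matrix \(A\),

\[\frac{\partial}{\partial \theta}\left(a^{T}\theta\right) = a, \qquad \frac{\partial}{\partial \theta}\left(\theta^{T}A\theta\right) = 2A\theta,\]

applied with \(a = X^{T}Y\) and \(A = X^{T}X\), which is symmetric.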

By setting \(\frac{\partial L(\theta \vert X, Y)}{\partial \theta}\) to zero, we obtain a linear system of equations for \(\hat{\theta}\) (the normal equations), i.e.

\[-2X^{T}Y + 2X^{T}X\hat{\theta} = 0.\]

Since \(X^{T}X\) is invertible by assumption, the above equation implies that

\(\hat{\theta} = (X^{T}X)^{-1}X^{T}Y.\) \(\square\)
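
As a quick sanity check, the following sketch (again assuming NumPy and synthetic data of the same kind as above, not part of the original article) verifies numerically that the gradient vanishes at \(\hat{\theta}\) and that the closed-form estimate agrees with a generic least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

# Closed-form estimator from Lemma 1 and the gradient from the proof.
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
grad = -2 * X.T @ Y + 2 * X.T @ X @ theta_hat
print(np.allclose(grad, 0.0))  # True: the gradient vanishes at theta_hat

# Agreement with NumPy's least-squares solver, which minimises ||Y - X theta||^2 directly.
theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(theta_hat, theta_lstsq))  # True
```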

This article is from the free online course An Introduction to Machine Learning in Quantitative Finance.
