
# Summary of Week 2

In this article, let us recap what was covered this week: supervised learning, linear regression (with regularization), and classification.

This week, we introduced a general framework for supervised learning and focused on linear regression.

We start with linear regression as an example to go through the key components of the supervised learning framework, i.e. data, model, loss function, optimization, prediction and evaluation. We then discuss the pros and cons of linear regression and introduce three main regularization methods to overcome the potential overfitting issue. Lastly, we extend the regression framework to tackle classification problems.

## Supervised Learning Framework

• Dataset: \(\mathcal{D} = \{(x_{i}, y_{i})\}_{i = 1}^{N}\)
• Model: \({\color{blue}f_{\theta}}(x) \approx \mathbb{E}[y \vert x] = f(x),~\forall x \in \mathbb{R}^{d}\)
• Empirical Loss: \(L({\color{blue}\theta} \vert \mathcal{D}) = \frac{1}{N}\sum_{i = 1}^{N} d(f_{\theta}(x_{i}), y_{i}) \rightarrow\) Minimize
• Optimization: \({\color{blue}\theta^{*}} = \arg\min_{\theta} L({\color{blue}\theta} \vert \mathcal{D})\)
• Prediction: \(\hat{y}_{*} = {\color{blue}f_{\theta^{*}}}(x_{*})\)
• Validation: compute indicators of the goodness of fit.
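As an illustration, the steps above can be sketched end to end in code. The one-dimensional dataset, the linear model and the gradient-descent settings below are assumptions made for this example, not part of the article:

```python
import numpy as np

# Hypothetical dataset D = {(x_i, y_i)}: y depends linearly on x plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(-1.0, 1.0, size=40)
y = 2.0 * x + 0.3 + 0.05 * rng.normal(size=40)

# Model f_theta(x): here a line, with theta = (slope, intercept).
def f(theta, x):
    return theta[0] * x + theta[1]

# Empirical loss L(theta | D): mean squared distance d(f_theta(x_i), y_i).
def empirical_loss(theta):
    return np.mean((f(theta, x) - y) ** 2)

# Optimization: theta* = argmin_theta L(theta | D), here via plain gradient descent.
theta = np.zeros(2)
lr = 0.5
for _ in range(1000):
    resid = f(theta, x) - y
    grad = np.array([np.mean(2.0 * resid * x), np.mean(2.0 * resid)])
    theta -= lr * grad

# Prediction at a new point x*, and validation via the training loss.
y_star = f(theta, 0.5)
final_loss = empirical_loss(theta)
```

Any model and loss could be slotted into the same loop; only `f` and `empirical_loss` change.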

## Linear Regression and Regularization

Linear Regression (Ordinary Least Squares)

• Model: \(y = f_{\color{blue}\theta}(x) + \varepsilon = x{\color{blue}\theta} + \varepsilon\);
• Loss function: \(L({\color{blue}\theta} \vert X, Y) = (Y - X{\color{blue}\theta})^{T}(Y - X{\color{blue}\theta}) \rightarrow \min\);
• Optimization: \({\color{blue}\hat{\theta}} = (X^{T}X)^{-1}X^{T}Y\);
• Prediction: \(\hat{y}_{*} = x_{*}{\color{blue}\hat{\theta}}\);
• Validation: compute RMSE, \(R^{2}\), the adjusted \(R^{2}\), and \(p\)-values.
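The closed-form estimate and the validation metrics can be computed directly with NumPy. The toy dataset below (sample size, coefficients, noise level) is illustrative only:

```python
import numpy as np

# Hypothetical data: N = 50 samples, an intercept column plus 2 features.
rng = np.random.default_rng(0)
N = 50
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
true_theta = np.array([1.0, 2.0, -0.5])
Y = X @ true_theta + 0.1 * rng.normal(size=N)

# Optimization: closed-form OLS, theta_hat = (X^T X)^{-1} X^T Y.
# Solving the normal equations is numerically safer than forming the inverse.
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Prediction and validation: RMSE and R^2 on the fitted data.
Y_hat = X @ theta_hat
rmse = np.sqrt(np.mean((Y - Y_hat) ** 2))
r2 = 1.0 - np.sum((Y - Y_hat) ** 2) / np.sum((Y - Y.mean()) ** 2)
```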

Regularization:

• Lasso Regression: \(L(\beta \vert X, Y) = (Y - X\beta)^{T}(Y - X\beta) + \lambda \vert\vert \beta \vert\vert_{1}\);
• Ridge Regression: \(L(\beta \vert X, Y) = (Y - X\beta)^{T}(Y - X\beta) + \lambda \vert\vert \beta \vert\vert_{2}^{2}\);
• Elastic Net: \(L(\beta \vert X, Y) = (Y - X\beta)^{T}(Y - X\beta) + \lambda\left(\frac{1-\alpha}{2}\vert\vert \beta \vert\vert_{2}^{2} + \alpha \vert\vert \beta \vert\vert_{1}\right)\).
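Of the three penalties, ridge regression still has a closed form, \(\hat{\beta} = (X^{T}X + \lambda I)^{-1}X^{T}Y\), while lasso and elastic net need iterative solvers. A minimal sketch of the ridge estimate, with the synthetic data and the choice \(\lambda = 1\) being assumptions for illustration:

```python
import numpy as np

# Hypothetical data with a sparse true coefficient vector.
rng = np.random.default_rng(1)
N, d = 30, 5
X = rng.normal(size=(N, d))
beta_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
Y = X @ beta_true + 0.1 * rng.normal(size=N)

# Ridge closed form: beta = (X^T X + lambda I)^{-1} X^T Y.
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Unpenalized OLS for comparison.
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)

# The penalty shrinks the coefficient vector toward zero.
shrinkage = np.linalg.norm(beta_ridge) / np.linalg.norm(beta_ols)
```

Increasing `lam` shrinks the coefficients further, trading a little bias for lower variance, which is how regularization combats overfitting.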

## From Regression to Classification

| | Regression | Classification |
| --- | --- | --- |
| Dataset | \(\mathcal{D} = \{(x_{i}, y_{i})\}_{i = 1}^{N},\ (x_i, y_i) \in \mathcal{X} \times \mathcal{Y}\); \({\color{blue}\mathcal{Y} = \mathbb{R}^d}\) | \(\mathcal{D} = \{(x_{i}, y_{i})\}_{i = 1}^{N},\ (x_i, y_i) \in \mathcal{X} \times \mathcal{Y}\); \({\color{blue}\mathcal{Y}}\) is a finite set |
| Model | \(y = f_{\theta}(x) + \varepsilon\), \(f_{\theta}(x) \approx {\color{blue}\mathbb{E}[y \vert x]}\) | \(f_{\theta}(x) \approx {\color{blue}\mathbb{P}[y \vert x]}\) |
| Empirical Loss | \(L(\theta \vert \mathcal{D}) \rightarrow\) Minimize, e.g. \({\color{blue}\text{MSE}}\) | \(L(\theta \vert \mathcal{D}) \rightarrow\) Minimize, e.g. \({\color{blue}\text{cross-entropy}}\) |
| Optimization | \(\theta^{*} = \arg\min_{\theta} L(\theta \vert \mathcal{D})\) | \(\theta^{*} = \arg\min_{\theta} L(\theta \vert \mathcal{D})\) |
| Prediction | \(\hat{y}_{*} = {\color{blue}f_{\theta^{*}}(x_{*})}\) | \(\hat{y}_{*} = {\color{blue}\arg\max_{i \in \mathcal{Y}} f^{i}_{\theta^{*}}(x_{*})}\) |
| Validation | Compute test metrics, e.g. \({\color{blue}\text{MSE}}\) | Compute test metrics, e.g. \({\color{blue}\text{Accuracy}}\) |

Table: Summary of the framework of regression and classification. The differences between them are highlighted in \({\color{blue}\text{blue}}\).
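To make the classification column concrete, here is a sketch of binary logistic regression: the model outputs \(f_{\theta}(x) \approx \mathbb{P}[y \vert x]\), the cross-entropy loss is minimized by gradient descent, prediction picks the more probable class, and validation computes accuracy. The two-blob dataset, learning rate and iteration count are assumptions for the example:

```python
import numpy as np

# Hypothetical binary dataset: two well-separated Gaussian blobs in 2-d.
rng = np.random.default_rng(2)
N = 100
X = np.vstack([rng.normal(loc=-2.0, size=(N // 2, 2)),
               rng.normal(loc=+2.0, size=(N // 2, 2))])
y = np.array([0] * (N // 2) + [1] * (N // 2))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Minimize the mean cross-entropy loss by plain gradient descent.
theta = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(500):
    p = sigmoid(X @ theta + b)      # f_theta(x) ~ P[y = 1 | x]
    grad_theta = X.T @ (p - y) / N  # gradient of the mean cross-entropy
    grad_b = np.mean(p - y)
    theta -= lr * grad_theta
    b -= lr * grad_b

# Prediction: arg max over the two class probabilities = threshold at 0.5.
y_hat = (sigmoid(X @ theta + b) > 0.5).astype(int)

# Validation: accuracy on the (training) data.
accuracy = np.mean(y_hat == y)
```

With more than two classes, the sigmoid would be replaced by a softmax over \(\vert\mathcal{Y}\vert\) outputs and the arg max taken over all of them, exactly as in the table.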
