# Model evaluation metrics

A quick reference guide to model evaluation metrics.
In the previous video we saw some of the metrics and measures used to evaluate classification models.

In this article we will give a quick reminder of those metrics for future reference.

## Accuracy

This is just the proportion of correct responses within the entire test set:

\[\text{accuracy} = \frac{\text{correct}}{\text{total}}\]

The higher the better, but what counts as "good" depends on the number of classes and the balance between them: if 95% of examples belong to one class, a classifier that always predicts that class already achieves 95% accuracy.
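As a minimal sketch (with made-up labels), accuracy can be computed directly from paired lists of true and predicted labels:

```python
# Accuracy: the proportion of predictions that match the true labels.
# The labels below are made up purely for illustration.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 5 of the 6 predictions are correct
```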

## Confusion matrices

For a binary classifier (yes or no answer) this is a table showing the counts of so-called true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN) as follows:

|                   | actual yes | actual no |
| ----------------- | ---------- | --------- |
| **predicted yes** | TP         | FP        |
| **predicted no**  | FN         | TN        |
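With made-up labels, treating 1 as "yes" and 0 as "no", the four cells can be counted directly:

```python
# Count the four confusion-matrix cells for a binary classifier.
# The labels below are illustrative, not from any real dataset.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # predicted yes, actually yes
fp = sum(t == 0 and p == 1 for t, p in pairs)  # predicted yes, actually no
fn = sum(t == 1 and p == 0 for t, p in pairs)  # predicted no, actually yes
tn = sum(t == 0 and p == 0 for t, p in pairs)  # predicted no, actually no
print(tp, fp, fn, tn)
```

The four counts always sum to the total number of examples, which is a quick sanity check.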

## Precision

Precision asks: out of all the positive predictions we made, how many are correct?

\[\text{Precision} = \frac{TP}{TP+FP}\]

## Recall (aka sensitivity)

In contrast, recall asks: how many out of all of the positive examples did we pick out correctly?

\[\text{Recall} = \frac{TP}{TP+FN}\]
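Both quantities follow directly from the confusion-matrix counts; a minimal sketch with example counts:

```python
# Precision and recall from example confusion-matrix counts.
tp, fp, fn = 3, 1, 1

precision = tp / (tp + fp)  # of 4 positive predictions, 3 were right
recall = tp / (tp + fn)     # of 4 actual positives, 3 were found
print(precision, recall)
```

Note the two differ only in the denominator: precision divides by everything *predicted* positive, recall by everything *actually* positive.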

## F1 score

The F1 score aims to combine both precision and recall into a single measure.

\[F_1 = 2\left(\frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}\right)\]

In mathematical terms, this is known as the harmonic mean of precision and recall, and is bounded between 0 and 1.
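As a sketch with example values, the harmonic mean penalises imbalance: a low value on either side drags the F1 score down more than an arithmetic average would.

```python
# F1: the harmonic mean of precision and recall (example values).
precision, recall = 0.75, 0.6

f1 = 2 * (precision * recall) / (precision + recall)
print(f1)  # about 0.667, below the arithmetic mean of 0.675
```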

## Other F scores

The F1 score is one of a family of measures:

\[F_{\beta} = (1+\beta^2)\left(\frac{\text{precision} \times \text{recall}}{\beta^2 \times \text{precision} + \text{recall}}\right)\]

Setting \(\beta = 1\) recovers the F1 score; other values of \(\beta\) give the other members of the family.

F2 and F0.5 are sometimes used rather than the F1 score: the F2 score gives more weight to recall than to precision, and F0.5 gives more weight to precision.
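A minimal sketch of the family, using the standard form \(F_\beta = (1+\beta^2)\,PR/(\beta^2 P + R)\), which reduces to F1 at \(\beta = 1\); the example values are made up to show the weighting effect:

```python
# General F-beta score: beta > 1 weights recall more, beta < 1 precision.
def f_beta(precision, recall, beta):
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A recall-heavy classifier (weak precision, perfect recall):
p, r = 0.5, 1.0
print(f_beta(p, r, 1))    # matches the F1 formula
print(f_beta(p, r, 2))    # higher: F2 rewards the strong recall
print(f_beta(p, r, 0.5))  # lower: F0.5 penalises the weak precision
```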

Which of these measures is best to use depends on your application and your dataset.