2.3

## UNSW Sydney

So Benford's law was discovered and popularised by two American scientists. Simon Newcomb, in 1881 I think, made the first observation of this phenomenon. And then it was intensively studied and analysed by Frank Benford around 1938. And he, in fact, showed that it applies to a huge variety of different situations. So it's about numbers appearing in texts. You're reading some text, and there are certain numbers that appear, certain figures: populations, data, money, salaries, whatever. You look at the numbers that appear and you consider the first significant digit, that is, the leftmost nonzero digit in whatever number you're looking at. They observed that the first significant digit is not equally likely to be each of the digits from 1 to 9.

In fact, it's much more skewed towards 1. 1 is much more likely to be the first significant digit in a number that appears in a publication than 9. In fact, it's a dramatic discrepancy. So it turns out that the digit 1 occurs a little bit more than 30% of the time; 2, about 17.6% of the time; 3, 12.5%; 4, 9.7%; down to 9, which occurs only about 4.6% of the time. Why is this? Is this an inverse relationship? It certainly looks like an inverse relationship. But it turns out that it's not an inverse relationship. But people do know where it comes from. It comes from another kind of relationship which is closely related to the inverse relationship.

So it's based on the 1 over x function. So here is y equals 1 over x, just as before, x and y. So it turns out that in order to understand this distribution, we have to associate to this function another function which involves areas. So if we go a distance, say, up to x there, we can ask what the area is under 1 over x from 1 to x. This is a very interesting function. And it turns out that this area is, well, a very special function called log of x. In fact, this is one way of defining, or introducing, the log function in mathematics.

It's a very good way of introducing the log function, intimately associated with our hyperbola, 1 over x. And then it turns out that these numbers are very directly connected with this log function and the fact that we're working in a base 10 system. So we're going to explore the actual relationship in the activities, but this is the basic shape. And it's really remarkable that it works in such a wide variety of situations. In particular, it's being used by law enforcement agencies to detect false documents. So if you have a financial report that you've fudged and just made up the numbers, chances are you have not distributed your numbers correctly according to Benford's Law.

So a computer can quickly check to see whether your first significant digits roughly correspond to this. And if it's out of whack, there's a strong indication that you've been cheating. It's actually a very interesting and powerful tool. Why does it work? Well, there have been various reasons put forward. And I think people have a rough idea, but I wouldn't say that there's unanimity about why it works. But it's certainly an interesting and remarkable phenomenon, and again, ultimately coming down to this fabulous curve, the hyperbola 1 over x.

# Benford's law

In this video we describe Benford’s Law: a curious observation about the unequal distribution of first digits in numbers drawn from real-world data. It was first enunciated by Simon Newcomb around 1881, and studied intensively by Frank Benford around 1938. It is a quite surprising aspect of the world of numbers that appears in all kinds of data, and one that you can actually check for yourself. It has also had applications to detecting fraud.
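
One way to check this for yourself is with a few lines of Python. The sketch below is only an illustration under stated assumptions: the data set (the powers of 2 from $\normalsize{2^1}$ to $\normalsize{2^{1000}}$) and the helper names `first_digit` and `benford_expected` are our own choices, and the comparison uses the formula usually quoted for Benford’s law, $\normalsize{\log_{10}(1+1/d)}$, which is not derived on this page.

```python
import math
from collections import Counter


def first_digit(n: int) -> int:
    """Return the leading (leftmost) digit of a positive integer."""
    return int(str(n)[0])


def benford_expected(d: int) -> float:
    """Frequency usually quoted for Benford's law: log10(1 + 1/d)."""
    return math.log10(1 + 1 / d)


# Illustrative data set (an assumption, not taken from the course):
# the powers of 2 from 2**1 up to 2**1000.
counts = Counter(first_digit(2 ** k) for k in range(1, 1001))
observed = {d: counts[d] / 1000 for d in range(1, 10)}

for d in range(1, 10):
    print(f"{d}: observed {observed[d]:.3f}, expected {benford_expected(d):.3f}")
```

For this particular data set the digit 1 leads 301 of the 1000 numbers, very close to the figure of a little over 30% mentioned in the video. Other data sets, such as populations or prices, will typically match less exactly.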

## Connection with the function $\normalsize y=1/x$

Rather surprisingly, the relative frequencies that appear in Benford’s law are intimately connected to the graph of the function $\normalsize y=1/x$, but to understand that we will need to see how logarithms also connect with areas under this graph, and how scaling plays an important role. In fact Benford’s law is really all about scaling: whatever distribution of first digits we have, it ought to be unchanged if we decide to change our units, which are ultimately arbitrary.
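
As a rough preview of that connection, the sketch below estimates areas under $\normalsize{y=1/x}$ numerically. It rests on an assumption stated here rather than proved: that the frequency of first digit $\normalsize{d}$ is the area under the curve between $\normalsize{d}$ and $\normalsize{d+1}$, taken as a share of the total area between $\normalsize{1}$ and $\normalsize{10}$. The function name `area_under_hyperbola` and the choice of the midpoint rule are ours.

```python
def area_under_hyperbola(a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of the area under y = 1/x between x = a and x = b."""
    width = (b - a) / steps
    return sum(width / (a + (i + 0.5) * width) for i in range(steps))


# Total area between x = 1 and x = 10 (one full "decade" in base 10).
total = area_under_hyperbola(1, 10)

# Share of that area lying above each interval [d, d + 1].
shares = {d: area_under_hyperbola(d, d + 1) / total for d in range(1, 10)}

for d, share in shares.items():
    print(f"{d}: {share:.3f}")
```

The shares come out at roughly 30% for the digit 1, falling to under 5% for the digit 9, in line with the percentages quoted in the video.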

## How dilation affects areas

Let’s remind ourselves of how a dilation in either the $\normalsize{x}$ or $\normalsize{y}$ direction changes area. If we take a figure, such as the blue house on the left in the image below, and dilate it by a factor $\normalsize{2}$ in the horizontal direction (to get the red house on the right), then its area is multiplied by $\normalsize{2}$ as well. The reason is that every little square in the grid is transformed into a rectangle of twice the area. Since this is true of all little squares, it is true of more general areas too.

The same rule applies to a stretch of $\normalsize{2}$ in the $\normalsize{y}$-direction: that also multiplies areas by $\normalsize{2}$. Conversely, if we dilated by $\normalsize{\frac{1}{2}}$, which is really a shrink, then areas would also be divided by $\normalsize{2}$.
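
These rules can be checked on a concrete figure. The following is a minimal sketch, assuming a hypothetical house-shaped polygon of our own choosing (a square with a triangular roof, not the one in the image) and using the standard shoelace formula for the area of a polygon. The names `polygon_area`, `dilate` and `house` are illustrative.

```python
def polygon_area(points):
    """Area of a simple polygon from its vertices, via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def dilate(points, kx=1.0, ky=1.0):
    """Stretch every vertex by kx horizontally and ky vertically."""
    return [(kx * x, ky * y) for x, y in points]


# Hypothetical house: a 2-by-2 square with a roof of height 1 (area 4 + 1 = 5).
house = [(0, 0), (2, 0), (2, 2), (1, 3), (0, 2)]

wide_house = dilate(house, kx=2)     # stretch by 2 in the x-direction
short_house = dilate(house, ky=0.5)  # shrink by 1/2 in the y-direction

print(polygon_area(house))        # original area
print(polygon_area(wide_house))   # twice the original area
print(polygon_area(short_house))  # half the original area
```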

Now let’s stretch our blue house in the horizontal direction by $\normalsize{2}$ and at the same time dilate by $\normalsize{\frac{1}{2}}$ in the vertical direction. This kind of mixed dilation is made of two separate dilations, one of which multiplies areas by $\normalsize{2}$, and the other which divides areas by $\normalsize{2}$. So the total effect is to preserve areas, even though shapes are distorted!

## Scaling and areas under $\normalsize{y=1/x}$

The innocent-looking hyperbola $\normalsize{y=1/x}$ has a symmetry that is far from obvious: if we stretch it in the horizontal direction by a factor of $\normalsize{c}$, and simultaneously shrink it in the vertical direction by exactly that same factor of $\normalsize{c}$, then every point on the hyperbola is sent to another point on the hyperbola. Indeed, the point $\normalsize{(x, 1/x)}$ is sent to $\normalsize{(cx, 1/(cx))}$, which again satisfies $\normalsize{y=1/x}$. This kind of mixed dilation also preserves areas, as we have just observed.
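
Both facts can be tested numerically. The sketch below is an illustration only: the factor $\normalsize{c=3}$, the sample points, and the names `mixed_dilation` and `area_under_hyperbola` are assumptions of ours, and the areas are midpoint-rule estimates rather than exact values.

```python
def mixed_dilation(point, c):
    """Stretch by c horizontally and shrink by the same factor c vertically."""
    x, y = point
    return (c * x, y / c)


def area_under_hyperbola(a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of the area under y = 1/x between x = a and x = b."""
    width = (b - a) / steps
    return sum(width / (a + (i + 0.5) * width) for i in range(steps))


c = 3.0  # illustrative dilation factor

# Fact 1: points on the hyperbola land on the hyperbola again (x * y stays 1).
images = [mixed_dilation((x, 1 / x), c) for x in (0.5, 1.0, 2.0, 4.0)]
for new_x, new_y in images:
    print(f"({new_x:.3f}, {new_y:.3f})  x*y = {new_x * new_y:.6f}")

# Fact 2: the region under the curve over [1, 2] is carried to the region
# over [c, 2c], so the two areas should agree.
area_before = area_under_hyperbola(1, 2)
area_after = area_under_hyperbola(c * 1, c * 2)
print(area_before, area_after)
```

The two printed areas agree to many decimal places, which is the area-preserving symmetry at work on the hyperbola itself.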

When we put these two facts together, we will see in the next section that a remarkable series of conclusions arises, leading to the introduction of the logarithm function as a record of areas under the hyperbola!