4.6

# Associations and correlations do not equal causation

Causality is complicated. Everyone knows that smoking causes cancer, right? But what does that mean? Does everyone who smokes get cancer? Is there a certain number of cigarettes that guarantees cancer? If you don’t smoke, will you never get cancer? What we mean when we say that smoking causes cancer is intrinsically linked with probability: smoking increases your probability of developing cancer. And smoking is indeed one of the leading causes of cancer.
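As a rough sketch, this probabilistic claim is often summarised as a comparison of risks, the relative risk (RR):

$$
\mathrm{RR} = \frac{P(\text{cancer} \mid \text{smoker})}{P(\text{cancer} \mid \text{non-smoker})} > 1
$$

An RR above 1 says that smokers develop cancer more often than non-smokers. It still allows for smokers who never develop cancer and for non-smokers who do. On its own, though, this ratio only describes an association. Whether that association reflects causation is the question the rest of this section addresses.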

So what do we mean when we talk about causality? In 1965 Austin Bradford Hill, one of the researchers who demonstrated that smoking causes lung cancer, put together a set of criteria for assessing causality. These are now known as the Bradford Hill criteria [1].

These criteria include that:

- the association should be strong (so that other explanations are less likely);
- it should be possible to demonstrate it repeatedly;
- the cause should occur before the effect;
- increased exposure to the cause should increase the chance of the effect (a dose-response relationship);
- it should be plausible;
- it should not contradict other knowledge; and
- it should be supported by experiment.

Bradford Hill argued that none of these conditions is strictly necessary. He also argued that even meeting all of them would not be sufficient to demonstrate causality.

The situation, then, is this: if one thing causes another, the two will usually be correlated. The reverse is not always the case. One excellent website lists spurious correlations. It deliberately chooses things that are very unlikely to be causally linked, yet have very high correlations.
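The following minimal Python sketch shows how easily unrelated quantities can correlate. It uses simulated data only: the two series, their trends and their noise levels are hypothetical and do not come from any real dataset. Each series is generated without reference to the other. Both simply drift upward over time, and that shared trend alone is enough to push the correlation close to 1.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(42)  # fixed seed so the run is reproducible
years = range(20)

# Two hypothetical series, each generated without reference to the other:
# both simply drift upward over time, plus their own independent noise.
series_a = [100 + 5 * t + rng.gauss(0, 3) for t in years]
series_b = [40 + 2 * t + rng.gauss(0, 2) for t in years]

r = pearson(series_a, series_b)
print(f"correlation between unrelated series: {r:.2f}")
```

Many spurious correlations work the same way. Any two quantities that both rise (or both fall) over the same period will tend to correlate, whether or not one has anything to do with the other.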

When two things appear correlated but their association can be explained by a third factor that is associated with both, we call that third factor a confounder. An example might illustrate this. A naïve scientist observed that the population of storks in their village was increasing at a similar rate to the number of newborn babies. Being naïve, the scientist was convinced that there must be truth to the old legend that storks bring babies. In fact, the village’s fertile population was expanding (helped along by the births themselves). As a result, the village produced more rubbish, which supported (and attracted) a larger population of storks, who feed on the rubbish. In this example the growing population is the confounder. It is linked with the number of births, and, through the extra refuse it produces, with the increase in stork numbers.

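This example also illustrates another issue: reverse causality. The new babies themselves add to the village’s population and its rubbish. If anything, then, it is the babies that bring the storks, not the other way round. When two things are related and are measured at the same time, it is difficult to determine which is the cause and which the effect.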
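A small simulation can mimic the stork story. This minimal sketch uses hypothetical simulated data in arbitrary standardised units, not real village statistics. A single hidden factor, labelled `population` here, drives both births and stork numbers, and neither of those two has any direct effect on the other. Births and storks still come out clearly correlated. The association essentially vanishes once the linear effect of the confounder is removed from each variable, a calculation known as a partial correlation.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residuals(ys, zs):
    """Residuals of a simple least-squares regression of ys on zs."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(ys) / n
    slope = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys)) / sum(
        (z - mean_z) ** 2 for z in zs
    )
    intercept = mean_y - slope * mean_z
    return [y - (intercept + slope * z) for y, z in zip(ys, zs)]


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 10_000

# Hypothetical model: the confounder drives both quantities, while
# births and storks have no direct effect on each other.
population = [rng.gauss(0, 1) for _ in range(n)]
births = [p + rng.gauss(0, 1) for p in population]
storks = [p + rng.gauss(0, 1) for p in population]

raw_r = pearson(births, storks)
# Partial correlation: correlate whatever is left of each variable
# after removing the linear effect of the confounder.
partial_r = pearson(residuals(births, population), residuals(storks, population))
print(f"raw correlation:     {raw_r:.2f}")
print(f"partial correlation: {partial_r:.2f}")
```

Adjusting for a confounder in this way only works when the confounder has actually been measured and its effect is modelled correctly. In real data, unmeasured confounders can remain hidden.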

Confounders are everywhere and can make teasing out causality very difficult. This is why the gold standard of research is the randomised controlled trial (RCT). In an RCT, prospective participants are assessed for eligibility and then randomly allocated to receive either the experimental treatment or the control treatment. The control treatment can be either a placebo or the best available treatment. The key is that any individual can end up in either group. When randomisation is performed properly, there are no systematic differences between the groups, whether in factors that were measured or in factors that were not. This means that any difference between the groups observed after treatment, beyond what chance alone would produce, can be attributed to the treatment. Randomisation rules out confounders as a plausible alternative explanation.
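A short simulation can show why random allocation matters. In this minimal sketch, each hypothetical participant has a "health-consciousness" score, an assumed confounder invented for illustration. In the first setting, participants choose their own treatment, and more health-conscious people are more likely to opt in. The 80%/20% split is an arbitrary assumption. The two groups then differ markedly on the score before any treatment is given. When a coin flip decides instead, the groups end up with almost the same average score.

```python
import random
from statistics import fmean

rng = random.Random(1)  # fixed seed so the run is reproducible
n = 10_000

# Hypothetical "health-consciousness" score per participant: an assumed
# confounder that could also influence health outcomes directly.
health_score = [rng.gauss(0, 1) for _ in range(n)]

# Observational setting (assumed behaviour): people with an above-average
# score opt into the treatment 80% of the time, others only 20% of the time.
self_selected = [rng.random() < (0.8 if h > 0 else 0.2) for h in health_score]

# RCT setting: a fair coin flip decides, ignoring the score entirely.
randomised = [rng.random() < 0.5 for _ in range(n)]


def group_gap(scores, in_treatment):
    """Mean score in the treatment group minus mean score in the control group."""
    treated = [s for s, t in zip(scores, in_treatment) if t]
    control = [s for s, t in zip(scores, in_treatment) if not t]
    return fmean(treated) - fmean(control)


self_selected_gap = group_gap(health_score, self_selected)
randomised_gap = group_gap(health_score, randomised)
print(f"self-selected groups differ by {self_selected_gap:+.2f} on average")
print(f"randomised groups differ by    {randomised_gap:+.2f} on average")
```

The coin flip ignores every characteristic of the participant. The same balancing therefore also happens for factors nobody thought to measure, which statistical adjustment after the fact cannot guarantee.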

So when you hear a health claim in the media that “X causes Y”, it is important to understand how that conclusion was reached. A tightly controlled laboratory experiment might indicate that a certain exposure causes a certain disease, and yet that may not hold true in practice. Useful further reading is available on the questions to ask yourself when you hear a health claim in the media.

[1] Hill AB. The environment and disease: association or causation? Proc R Soc Med. 1965;58:295–300.