Correlation-Causation

We’ve talked about how popular the concept of correlation is in business analytics. Causation and its variants are also used rather commonly. This makes it easy to mistakenly connect the two popular terms and even use them interchangeably. The Latin phrase cum hoc ergo propter hoc (“with this, therefore because of this”) names the logical fallacy of assuming that two events that occur together must have a cause-and-effect relationship.

Erroneous causal conclusions drawn from correlations are in no way limited to business. The media draws such conclusions every other day.

Now coffee causes cancer, now it doesn’t

Take a look at this headline:

coffee-cancer-correlation

To give you a bit of background: in 1991, the International Agency for Research on Cancer (IARC) classified coffee as a category 2B carcinogen (“possibly carcinogenic to humans”). In June 2016, it reversed its position and moved coffee to category 3, “not classifiable as to its carcinogenicity to humans.” In the same report, it added a caveat that very hot drinks (above 60-65 Celsius) could cause oesophageal cancer. So the position now is that coffee itself is not classified as a cause of cancer, but very hot beverages (tea, coffee, soups, or even water) could cause cancer. The two observations do not mean that only hot coffee causes cancer (as the headline above might lead you to believe). Nor do they mean that all hot beverages except coffee cause cancer, as the following headlines would have you believe.

tea-coffee-cancer


What does observed correlation indicate, if not causality?

Great question! The fact that correlation by itself does not indicate causality does not mean that correlation can never point to causality. If there is an observed correlation between two variables A and B, a few different possibilities emerge:

  • A and B are correlated because A causes B, or B causes A
  • A and B are correlated because they are both caused by a third variable C
  • A and B are correlated because A causes X, which, in turn, causes B
  • A and B are correlated by pure chance, perhaps limited to the frame of observation alone
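The second possibility, a shared cause C, can be illustrated with a small simulation. This is only a sketch under invented assumptions: the variables `a`, `b` and `c`, the noise levels, the sample size and the helper functions `pearson` and `residuals` are all made up for illustration. Here A and B never influence each other, yet they end up clearly correlated because both depend on C. Once the linear effect of C is removed from each, the correlation largely disappears.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fitted on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - my - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical data: C drives both A and B; A and B have no direct link.
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]

raw = pearson(a, b)                                   # correlation as observed
adjusted = pearson(residuals(a, c), residuals(b, c))  # after accounting for C

print(f"correlation of A and B:            {raw:.2f}")
print(f"correlation after adjusting for C: {adjusted:.2f}")
```

In real data the catch is that C is often unknown or unmeasured, which is why an observed correlation calls for further investigation rather than a quick adjustment.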

Therefore, when we observe a correlation between two variables, it demands further investigation before we can draw meaningful conclusions from it. How can we conclude that causation exists? Experimentally, a well-designed randomized controlled trial, ideally double-blind, provides the strongest evidence of causation where a correlation has been observed.

Correlations in medical sciences

Wikipedia offers a wonderful example of an erroneous causal conclusion drawn from correlation:

“In a widely studied case, numerous epidemiological studies showed that women taking combined hormone replacement therapy (HRT) also had a lower-than-average incidence of coronary heart disease (CHD), leading doctors to propose that HRT was protective against CHD. But randomized controlled trials showed that HRT caused a small but statistically significant increase in risk of CHD. Re-analysis of the data from the epidemiological studies showed that women undertaking HRT were more likely to be from higher socio-economic groups (ABC1), with better-than-average diet and exercise regimens. The use of HRT and decreased incidence of coronary heart disease were coincident effects of a common cause (i.e. the benefits associated with a higher socioeconomic status), rather than a direct cause and effect, as had been supposed.”
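A toy simulation can show how an observational comparison and a randomized one can point in opposite directions. Every number below is invented for illustration and none comes from the HRT studies: the hypothetical `simulate` function assumes that high-status people are both more likely to take the treatment and less likely to fall ill, while the treatment itself adds a small amount of risk.

```python
import random

def simulate(n, randomized, rng):
    """Return (risk among treated, risk among untreated) for n simulated people."""
    cases = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(n):
        high_status = rng.random() < 0.5
        if randomized:
            # A coin flip decides treatment, so status cannot influence it.
            treated = rng.random() < 0.5
        else:
            # Assumed: high-status people choose the treatment far more often.
            treated = rng.random() < (0.8 if high_status else 0.2)
        # Assumed risks: status lowers baseline risk, treatment adds 0.02.
        risk = (0.10 if high_status else 0.20) + (0.02 if treated else 0.0)
        sick = rng.random() < risk
        totals[treated] += 1
        cases[treated] += sick  # True counts as 1, False as 0
    return cases[True] / totals[True], cases[False] / totals[False]

rng = random.Random(1)  # fixed seed so the run is repeatable
obs_treated, obs_untreated = simulate(200_000, randomized=False, rng=rng)
rct_treated, rct_untreated = simulate(200_000, randomized=True, rng=rng)

obs_diff = obs_treated - obs_untreated
rct_diff = rct_treated - rct_untreated

print(f"observational risk difference (treated - untreated): {obs_diff:+.3f}")
print(f"randomized risk difference (treated - untreated):    {rct_diff:+.3f}")
```

Under these assumptions the observational difference comes out negative (the treatment looks protective) while the randomized difference comes out positive, because randomization breaks the link between status and treatment.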

Last year, when Zika virus was spreading panic across the world, some scientists expressed reservations about Zika causing microcephaly (a birth defect in which infants are born with abnormally small heads). Their argument was that a mere correlation between the spread of Zika virus and a rise in birth defects could not be used to conclude that Zika caused the birth defects. One alternative explanation in circulation blamed a pesticide rather than the virus.

zika-birth-defect-pesticide-microcephaly

Later, the World Health Organisation (WHO) released a statement on its “Dispelling rumours” page stating that the pesticide (pyriproxyfen) had not been observed to contribute to microcephaly.

who-pesticide-zika-microcephaly

Spurious Correlations

Correlations that are observed for no meaningful reason and are purely due to chance are called spurious correlations. Tyler Vigen’s website lists a bunch of hilarious (and some absolutely ridiculous) spurious correlations – there are some 30,000 of them listed on that site! My favourite one? It has to be this one:

spurious-correlation

Interestingly, Nicolas Cage’s films are highly correlated with a bunch of different things.
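Why do such chance matches turn up so easily? A rough sketch, assuming short yearly series of 10 points and 2,000 unrelated candidate series (both numbers are arbitrary, and this is not how the site above builds its charts): if you compare one series against enough random ones, some will line up closely by luck alone.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(42)  # fixed seed so the run is repeatable
years = 10               # assumed length of each yearly series
candidates = 2000        # assumed number of unrelated series to search

# One target series and many candidates, all pure random noise.
target = [rng.gauss(0, 1) for _ in range(years)]
best_r = 0.0
for _ in range(candidates):
    series = [rng.gauss(0, 1) for _ in range(years)]
    r = pearson(target, series)
    if abs(r) > abs(best_r):
        best_r = r

print(f"strongest correlation found among {candidates} random series: {best_r:.2f}")
```

None of the series has anything to do with the target, yet the best match looks impressive. The more candidates you search and the fewer data points each has, the stronger the best accidental match tends to be.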