On statistics and causality

from Lars Syll

Ironically, the need for a theory of causation began to surface at the same time that statistics came into being … This was a critical moment in the history of science. The opportunity to equip causal questions with a language of their own came very close to being realized but was squandered. In the following years, these questions were declared unscientific and went underground. Despite heroic efforts by the geneticist Sewall Wright (1889-1988), causal vocabulary was virtually prohibited for more than half a century. And when you prohibit speech, you prohibit thought and stifle principles, methods, and tools.

Readers do not have to be scientists to witness this prohibition. In Statistics 101, every student learns to chant, “Correlation is not causation.” With good reason! The rooster’s crow is highly correlated with the sunrise; yet it does not cause the sunrise.

Unfortunately, statistics has fetishized this commonsense observation. It tells us that correlation is not causation, but it does not tell us what causation is. In vain will you search the index of a statistics textbook for an entry on “cause.” Students are not allowed to say that X is the cause of Y — only that X and Y are “related” or “associated.”

Statistical reasoning certainly seems paradoxical to most people.

Take for example the well-known Simpson’s paradox.

From a theoretical perspective, Simpson’s paradox importantly shows that causality can never be reduced to a question of statistics or probabilities unless you are — miraculously — able to keep constant all other factors that influence the probability of the outcome studied.

To understand causality we always have to relate it to a specific causal structure. Statistical correlations are never enough. No structure, no causality.

Simpson’s paradox is an interesting paradox in itself, but it can also highlight a deficiency in the traditional statistical/econometric approach toward causality. Say you have 1000 observations on men and an equal number of observations on women applying for admission to university studies, and that 70% of men are admitted, but only 30% of women. Running a logistic regression to find out the odds ratios (and probabilities) for men and women on admission, females seem to be in a less favorable position (‘discriminated’ against) compared to males (male odds are 2.33, female odds are 0.43, giving an odds ratio of 5.44). But once we find out that males and females apply to different departments we may well get a Simpson’s paradox result where males turn out to be ‘discriminated’ against: say 800 males apply for economics studies (680 admitted) and 200 for physics studies (20 admitted), while 100 females apply for economics studies (90 admitted) and 900 for physics studies (210 admitted), giving male-to-female odds ratios of 0.63 and 0.37.
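The arithmetic behind this example can be checked with a short script. What follows is only a minimal sketch: it computes the odds ratios directly from the admission counts given above rather than by fitting a logistic regression, and the names `data`, `odds`, `odds_ratio` and `pooled` are illustrative choices, not part of the original post.

```python
from fractions import Fraction

# (applicants, admitted) per department and sex, taken from the example above
data = {
    "economics": {"male": (800, 680), "female": (100, 90)},
    "physics": {"male": (200, 20), "female": (900, 210)},
}


def odds(applicants, admitted):
    """Odds of admission: admitted divided by rejected."""
    return Fraction(admitted, applicants - admitted)


def odds_ratio(male, female):
    """Male-to-female odds ratio for two (applicants, admitted) pairs."""
    return odds(*male) / odds(*female)


def pooled(sex):
    """Sum applicants and admitted over all departments for one sex."""
    applicants = sum(dept[sex][0] for dept in data.values())
    admitted = sum(dept[sex][1] for dept in data.values())
    return applicants, admitted


# Ignoring departments: males appear strongly favoured (ratio above 1)
aggregate_or = odds_ratio(pooled("male"), pooled("female"))

# Within each department: the ratio drops below 1, so the direction reverses
dept_or = {name: odds_ratio(d["male"], d["female"]) for name, d in data.items()}

print(f"aggregate: {float(aggregate_or):.2f}")
for name, value in dept_or.items():
    print(f"{name}: {float(value):.2f}")
```

The pooled odds ratio comes out at roughly 5.44 in favour of males, while both within-department ratios fall below 1 (roughly 0.63 and 0.37). Which of the two comparisons is the relevant one cannot be read off the numbers themselves; it depends on the causal structure assumed to link sex, choice of department and admission.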

Statistical (and econometric) patterns should never be seen as anything other than possible clues to follow. Behind observable data, there are real structures and mechanisms operating, and if we really want to understand, explain and (possibly) predict things in the real world, getting hold of them is more important than simply correlating and regressing observable variables.

Statistics cannot establish the truth value of a fact. Never has. Never will.

Lars P. Syll

  1. July 29, 2022 at 1:20 am

    Now that, I can understand.

  2. Charlie Thomas aKa Cacciato
    July 30, 2022 at 10:41 pm

    I have been a keen follower of Lars Syll. And likely will continue to be.

    As a forester biometrician I never had the idea proposed by Lars. Causality is a scientific (physical, chemical, biological) relation demonstrable by the science of those fields. Statistics does provide support for the relationship. A tree grows by energy from the sun, rain, nutrients which are demonstrable their relative contributions can be supported by statistical analysis of growth data. BTW: Simpson’s paradox in forest growth caused a significant disturbance in southern inventory about 1985.

    My objection to economics is often the externalization of contributions by variables that are ignored for reasons of ideology. The economics of the Southern states completely ignored the contribution of slaves. Mississippi was the richest of all states in the time cotton was king, not because of states’ rights and free markets but because of the labor of slaves.

    • rsm
      August 2, 2022 at 10:17 pm

      《A tree grows by energy from the sun, rain, nutrients which are demonstrable their relative contributions can be supported by statistical analysis of growth data.》
      Isn’t the first clause just a model in our heads? And can we then go on to cherry-pick statistics to support this “how could it be otherwise!” model?
      How much of a role does the mycorrhizal network play in tree growth? (See https://wildaboututah.org/nutcrackers-squirrels-farmers-forests/ “So, the fungus helps the tree grow […]”)
      Might information transmitted through the network also cause growth? Does the current model in our heads require us to dismiss such informational causes because that means persuasion or rhetoric might influence tree growth, and that is just crazy according to the current model of tree growth that is socially accepted today?
      Don’t we already have studies showing talking to plants can affect growth?
