How statistics can be misleading


from Lars Syll

From a theoretical perspective, Simpson’s paradox importantly shows that causality can never be reduced to a question of statistics or probabilities.

To understand causality we always have to relate it to a specific causal structure. Statistical correlations are never enough. No structure, no causality.

Simpson’s paradox is an interesting paradox in itself, but it can also highlight a deficiency in the traditional econometric approach towards causality. Say you have 1000 observations on men and an equal number of observations on women applying for admission to university studies, and that 70% of men are admitted, but only 30% of women. If we run a logistic regression to find the odds ratios (and probabilities) of admission for men and women, females seem to be in a less favourable position (‘discriminated’ against) compared to males: male odds are 2.33 and female odds are 0.43, giving an odds ratio of 5.44. But once we find out that males and females apply to different departments, we may well get a Simpson’s paradox result where males turn out to be ‘discriminated’ against. Say 800 males apply for economics studies (680 admitted) and 200 for physics studies (20 admitted), while 100 females apply for economics studies (90 admitted) and 900 for physics studies (210 admitted). The male-to-female odds ratios within each department are then 0.63 for economics and 0.37 for physics.
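The arithmetic behind this example can be checked with a few lines of Python. This is only a minimal sketch: it computes the odds ratios directly from the counts given above rather than fitting a logistic regression, and the names `counts`, `odds`, `odds_ratio` and `pooled` are illustrative assumptions, not part of any library.

```python
# Admission counts from the example above, stored as (applicants, admitted).
counts = {
    "economics": {"male": (800, 680), "female": (100, 90)},
    "physics": {"male": (200, 20), "female": (900, 210)},
}


def odds(applicants, admitted):
    """Odds of admission: number admitted divided by number rejected."""
    return admitted / (applicants - admitted)


def odds_ratio(male, female):
    """Male odds divided by female odds; each argument is (applicants, admitted)."""
    return odds(*male) / odds(*female)


def pooled(sex):
    """Sum applicants and admitted over all departments for one sex."""
    applicants = sum(dept[sex][0] for dept in counts.values())
    admitted = sum(dept[sex][1] for dept in counts.values())
    return applicants, admitted


# Odds ratio when departments are ignored (the aggregate comparison).
aggregate_or = odds_ratio(pooled("male"), pooled("female"))

# Odds ratio within each department (the disaggregated comparison).
department_or = {
    name: odds_ratio(dept["male"], dept["female"])
    for name, dept in counts.items()
}

print(f"aggregate: {aggregate_or:.2f}")
for name, value in department_or.items():
    print(f"{name}: {value:.2f}")
```

Pooled over departments, the ratio is well above 1 (about 5.44), while within each department it falls below 1 (roughly 0.63 for economics and 0.37 for physics when rounded to two decimals). The reversal comes entirely from how applicants are distributed across departments, which is information the pooled regression never sees.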

Econometric patterns should never be seen as anything other than possible clues to follow. From a critical realist perspective, it is obvious that behind observable data there are real structures and mechanisms operating. If we really want to understand, explain and (possibly) predict things in the real world, these structures and mechanisms are more important to get hold of than simply correlating and regressing observable variables.

Math cannot establish the truth value of a fact. Never has. Never will.

Paul Romer

  1. June 5, 2021 at 3:22 pm

    When cigarette smokers were observed to have a 10x risk of lung cancer (1950), there was no “structure”. There was also, at that time, no *known* mechanism. The statistical evidence was correct in indicating a causal relationship, and ensuing research confirmed this and found the mechanism(s). Lesson: statistical associations, if robust (replicated and not due to chance), represent some sort of causal structure – I say “some sort of” because it can be causation in the “wrong” direction or a common cause (omitted variable bias; confounding). That can be informative in leading to the discovery of the causal mechanism. Don’t just dismiss things (in this case statistical or econometric methods) just because you don’t like them.

    Obviously (hopefully, for this audience), to get from statistical association to causation you have to go through a process of causal inference. In that epidemiological example, it was done using Bradford Hill’s “aspects”, which was a list of things to look for, to try and find out what causal relationships underlie any observed robust statistical association. Mechanism (plausibility) was part of that, but many items on the list were purely statistical criteria such as a dose-response relationship, and time order. Also, what is causally plausible can change when new evidence is obtained, as happened with cigarettes and lung cancer: until then, cancer was thought to arise endogenously rather than being caused, at least partly, by outside exposures.

    And sometimes, econometric methods can be used to make an important discovery: I am thinking of the considerable statistical evidence on the employment effects of the minimum wage. The literature is currently uncertain, in that it is divided between those who find no association and those who find a weak association. In a broader perspective, what this has shown is that pessimistic conclusions drawn from a priori economic theory are wrong. The practical benefit of that has been huge.

    By the way, I don’t see the relevance of Simpson’s paradox in this discussion. It is concerned with the composition of the different groups, not with causal mechanisms and the like.

    • June 6, 2021 at 9:31 am

      You write: “statistical associations … can be informative in leading to the discovery of the causal mechanism. Don’t just dismiss things (in this case statistical or econometric methods) just because you don’t like them.” On this, I think we surely agree. When holding my statistics and econometrics classes, I — again and again — emphasise that data and statistics can help us on the way of detecting causally interesting relations/processes/mechanisms. My students certainly do not dismiss statistics. But they do understand that statistics usually do not give the answers to the most interesting social science questions. When we’ve got our statistics right — both descriptively and inferentially — we have just started on the scientific journey. Statistics is a start, not the end, of scientific endeavours to explain things.
