## Simpson’s Paradox

from **Asad Zaman**

Statistics and Econometrics today are done without any essential reference to causality – this is much like trying to figure out how birds fly without taking their wings into account. Chapter 2 of Judea Pearl’s “The Book of Why” tells the bizarre story of how the discipline of statistics inflicted causal blindness on itself, with far-reaching effects for all sciences that depend on data. These notes are planned as an accompaniment to, and detailed explanation of, the Pearl, Glymour, & Jewell textbook “Causal Inference in Statistics: A Primer”. The first steps in understanding causality involve a detailed analysis of Simpson’s Paradox. This has been done in the sequence of six posts, which are listed, linked, and summarized below.

1-Simpson’s Paradox: Suppose that there are only two departments at Berkeley, and that they have different admit ratios for women. In Humanities 40% of female applicants are admitted, while in Engineering 80% are admitted. What will be the overall admit ratio of women at Berkeley? The overall admit ratio is a weighted average of 40% and 80%, where the weights are the proportions of females who apply to the two departments. Similarly, if 20% of male applicants are admitted to Humanities while 60% are admitted to Engineering, then the overall admit ratio is a weighted average of 20% and 60%, with weights depending on the proportion of males who apply to the two departments. This is what leads to the possibility of Simpson’s Paradox. As the numbers have been set up, both Engineering and Humanities favor females, who have much higher admit ratios than males. If males apply mostly to Engineering, then the overall admit ratio for men will be closer to 60%. If females apply mostly to Humanities, their overall admit ratio will be closer to 40%. So, looking at the overall ratios, it will appear that admissions favor males, who have higher overall admit ratios. The key question is: which of these comparisons is correct? Does Berkeley discriminate against males, the story told by departmental admit ratios? Or does it discriminate against females, as the overall admit ratios indicate? The main lesson from the analysis in this sequence of posts is that the answer cannot be determined from the numbers alone. Either answer can be correct, depending on the hidden and unobservable causal structures of the real world which generate the data.
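The weighted-average arithmetic above can be checked directly. A minimal sketch in Python: the departmental admit rates are the ones given in the post, while the application proportions (women applying mostly to Humanities, men mostly to Engineering) are illustrative assumptions chosen to produce the reversal.

```python
# Departmental admit rates, as given in the post.
admit = {
    ("F", "Humanities"): 0.40, ("F", "Engineering"): 0.80,
    ("M", "Humanities"): 0.20, ("M", "Engineering"): 0.60,
}

# Application proportions: ILLUSTRATIVE ASSUMPTIONS, not from the post.
# Women apply 90% to Humanities; men apply 90% to Engineering.
applies = {
    ("F", "Humanities"): 0.9, ("F", "Engineering"): 0.1,
    ("M", "Humanities"): 0.1, ("M", "Engineering"): 0.9,
}

def overall(gender):
    # Overall admit ratio = application-weighted average of departmental ratios.
    return sum(applies[(gender, d)] * admit[(gender, d)]
               for d in ("Humanities", "Engineering"))

print(round(overall("F"), 2))  # 0.9*0.40 + 0.1*0.80 = 0.44
print(round(overall("M"), 2))  # 0.1*0.20 + 0.9*0.60 = 0.56
```

Both departments favor women (40% > 20% and 80% > 60%), yet with these application patterns men are admitted overall at 56% against 44% for women.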

2-Simpson’s Paradox: This post elaborates on Berk’s explanation of the paradox for Berkeley admissions. His explanation can be understood as a causal path diagram in which gender affects choice of department, and both gender and choice of department affect the admission rate. With this causal structure, gender is a confounding variable when it comes to departmental admission ratios. These must be calculated conditionally on gender – that is, separately for men and women. However, department is NOT a confounding factor when it comes to the effect of gender on the admission rate. Gender affects admissions through two channels: a direct effect on admission ratios, and an indirect effect via choice of department. Female gender affects admission positively via the direct effect, which is favorable. However, the indirect effect is negative, since females choose the more difficult department in larger numbers. The numbers can be set up so that the negative indirect effect overwhelms the positive direct effect, creating Simpson’s Paradox. But this entire analysis depends on a particular causal structure, and different causal structures can lead to entirely different analyses for exactly the same set of numbers. This is the main point of this sequence of posts – to show that the hidden and unobservable real-world causal structures MUST be considered for meaningful data analysis. Current econometrics and statistics pay no attention to causality and hence often lead to meaningless analysis.

3-Simpson’s Paradox: This post considers alternative causal structures for Berkeley admissions which lead to conclusions radically different from Berk’s original analysis. We first consider a case where gender affects department choice, while the admit ratio depends only on the department and is completely gender neutral. If females choose more difficult departments, there will be a spurious correlation between admit ratios and gender, creating a misleading impression of discrimination against females. A second example is considered where admission depends purely on SAT scores and has no relationship to gender or to department. Nonetheless, if gender affects SAT scores and choice of department, we can replicate exactly the same numbers as the original data, which would create the misleading impressions that departments discriminate by gender and that some departments are more difficult to get into than others. In fact, the admissions policy is the same across departments and depends only on SAT scores. The point of these analyses is that exactly the same observed data can correspond to radically different causal structures, and lead to radically different conclusions about discrimination with respect to gender.
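The SAT-only mechanism can be made concrete with a small sketch. Here admission depends only on the SAT band (80% for high scorers, 20% for low – figures chosen for illustration), and the fraction of high-SAT applicants in each gender-department cell is an assumption picked so that the induced admit ratios reproduce the 40/80/20/60 pattern of the original data.

```python
# Admission depends ONLY on the SAT band, never on gender or department.
# The two band rates are illustrative assumptions.
admit_by_sat = {"high": 0.8, "low": 0.2}

# Fraction of high-SAT applicants in each (gender, department) cell.
# These compositions are illustrative assumptions, chosen so that the
# induced cell-level admit ratios match the original Berkeley numbers.
frac_high = {
    ("F", "Humanities"): 1 / 3, ("M", "Humanities"): 0.0,
    ("F", "Engineering"): 1.0,  ("M", "Engineering"): 2 / 3,
}

def cell_rate(gender, dept):
    # Admit ratio induced in a cell purely by its SAT composition.
    p = frac_high[(gender, dept)]
    return p * admit_by_sat["high"] + (1 - p) * admit_by_sat["low"]

# Reproduces 40%/80% for women and 20%/60% for men, even though the
# policy itself never looks at gender or department.
for g in ("F", "M"):
    for d in ("Humanities", "Engineering"):
        print(g, d, round(cell_rate(g, d), 2))
```

A reader seeing only the cell-level admit ratios would infer gender discrimination and departmental difficulty differences; by construction, neither exists in this data-generating process.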

4-Simpson’s Paradox: Contrary to the perspective taken by conventional statistics texts, and some forms of econometric analysis (VAR models), we cannot do data analysis without knowing where the numbers come from. The jobs of the field expert and the statistical consultant cannot be separated. To illustrate this point, we take the same data generated for the Berkeley admissions and reinterpret it as the batting averages of two different batters against left- and right-handed pitchers. Then Simpson’s Paradox takes the following form. Frank has a higher batting average than Tom against left-handed pitchers, and he also has a higher batting average than Tom against right-handed pitchers. However, the overall batting average of Tom is higher than that of Frank. As the manager of the team, which of the two should you send out when it is critical to get an extra hit or two? If we consider left- and right-handed pitchers separately, Frank is better than Tom against both, and hence we should send Frank. However, the overall batting average of Tom is better, suggesting that we should send out Tom. The answer depends on the causal structure. If the choice of pitchers is EXOGENOUS – independent of the choice of batter – then Frank is the better choice. If the adversary’s coach looks at the batter to decide on the pitcher, then the choice of pitchers is endogenous, and in this case Tom may be the better choice.
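The batting reversal can be sketched with hit/at-bat counts that mirror the admissions numbers. The counts themselves are illustrative assumptions (the post only says the same data are reused): Frank faces mostly right-handers, Tom mostly left-handers.

```python
# (hits, at_bats) against left- and right-handed pitchers.
# Counts are ILLUSTRATIVE ASSUMPTIONS mirroring the admissions data:
# Frank bats .800 vs lefties and .400 vs righties; Tom bats .600 and .200.
stats = {
    "Frank": {"left": (80, 100),  "right": (360, 900)},
    "Tom":   {"left": (540, 900), "right": (20, 100)},
}

def side_avg(player, side):
    hits, ab = stats[player][side]
    return hits / ab

def overall_avg(player):
    hits = sum(h for h, _ in stats[player].values())
    ab = sum(a for _, a in stats[player].values())
    return hits / ab

# Frank is better against BOTH kinds of pitcher...
print(side_avg("Frank", "left"), side_avg("Tom", "left"))    # 0.8 vs 0.6
print(side_avg("Frank", "right"), side_avg("Tom", "right"))  # 0.4 vs 0.2
# ...yet Tom has the higher OVERALL average.
print(overall_avg("Frank"), overall_avg("Tom"))              # 0.44 vs 0.56
```

The reversal is driven entirely by the distribution of at-bats: Tom accumulated most of his at-bats against the easier (left-handed) pitchers.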

5-Simpson’s Paradox: To further drive home the fact that data analysis cannot be confined to the numbers and divorced from the real-world environment which generated the data, we consider a third interpretation of the same data set used for Berkeley admissions. In this interpretation, we look at the effect of a drug on recovery rates from a disease. The Simpson Paradox takes the form that the drug decreases recovery rates in females, and also decreases recovery rates in males. So, it is bad for males and it is bad for females. But when we look at the population as a whole, we find that the drug improves the recovery rate. So, the drug is good for the general population. A causal path diagram shows that gender must be exogenous – it cannot be affected by the drug. Thus gender is a confounding variable, and we must condition on it to get the right measure of the effect of the drug on recovery. We conclude that the drug is bad for everyone, and lowers the recovery rate for everyone, even though the pooled data tell us otherwise. But now consider the same data set with gender replaced by blood pressure, and suppose that the drug affects blood pressure. Suppose low blood pressure is a positive factor in recovery, while the drug has a toxic effect, so that its direct impact on recovery is negative. However, the drug also lowers blood pressure, which creates a positive factor for recovery. The combined effect can be favorable, and it is this combined effect that should be considered when administering the drug.
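The drug version can be sketched the same way. The cell counts below are illustrative assumptions (not from the post): women have a higher baseline recovery rate and disproportionately take the drug, so pooling the genders reverses the within-gender comparison.

```python
# (recovered, total) for each (gender, treatment) cell.
# Counts are ILLUSTRATIVE ASSUMPTIONS chosen to produce the reversal.
data = {
    ("F", "drug"):    (360, 900),  # 40% recover
    ("F", "no_drug"): (50, 100),   # 50% recover
    ("M", "drug"):    (20, 100),   # 20% recover
    ("M", "no_drug"): (270, 900),  # 30% recover
}

def rate(cells):
    # Recovery rate pooled over a list of (gender, treatment) cells.
    recovered = sum(data[c][0] for c in cells)
    total = sum(data[c][1] for c in cells)
    return recovered / total

# Within EACH gender the drug lowers the recovery rate...
print(rate([("F", "drug")]), rate([("F", "no_drug")]))  # 0.4 vs 0.5
print(rate([("M", "drug")]), rate([("M", "no_drug")]))  # 0.2 vs 0.3
# ...yet pooled over genders it appears to RAISE it.
print(rate([("F", "drug"), ("M", "drug")]))        # 380/1000 = 0.38
print(rate([("F", "no_drug"), ("M", "no_drug")]))  # 320/1000 = 0.32
```

Since gender cannot be affected by the drug, the gender-specific comparison is the causally correct one here; the favorable pooled numbers reflect only who chose to take the drug.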

Dear Asad,

With respect to the beginning of your article,

“Statistics and Econometrics today are done without any essential reference to causality…”

I find it very interesting that this issue had already been intensively discussed by Nobel laureate James Heckman et al.:

“I make two main points that are firmly anchored in the econometric tradition. The first is that causality is a property of a model of hypotheticals. A fully articulated model of the phenomena being studied precisely defines hypothetical or counterfactual states. A definition of causality drops out of a fully articulated model as an automatic by-product. A model is a set of possible counterfactual worlds constructed under some rules. The rules may be the laws of physics, the consequences of utility maximization, or the rules governing social interactions, to take only three of many possible examples. A model is in the mind. As a consequence, causality is in the mind.

In order to be precise, counterfactual statements must be made within a precisely stated model. Ambiguity in model specification implies ambiguity in the definition of counterfactuals and hence of the notion of causality. The more complete the model of counterfactuals, the more precise the definition of causality. The ambiguity and controversy surrounding discussions of causal models are consequences of analysts wanting something for nothing: a definition of causality without a clearly articulated model of the phenomenon being described (i.e., a model of counterfactuals). They want to describe a phenomenon as being modeled ‘‘causally’’ without producing a clear model of how the phenomenon being described is generated or what mechanisms select the counterfactuals that are observed in hypothetical or real samples. In the words of Holland (1986), they want to model the effects of causes without modeling the causes of effects. Science is all about constructing models of the causes of effects. This paper develops the scientific model of causality and shows its value in analyzing policy problems.

My second main point is that the existing literature on ‘‘causal inference’’ in statistics confuses three distinct tasks that need to be carefully distinguished:

1. Definitions of counterfactuals.

2. Identification of causal models from idealized data of population distributions (infinite samples without any sampling variation). The hypothetical populations may be subject to selection bias, attrition and the like. However, all issues of sampling variability are irrelevant for this problem.

3. Identification of causal models from actual data, where sampling variability is an issue. This analysis recognizes the difference between empirical distributions based on sampled data and population distributions generating the data.

[…]

Some of the controversy surrounding counterfactuals and causal models is partly a consequence of analysts being unclear about these three distinct tasks and often confusing solutions to each of them. Some analysts associate particular methods of estimation (e.g., matching or instrumental variable estimation) with causal inference and the definition of causal parameters. Such associations confuse the three distinct tasks of definition, identification, and estimation. Each method for estimating causal parameters makes some assumptions and forces certain constraints on the counterfactuals.

Many statisticians are uncomfortable with counterfactuals. Their discomfort arises in part from the need to specify models to interpret and identify counterfactuals. Most statisticians are not trained in science or social science and adopt as their credo that they ‘‘should stick to the facts.’’”

(Heckman, J. J., The Scientific Model of Causality, Sociological Methodology, Blackwell Publishing Ltd/Inc., 2005, 35, 1-97.)

“The econometric approach to policy evaluation separates these problems and emphasizes the conditional nature of causal knowledge. Human knowledge advances by developing counterfactuals and theoretical models and testing them against data. The models used are inevitably provisional and conditional on a priori assumptions. Blind empiricism leads nowhere. Economists have economic theory to draw on but recent developments in the econometric treatment effect literature often ignore it.”

(Heckman, J. J. and Vytlacil, E. J., in Heckman, J. J. & Leamer, E. E. (Eds.), Chapter 70: Econometric Evaluation of Social Programs, Part I: Causal Models, Structural Models and Econometric Policy Evaluation, Elsevier, 2007, 6, Part 2, 4779–4874.)

“The totality of our so-called knowledge or beliefs, from the most casual matters of geography or history to the profoundest laws of atomic physics . . . is a man made fabric which impinges on experience only at the edges . . . total science is like a field of force whose boundary conditions are experience . . . A conflict with experience on the periphery occasions readjustments in the interior of the field. Reevaluation of some statements require reevaluation of others, because of their logical interconnections . . . But the total field is so underdetermined by its boundary conditions, experience, that there is much latitude of choice as to what statements to re-evaluate in the light of any single contrary experience.”

(Quine, W.V.O. (1951). “Main trends in recent philosophy: Two dogmas of empiricism”. The Philosophical Review 60 (1), 20–43 (January) as cited by Heckman and Vytlacil, ibid.)

“Unlike the models developed by statisticians, the class of microeconometric models developed to exploit and interpret the new sources of micro data emphasized the role of economics and causal frameworks in interpreting evidence, in establishing causal relationships, and in constructing counterfactuals, …”

(Heckman, J. J., Micro Data, Heterogeneity, and the Evaluation of Public Policy: Nobel Lecture, Journal of Political Economy, The University of Chicago Press, 2001, 109, pp. 673–748.)

Heckman et al. discussed this issue very prominently in the mainstream of two very different scientific communities – in the statistical literature:

Heckman, J. J., Econometric Causality, International Statistical Review, Blackwell Publishing Ltd, 2008, 76, 1-27.

as well as in the economic literature:

Heckman, J. J. and Vytlacil, E., Structural Equations, Treatment Effects, and Econometric Policy Evaluation, Econometrica, The Econometric Society, 2005, 73, pp. 669-738.

Summarizing, I would like to emphasize that a while ago there was a very thorough discussion in mainstream econometrics on the notion of causality, viz. a methodology (in the narrow original sense) of the fundamentals of econometrics. This means there is a well-defined framework relating the concept of causality to statistics, resulting in a transparent understanding of the knowledge gained by using this methodology.

I advocate using this framework as long as there is no better one.

I would be interested in your opinion on that.

Cheers,

Lutz

Dear Lutz – it was only after reading Pearl’s work that the massive amount of confusion and headaches caused by the enormously complicated literature on causality cleared up in my mind. You might try starting with the Book of Why. The statements that you have cited from Heckman are sufficient to convince me that he does not have a clue. The three tasks, of increasing levels of complexity (from the Book of Why), are as follows.

The FIRST step on the ladder of causation is just OBSERVING the world: getting the correlations right, and separating the spurious and accidental ones from the genuine ones, which persist over time. This step corresponds to points [2] AND [3] of Heckman which you have cited above.

The SECOND step is to construct the CAUSAL relationships – these involve figuring out the effects of INTERVENTIONS. Now the AMAZING thing is that this information is NOT CONTAINED in the observable data distributions. Even if you have infinite samples of data with zero sampling variation, you cannot find out the effects of interventions. If you see balls rolling down slopes and calculate speeds across time, you can reconstruct a particular instance of the laws of motion. But you cannot get to the law itself, which is based on the causes of what is observed; for that you would have to make a Newtonian level of inference. So the SECOND level CANNOT ever be reduced to the FIRST level. Heckman does not seem to be aware of this.

The THIRD level, distinct from and more difficult than the first two, is the analysis of COUNTERFACTUALS, which depends on the first two but is different from them. One simply CANNOT get from observations to counterfactuals, as Heckman seems to suggest is possible.