Causal inference from observational data

from Lars Syll

Researchers often determine the individual’s contemporary IQ or IQ earlier in life, socioeconomic status of the family of origin, living circumstances when the individual was a child, number of siblings, whether the family had a library card, educational attainment of the individual, and other variables, and put all of them into a multiple-regression equation predicting adult socioeconomic status or income or social pathology or whatever. Researchers then report the magnitude of the contribution of each of the variables in the regression equation, net of all the others (that is, holding constant all the others). It always turns out that IQ, net of all the other variables, is important to outcomes. But … the independent variables pose a tangle of causality – with some causing others in goodness-knows-what ways and some being caused by unknown variables that have not even been measured. Higher socioeconomic status of parents is related to educational attainment of the child, but higher-socioeconomic-status parents have higher IQs, and this affects both the genes that the child has and the emphasis that the parents are likely to place on education and the quality of the parenting with respect to encouragement of intellectual skills and so on. So statements such as “IQ accounts for X percent of the variation in occupational attainment” are built on the shakiest of statistical foundations. What nature hath joined together, multiple regressions cannot put asunder.
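A small simulation may make the point about unmeasured variables concrete. It is only a sketch under invented assumptions, not anything taken from Nisbett: the variable names, the coefficients, and the data-generating process are all hypothetical. An unmeasured background factor `u` drives "IQ", a measured "SES" proxy, and the outcome, while IQ itself is given no direct effect on the outcome at all.

```python
import random


def ols_two_regressors(x1, x2, y):
    """Least-squares coefficients of y on x1 and x2.

    The intercept is handled by centering; the 2x2 normal equations
    are solved with Cramer's rule.
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


rng = random.Random(0)
n = 20000

# Hypothetical unmeasured background factor (never enters the regression).
u = [rng.gauss(0, 1) for _ in range(n)]
# "IQ" and "SES" are both noisy reflections of u.
iq = [ui + rng.gauss(0, 1) for ui in u]
ses = [ui + rng.gauss(0, 1) for ui in u]
# The outcome depends on u only: the direct effect of IQ is zero by construction.
outcome = [2.0 * ui + rng.gauss(0, 1) for ui in u]

b_iq, b_ses = ols_two_regressors(iq, ses, outcome)
print(f"IQ coefficient net of SES: {b_iq:.2f} (direct effect built into the data: 0)")
```

With these made-up parameters the IQ coefficient, "net of" SES, should come out near 2/3 even though IQ does nothing in the simulated world. Holding the measured proxy constant does not remove the influence of the unmeasured factor, which is the tangle the quotation describes.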

Now, I think this is right as far as it goes, although it would certainly have strengthened Nisbett’s argumentation if he had elaborated more on the methodological question around causality, or at least had given some mathematical-statistical-econometric references. Unfortunately, his alternative approach is not more convincing than regression analysis. Like so many other contemporary social scientists, Nisbett seems to think that randomization solves empirical problems. By randomizing we get different ‘populations’ that are homogeneous with regard to all variables except the one we think is a genuine cause. In that way we are supposedly spared having to actually know what all these other factors are.

If you succeed in performing an ideal randomization with different treatment groups and control groups, that is attainable. But it presupposes that you really have been able to establish (and not just assume) that all causes other than the putative one have the same probability distribution in the treatment and control groups, and that the probability of assignment to treatment or control groups is independent of all other possible causal variables.

Unfortunately, real experiments and real randomizations seldom or never achieve this. So, yes, we may do without knowing all causes, but it takes ideal experiments and ideal randomizations to do that, not real ones. That means that in practice we do have to have sufficient background knowledge to deduce causal knowledge. Without old knowledge, we can’t get new knowledge — and, ‘no causes in, no causes out.’
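One of the ways a real randomization can fall short of the ideal is plain chance imbalance in a finite sample. The sketch below illustrates only that one mechanism, under assumptions invented for the purpose: a hypothetical unobserved covariate `z`, an assumed treatment effect of 1.0, and 20 units per trial.

```python
import random
from statistics import mean

TRUE_EFFECT = 1.0  # assumed treatment effect, chosen purely for illustration


def one_trial(rng, n=20):
    """Randomize n units into two equal arms.

    Returns (difference-in-means estimate, imbalance in z between arms).
    """
    # Hypothetical unobserved covariate that also drives the outcome.
    z = [rng.gauss(0, 1) for _ in range(n)]
    units = list(range(n))
    rng.shuffle(units)
    treated = set(units[: n // 2])
    y = [3.0 * z[i] + (TRUE_EFFECT if i in treated else 0.0) for i in range(n)]
    y_t = [y[i] for i in range(n) if i in treated]
    y_c = [y[i] for i in range(n) if i not in treated]
    z_t = [z[i] for i in range(n) if i in treated]
    z_c = [z[i] for i in range(n) if i not in treated]
    return mean(y_t) - mean(y_c), mean(z_t) - mean(z_c)


rng = random.Random(1)
results = [one_trial(rng) for _ in range(5000)]
estimates = [est for est, _ in results]

# Averaged over many hypothetical re-randomizations the estimate is on target ...
avg_estimate = mean(estimates)
# ... but any single trial can miss badly because z happens to be unbalanced.
share_far_off = mean(1.0 if abs(est - TRUE_EFFECT) > 1.0 else 0.0 for est in estimates)

print(f"average estimate over all trials: {avg_estimate:.2f}")
print(f"share of single trials off by more than 1.0: {share_far_off:.2f}")
```

The average over thousands of imagined repetitions sits close to the assumed effect, but a researcher only ever runs one of those trials, and in this setup a large share of single trials miss by more than the size of the effect itself. Judging whether that has happened requires knowing something about `z`, that is, background knowledge.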

On the issue of the shortcomings of multiple regression analysis, no one sums it up better than David Freedman:

Regression models often seem to be used to compensate for problems in measurement, data collection, and study design. By the time the models are deployed, the scientific position is nearly hopeless …

Causal inference from observational data presents many difficulties, especially when underlying mechanisms are poorly understood. There is a natural desire to substitute intellectual capital for labor, and an equally natural preference for system and rigor over methods that seem more haphazard. These are possible explanations for the current popularity of statistical models.

Indeed, far-reaching claims have been made for the superiority of a quantitative template that depends on modeling – by those who manage to ignore the far-reaching assumptions behind the models. However, the assumptions often turn out to be unsupported by the data. If so, the rigor of advanced quantitative methods is a matter of appearance rather than substance.

  1. Ken Zimmerman
    June 4, 2021 at 11:32 am

    According to Maxwell in ‘Causal Explanation, Qualitative Research, and Scientific Inquiry in Education’ the “…argument for the pre-eminence of randomized experiments in causal involves a series of linked assumptions and claims about causality and the appropriate methods for investigating this. Specifically, the NRC report: 1. Assumes a regularity view of causation. 2. Privileges a variable-oriented approach to research over a process-oriented approach. 3. Denies the possibility of observing causality in single cases. 4. Neglects the role of context as an essential component of causal explanation. 5. Neglects the importance of meaning for causal explanation in social science. 6. Asserts that qualitative and quantitative research share the same logic of inference. 7. Presents a hierarchical ordering of methods for investigating causality, giving priority to experimental and other quantitative methods.”

    A realist, process-oriented conception of causal explanation entails a quite different approach to understanding particular events or situations. Salmon (1998) states that Hume concludes that it is only by repeatedly observing associated events that we can establish the existence of causal relations. If, in addition to the separate events, a causal connection were observable, it would suffice to observe one case in which the cause, the effect, and the causal relation were present. (p. 15)

    Salmon (1998) then argues that “causal processes are precisely the connections Hume sought, that is, that the relation between a cause and an effect is a physical connection” (p. 16). Putnam (1999, pp. 140–141) also claims that we can observe causation, quoting Anscombe: First, as to the statement that we can never observe causality in the individual case. Someone who says this is just not going to count anything as “observation of causality.” This often happens in philosophy; it is argued that “all we find” is such-and-such, and it turns out that the arguer has excluded from his idea of “finding” the sorts of things he says we don’t “find.” (Anscombe, 1971, p. 137, quoted in Putnam, 1999, p. 141) Ducasse (1926) and Davidson (1967) have also argued that causation can be identified in a single case, and Cartwright (2000) claims that regularity approaches cannot provide adequate accounts of causation without presupposing singular causal knowledge.

    Experimental researchers relying exclusively on a regularity model of causation assume, following Hume, that the researcher can’t directly observe causation, and therefore must depend on inferring causal relationships from measured covariation of variables. Qualitative studies that are based on a process approach to causation, in contrast, attempt to directly investigate causal mechanisms. This argument has been made not only by self-conscious realists, but also by researchers with no explicit commitment to this position. For example, Britan (1978) claims that “experimental evaluations relate program treatments to program effects without directly examining causal processes, [while] contextual [qualitative] evaluations investigate causal relationships . . . by directly examining the processes through which results are achieved” (p. 231). And Jankowski (1991), reporting on his 10-year participant observation study of urban gangs in three cities, states that “Unable to observe gang violence directly, researchers have treated it as a dependent variable, something to be explained using structural and individual-oriented independent variables. This study . . . seeks to understand the anatomy of violence as well as to explain it” (p. 138). As Becker (1996) notes, “It is invariably epistemologically dangerous to guess at what could be observed directly” (p. 58). The ability of qualitative methods to directly investigate causal processes is a major contribution that this approach can make to scientific inquiry in education. Unfortunately, this ability has not only been denied by most quantitative researchers, but also by many qualitative researchers, and specific methods for identifying and verifying causal processes need further explication and development (Maxwell).

    Realist social researchers place considerable emphasis on the context dependence of causal explanation (e.g., Sayer, 1992, pp. 60–61; Huberman & Miles, 1985, p. 354). Pawson and Tilley (1997) sum up this position in their formula “mechanism + context = outcome” (p. xv). They maintain that “the relationship between causal mechanisms and their effects is not fixed, but contingent” (p. 69); it depends on the context within which the mechanism operates. This is not simply a claim that causal relationships vary across contexts; it is a more fundamental claim, that the context within which a causal process occurs is, to a greater or lesser extent, intrinsically involved in that process, and often cannot be “controlled for” in a variance-theory sense without misrepresenting the causal mechanism (Sayer, 2000, pp. 114–118). Thus, Goldenberg (1992), in a case study of the reading progress of two students and the effects of their teacher’s behavior on this progress, states, “If we see these dimensions as variables divorced from this context, we risk distorting the role they actually play” (p. 540). For the social sciences, the social and cultural contexts of the phenomenon studied are crucial for understanding the operation of causal mechanisms.

  2. Gerald Holtham
    June 5, 2021 at 6:53 pm

    Simple, single-cause, single-effect processes do not generally present much difficulty. The difficulty is establishing cause and effect in complicated situations where effects are compound and there are multiple causes, proximate or permissive. In that situation qualitative analysis of process is just as fraught as statistical methods, since the possibility of misinterpretation or of generalising from special cases is clear. Ideally one would employ both methods, since they are essentially complementary. Of course, if you start from the premise that nothing is known unless known with perfect certainty, you are sure to conclude that no-one knows anything.
