Statistical models for causation — a critical review

from Lars Syll

Causal inferences can be drawn from nonexperimental data. However, no mechanical rules can be laid down for the activity. Since Hume, that is almost a truism. Instead, causal inference seems to require an enormous investment of skill, intelligence, and hard work. Many convergent lines of evidence must be developed. Natural variation needs to be identified and exploited. Data must be collected. Confounders need to be considered. Alternative explanations have to be exhaustively tested. Before anything else, the right question needs to be framed. Naturally, there is a desire to substitute intellectual capital for labor. That is why investigators try to base causal inference on statistical models. The technology is relatively easy to use, and promises to open a wide variety of questions to the research effort. However, the appearance of methodological rigor can be deceptive. The models themselves demand critical scrutiny. Mathematical equations are used to adjust for confounding and other sources of bias. These equations may appear formidably precise, but they typically derive from many somewhat arbitrary choices. Which variables to enter in the regression? What functional form to use? What assumptions to make about parameters and error terms? These choices are seldom dictated either by data or prior scientific knowledge. That is why judgment is so critical, the opportunity for error so large, and the number of successful applications so limited.

David Freedman 

Causality in social sciences — and economics — can never solely be a question of statistical inference. Causality entails more than predictability, and really explaining social phenomena in depth requires theory. Analysis of variation — the foundation of all regression analysis and econometrics — can never in itself reveal how these variations are brought about. Only when we are able to tie actions, processes, or structures to the statistical relations detected can we say that we are getting at relevant explanations of causation.

Most facts have many different possible alternative explanations, but we want to find the best of all contrastive explanations (since all real explanation takes place relative to a set of alternatives). So which is the best explanation? Many scientists, influenced by statistical reasoning, think that the likeliest explanation is the best explanation. But the likelihood of x is not in itself a strong argument for thinking it explains y. I would rather argue that what makes one explanation better than another are things like aiming for and finding powerful, deep causal features and mechanisms that we have warranted and justified reasons to believe in. Statistical reasoning — especially the variety based on a Bayesian epistemology — generally has no room for these kinds of explanatory considerations. The only thing that matters is the probabilistic relation between evidence and hypothesis. That is also one of the main reasons I find abduction — inference to the best explanation — a better description and account of what constitutes actual scientific reasoning and inference.

Some statisticians and data scientists think that algorithmic formalisms somehow give them access to causality. That is, however, simply not true. Assuming ‘convenient’ things like linearity, additivity, faithfulness or stability is not to give proofs. It is to assume what has to be proven. Deductive-axiomatic methods used in statistics do not produce evidence for causal inferences. The real causality we are searching for is the one existing in the real world around us. If there is no warranted connection between axiomatically derived theorems and the real world, then we have not really obtained the causation we are looking for.

  1. ghholtham
    March 15, 2020 at 5:33 pm

    ” I would rather argue that what makes one explanation better than another are things like aiming for and finding powerful, deep, causal, features and mechanisms that we have warranted and justified reasons to believe in.”
    No arguing with that. But how does one test the belief? What justifies or warrants the reasons for belief? Induction, abduction, or whatever, are psychological or imaginative processes that produce a hypothesis. Fine. But they cannot validate it. Otherwise you end up saying, the thing that makes me think something also proves it’s true – a wonderful self-sealing mechanism. The belief has to be tested for conformity to other data, which fall within its proposed domain of application but were not used to induce it. Even then you prove only that the hypothesis applies to the sample. That it holds more generally remains a retained hypothesis, believed until refuted. As Popper said, you cannot definitively prove any general statement.

    The “stuff” of macroeconomics is aggregates, things like aggregate consumption or investment or carbon emissions in a locality, often a nation state, for which data are collected. Contrary to Haavelmo and current fashion in economics, relations among these aggregates are often more reliable than those at the individual level. No-one knows what an individual will do without knowing them well, their knowledge and beliefs as well as their material circumstances – data not publicly available. Give an individual a rise and they may give it to charity, put down a deposit on a house, blow it at the bookies or spend it on consumption goods. Give 20 people a rise and you can be fairly sure aggregate consumption will increase. Give two million people a rise and you can be very sure aggregate consumption will increase. Macroeconomics relies on the law of large numbers for the existence of quasi-stable relations in a world of uncertainty. That means its theories are inevitably stochastic. Testing them has to take account of that. If you don’t like statistics, forget about empirical macroeconomics.
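    A minimal sketch of this law-of-large-numbers point, assuming an invented mixture of individual spending responses (none of the numbers come from data): a single individual is unpredictable, but the average response of a large group barely varies from one simulated pay rise to the next.

```python
import numpy as np

rng = np.random.default_rng(0)

def individual_responses(n: int) -> np.ndarray:
    """Invented, heterogeneous, non-normal responses to a pay rise:
    some save it all, some spend it all, the rest fall in between."""
    kind = rng.integers(0, 3, size=n)
    response = rng.uniform(0.2, 0.9, size=n)   # the 'in between' group
    response[kind == 0] = 0.0                  # saved, or given to charity
    response[kind == 1] = 1.0                  # spent entirely
    return response

for n in (1, 20, 2_000_000):
    sims = [individual_responses(n).mean() for _ in range(20)]
    print(f"n={n:>9}: average fraction spent {np.mean(sims):.3f} "
          f"+/- {np.std(sims):.3f} across 20 simulated pay rises")
```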
    Note that in testing a hypothesis you are not treating actual data as a sample of some broader virtual universe – that is a misunderstanding of the procedure. You are treating each data point as a sample of the time interval or space over which the hypothesis is supposed to hold.

  2. ghholtham
    March 15, 2020 at 11:03 pm

    Lars,
    My position is that hypothesis testing is not the same as random sampling. We have to start with a hypothesis, which in economics is proposed for a certain domain in time and space, since no-one supposes that we have laws so general they apply at all times to all sorts of societies. Your question, as I understand it, is why are we allowed to apply the axioms of classical statistics when testing a hypothesis on a finite sample.
    The starting point is that if the hypothesis is true its effects on a specific variable should be observable over a given period of time within the proposed domain (the issue of how much time is necessary I leave aside). The hypothesis is no doubt ceteris paribus and we cannot control all possible confounding effects. The theory may specify some and we use common sense and historical knowledge of our sample period to specify others. Now we know that other, unspecified influences on the dependent variable will be present and will generate fluctuations in the relationship from one observation to another. The assumption is made that if we have an adequate number of observations these unspecified influences will be normally distributed. Our justification for this assumption is that the influences on the dependent variable are likely to be numerous. Then whatever the distribution of each individual influence over the time period, if there are enough of them, the sum of those distributions will tend to the normal by the central limit theorem. I first came across this justification in Ken Wallis’ textbook on Introductory Econometrics: Economics 2, Gray-Mills 1972, p.15.
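    A toy illustration of that central-limit-theorem justification (my own construction, not taken from Wallis): each unspecified influence below is drawn from a skewed, clearly non-normal distribution, yet their sum per observation passes a normality test that any single influence fails.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_obs, n_influences = 500, 100
# Each omitted influence is, on its own, skewed and non-normal ...
influences = rng.exponential(scale=1.0, size=(n_obs, n_influences)) - 1.0
# ... but their sum, observation by observation, approaches normality.
disturbance = influences.sum(axis=1)

print(stats.normaltest(influences[:, 0]))  # a single influence: rejected
print(stats.normaltest(disturbance))       # the sum: typically not rejected
```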
    Since ex hypothesi those disturbances are the reasons for unexplained variations in the dependent variable, we are allowed to use conventional statistics to estimate confidence intervals and say whether or not the hypothesis holds for the sample at a given confidence level.
    You may protest that this is a joint test, not a test of the pure hypothesis at all but of a particular specification of it, including what variables are included and excluded and of functional form. I accept that. Most tests are joint tests in practice.
    While our specification of the hypothesis need not be linear we do have to suppose that unspecified influences making up the errors are separable from the rest of our model and are not correlated with any explanatory variable. Still, we can test for that and respecify if it does not hold. Note, we should also test whether the estimated residuals of our equation are normally distributed. If they are not, we may have a mis-specification and our test cannot be conclusive. It may still be suggestive; if a coefficient that is supposed to be positive turns up strongly negative one is entitled to doubt the hypothesis even if confidence intervals are not well defined!
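    A hedged sketch of those two diagnostic checks, using statsmodels on invented data (the variables and coefficients are placeholders, not an actual specification): estimate by OLS, test the residuals for normality with Jarque-Bera, and check that they are uncorrelated with the regressor.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(1)

# Invented data standing in for one specification of a hypothesis.
x = rng.normal(size=200)
y = 0.5 + 0.8 * x + rng.normal(scale=0.3, size=200)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# Normality of the estimated residuals.
jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(fit.resid)
print(f"Jarque-Bera p-value: {jb_pvalue:.3f}  (low values suggest mis-specification)")

# Separability: residuals should be uncorrelated with the explanatory variable.
print(f"corr(residuals, x): {np.corrcoef(fit.resid, x)[0, 1]:.3f}")
```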
    If the estimated errors are normally distributed and the coefficient restrictions implied by the hypothesis are observed with high confidence, we can say the hypothesis appears to hold for this sample. If we are just trying to explain a particular episode, that is the end of the matter. If the hypothesis is more general and is asserted to hold for other times and places, I don’t think we disagree – there has to be a causal hypothesis that makes the claim credible. In that case we can say the hypothesis has not been refuted and people can go on maintaining it. I am a good Popperian and would not claim it had been “proven” in general. Moreover it will not have been specified to take account of events like epidemics, meteor strikes or profound changes in technology. Economics is not a theory of everything and the further future is unpredictable.
    With all the caveats, it does seem to me an essential winnowing process. Because what if the hypothesis is rejected? It has either to be abandoned or amended. Its supporters will no doubt propose amendments – additional conditioning variables, a different functional form. That’s fair enough; let’s give it every chance; we might find restricted conditions under which it holds. If we cannot find a specification where it works we should be able to let it go. You may protest that such a test is likely to have low power because people can mine data to find something that appears to work. That’s true too, though out-of-sample testing usually finds that out. But I think we must distinguish misuse of econometrics from best-practice econometrics and not confuse criticism of the former with fundamental questions of methodology. The problem in economics is not a surfeit of testing, in my view, but the fact that it is not taken seriously when it does exclude a theory – stable money demand, Heckscher-Ohlin, the neutrality of money, etc. Economics is insufficiently empirical, I think we agree, so the effort should be to improve testing methods, not dismiss them out of hand.
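    One way to make the out-of-sample point concrete, again with invented series rather than the commenter’s own procedure: estimate on an early window, then see whether the fitted relation still predicts in a later hold-out window where an unmodelled shift has occurred.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Invented series: the relation fits the first 200 observations,
# but an unmodelled shift occurs in the hold-out window.
x = rng.normal(size=300)
y = 1.0 + 0.6 * x + rng.normal(scale=0.4, size=300)
y[200:] += 0.8

X = sm.add_constant(x)
fit = sm.OLS(y[:200], X[:200]).fit()   # estimate on the first window only

in_sample_rmse = np.sqrt(np.mean(fit.resid ** 2))
out_of_sample_rmse = np.sqrt(np.mean((y[200:] - fit.predict(X[200:])) ** 2))
print(f"in-sample RMSE:     {in_sample_rmse:.3f}")
print(f"out-of-sample RMSE: {out_of_sample_rmse:.3f}")   # markedly worse
```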
    Apologies for a rather long reply.

  3. March 17, 2020 at 10:20 am

    Gerry,
    I think we agree on much here (although I still have some doubts about testing the veracity of inherently unobservable error assumptions …).
    The central question is “how do we learn from empirical data”? Testing statistical/econometric models is one way, but we have to remember that the value of testing hinges on our ability to validate the — often unarticulated technical — basic assumptions on which the testing models are built. If the model is wrong, the test apparatus simply gives us fictional values. You’re an optimist here obviously, but to me, there is always a strong risk that one turns a blind eye to some of those unfulfilled technical assumptions that actually make the testing results — and the inferences we build on them — unwarranted.
    Haavelmo’s probabilistic revolution gave econometricians their basic framework for testing economic hypotheses. It still builds on the assumption that the hypotheses can be treated as hypotheses about (joint) probability distributions and that economic variables can be treated as if pulled out of an urn as a random sample. But as Koopmans (!) already said back in 1937: “[economic variables] are far from being random drawings from any distribution whatever.”
    And yours truly — like Larry Summers, Aris Spanos, et consortes — still does not find any hard evidence that econometric testing has uniquely been able to “exclude a theory”. As Renzo Orsi once put it: “If one judges the success of the discipline on the basis of its capability of eliminating invalid theories, econometrics has not been very successful.”

  4. ghholtham
    March 17, 2020 at 10:19 pm

    Lars,
    You say: “If one judges the success of the discipline on the basis of its capability of eliminating invalid theories, econometrics has not been very successful.” That is true, but why is it true? There is a clear pecking order in economics. Pure theorists are accorded a prestige that is denied to the practitioner. When did you ever hear anyone say: he is a great economics experimentalist or she is a great economics engineer? Yet economics is essentially a discipline where abstract theories are never “true” of a real situation and have to be distorted and adapted to be used at all. Equally their limits have to be tested. These activities enjoy little status. Organizations will seek the advice of a Nobel prize-winner who never left the armchair or the lecture theatre, and one watches in amazement as they pontificate about situations that they don’t understand at all.
    The feedback from testing and experience on theoretical developments is woefully small, but I do not think that is primarily due to shortcomings in our testing techniques. These exist, as we both understand, but they are outweighed by ideology, prejudice and a perverse reward system within the economics trade. As previous correspondents have noted, Hendry found it hard to get published when his work challenged cherished theories. That is why, for all its shortcomings, I believe economics would be healthier if econometrics had a higher status relative to pure theory. If it did, its own practice would also improve, standards would be higher and we would see less of the sort of pseudo-analysis that you correctly criticize.

    In academic economics, everyone wants to be Einstein. No-one wants to be Rutherford. But there is no real advance unless you confront the data as best you can. With respect, I think you concentrate your fire on the mote in the eye of the econometrician, which means ignoring the beam in the eye of much of the “profession”.

    You also say: “The central question is ‘how do we learn from empirical data?’ Testing statistical/econometric models is one way.” To me it is the most promising way. What happens in practice is that fashions change as the result of some event that actually may not bear on a theory at all. For example, Keynesianism became unfashionable as a result of “stagflation”, which the theory had not predicted. But inflation at full employment had been predicted by Kalecki if not by Keynes himself. Stagflation was induced by a massive terms-of-trade shock (the oil crises) which had not been foreseen in the theory and did not “disprove” it, though it demonstrated some limiting conditions. The change of fashion spawned other theories which are now being questioned in view of financial crises since the late 1990s. This is no way for a mature discipline to proceed. Rather than swinging with fashion because we react emotionally when some ceteris paribus assumption is violated, we are supposed to subject theories to continuous and rigorous testing and amend them accordingly, not change our affections in a huff when we are surprised by events.

  5. Ken Zimmerman
    March 27, 2020 at 12:55 pm

    Lars, you write, “Most facts have many different possible alternative explanations, but we want to find the best of all contrastive explanations (since all real explanation takes place relative to a set of alternatives). So which is the best explanation? Many scientists, influenced by statistical reasoning, think that the likeliest explanation is the best explanation. But the likelihood of x is not in itself a strong argument for thinking it explains y. I would rather argue that what makes one explanation better than another are things like aiming for and finding powerful, deep causal features and mechanisms that we have warranted and justified reasons to believe in.” It is well to separate this narrative from the statistical version of inference and “causation.” But there is in my view an important link you’ve missed. As stated by Charles Sanders Peirce, the “Maxim of Pragmatism” is: “Consider what effects, that might conceivably have practical bearings, we conceive the object of our conception to have. Then, our conception of those effects is the whole of our conception of the object.” So, what are the practical consequences of drawing inferences statistically vs. drawing inferences via subject-matter experience? First, we miss most of what’s happening. Second, we miss the explanations by those directly involved with the actors and events we’re examining. Third, we miss which approaches to these actors and events, and to the uncertainties involved with them, have shown themselves successful in the past at “fixing” the failures of these actors and events. Finally, we miss the knowledge needed to attain a specific goal we set for ourselves or that is set for us by others. This is the standard working process of every artisan, mechanic, blacksmith, farmer, etc. in the long history of human work and creation for the necessities of everyday life. That anyone could propose that statistical modeling might even approach replacing this process is incomprehensible to me. Maybe modern tribalism is replacing human practicality with uncontrolled creativity. Humans seem to create with little concern for any practicality, for any sort of livable future.
