## Judea Pearl on regression and causation

from **Lars Syll**

Judea Pearl was kind enough to send yours truly his article (co-authored with Bryant Chen), Regression and causation: a critical examination of six econometrics textbooks, a while ago. It has now been published in *real-world economics review* issue no. 65.

The article addresses two very important questions in the teaching of modern econometrics and its textbooks: how causality is treated in general and, more specifically, to what extent the textbooks use a distinct causal notation.

The authors have for years been part of an extended effort to advance explicit causal modeling (especially graphical models) in the applied sciences, and this is a first examination of the extent to which these endeavours have found their way into econometrics textbooks.

Although parts of the text are of a rather demanding “technical” nature, I would definitely recommend reading it, especially to social scientists with an interest in these issues.

Pearl’s seminal contributions to this research field are well known and indisputable. On the “taming” and “resolving” of the issues, however, I have to admit that, under the influence especially of David Freedman and Nancy Cartwright, I still have some doubts about the reach, in terms of “realism” and “relevance,” of these “solutions” for the social sciences in general and economics in particular (see here, here, here and here). As regards the present article, since the distinction between the “interventionist” E[Y|do(X)] and the more traditional “conditional-expectationist” E[Y|X] is so crucial for the subsequent argumentation, a more elaborate presentation would have been of value. The authors could then have more fully explained why the first is so important, and whether (and why) it can, in my, Freedman’s and Cartwright’s view, be exported from “engineering” contexts, where it arguably applies easily and universally, to “socio-economic” contexts, where “manipulability” and “modularity” are perhaps not so universally at hand.
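To make the distinction concrete, here is a minimal simulation sketch (my own toy model, not taken from the article): a binary confounder Z drives both X and Y, so E[Y|X=1], the average of Y among units that merely *happened* to have X=1, differs from E[Y|do(X=1)], the average of Y when X is *set* to 1 for everyone.

```python
import random

random.seed(0)

def sample(do_x=None):
    """One draw from a toy structural model with confounder Z.
    Z -> X and Z -> Y, plus X -> Y. Passing do_x overrides the
    structural equation for X (the do-operator)."""
    z = random.random() < 0.5
    x = do_x if do_x is not None else (random.random() < (0.8 if z else 0.2))
    # Y depends on both its cause X and the confounder Z
    y = random.random() < (0.2 + 0.3 * x + 0.4 * z)
    return x, y

n = 100_000
obs = [sample() for _ in range(n)]
# conditional expectation: average Y among units observed with X=1
e_y_given_x1 = sum(y for x, y in obs if x) / sum(1 for x, y in obs if x)
# interventional expectation: average Y when X is forced to 1
e_y_do_x1 = sum(y for _, y in (sample(do_x=True) for _ in range(n))) / n

print(round(e_y_given_x1, 2))  # ~0.82, inflated by the confounder
print(round(e_y_do_x1, 2))     # ~0.70, the true interventional effect
```

The gap between the two numbers is exactly what conditioning alone cannot see, and it is why the do-notation carries information that E[Y|X] does not.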

Manipulability and modularity are often not even vaguely possible; Bayes is finally relevant now that it is computationally feasible; still, in the socio-economic universe, modeling is only vaguely indicative of potential relationships, not causation.

Okay, maybe I’m showing my age, and certainly the era in which I learned statistics, but the courses I took emphasized two things relevant to this article. First, statistics cannot establish causation. That requires more than the mathematics of statistics. Statistics can highlight the need for experiments, case studies, etc. Statistics’ basis is probability theory, which only allows us to make weaker or stronger probabilistic statements. For example, we can say with a probability of error of X or Y that two variables have a relationship. Second, with regard to regression (ANOVA, multiple, non-linear, etc.) in particular: regression is a tool to estimate the error in concluding that two variables are related. That’s all regression is useful for. Even the R², which everyone turns to first in regression software, is just an estimate of error. Regression is about the negative side of assessing relationships between variables. What is the probability our assessment is incorrect? What is the likely error?
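Ken’s point about R² can be made concrete by computing it by hand: it is just 1 − SS_res/SS_tot, a summary of how much error remains around the fitted line, with nothing causal in it. A minimal sketch with made-up numbers:

```python
# Ordinary least squares by hand on invented data, to show that
# R^2 is purely a residual-error summary, not evidence of causation.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.9]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
# OLS slope and intercept
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))  # residual error
ss_tot = sum((y - my) ** 2 for y in ys)                                   # total variation
r2 = 1 - ss_res / ss_tot

print(round(r2, 3))  # close to 1, yet says nothing about *why* y tracks x
```

A high R² here means only that a line fits well; the data could have been generated by a common cause, reverse causation, or coincidence, and R² would be unchanged.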

Ken, you are indeed showing your age (no offense intended). My reading is that you’ve stumbled into the great debate here between the frequentists and the Bayesians. I sympathize, as I’m currently trying to adjust my head to Bayesian thinking after more mainstream stats. Some comments:

– While Bayes has been around for 250 years and in many respects is common sense, calculation of cause-and-effect networks has only recently become possible with powerful software. If you want a crash course, may I suggest Netica, which allows a free working download (no expiry time, just a limit on the size of the net).

– The Bayes people are well aware of your concerns. In this regard there is much of historical interest and instruction in Korb, K.B., Nicholson, A.E., 2003. Bayesian Artificial Intelligence. CRC Press.

– Bayesian people are very careful with language. They use words/concepts like ‘belief’ to get away from absolutisms; indeed, they are clearly fascinated with uncertainty in an uncertain world.

– If you haven’t come across it, have a look at the Monty Hall problem on Wikipedia. It is an extremely simple cause-and-effect problem which illustrates to most people how difficult probability can be to grasp. It’s also a great party piece.
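For anyone who doubts the counter-intuitive answer, a quick simulation settles it (a sketch of the standard game: car behind one of three doors, the host always opens a non-winning, non-chosen door):

```python
import random

random.seed(42)

def play(switch, trials=100_000):
    """Simulate the Monty Hall game; return the contestant's win rate."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        # host opens a door that is neither the pick nor the car
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # switch to the one remaining closed door
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(round(play(switch=False), 2))  # ~0.33: staying wins one time in three
print(round(play(switch=True), 2))   # ~0.67: switching wins two times in three
```

Staying wins only when the first pick was right (probability 1/3); switching wins in every other case, hence 2/3.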

Lars says “I think that since the distinction between the “interventionist” E[Y|do(X)] and the more traditional “conditional expectationist” E[Y|X] is so crucial …”.

Absolutely.

p.s. Actually, long ago an absolute (cardinal) measure of value was found, rather than fiat money like gold or pearls. It’s tulips (Tiny Tim).

The non-commutativity, of course, is apparent from the genetics vs. morphology (e.g. D’Arcy Thompson) classifications: top-down (functional) vs. bottom-up (speaking here of causality or proof theory, as opposed to morality). It’s likely a(n) (en)tangled bank.

sorry about that.

But you did make the point that the article should perhaps really define the ‘do’ (intervention) operation versus conditional probability.

Pearl sort of dismissed the Ruud book (3.4 in his list) because it doesn’t discuss ‘causality’ much, though he says it’s consistent, and I agree. It may be that he prefers his own notation, but it’s just a different notation. (Look at the papers on his web site: it looks like algebra, but it’s not. Maybe it’s useful in programming, which I no longer do.) Pearl’s ‘do’ operation merely changes a variable into a parameter, in my view. A stochastic system can easily add parameters: boundary conditions, initial conditions, etc. He emphasizes the idea that his is a structural model. It is true that it is wrong to call the idea a ‘regression line’, but mathematically the structural and regression models are identical. (And what is more interesting is polynomial regression, piecewise-continuous functions, splines, wavelets, etc., which are a bit past the linear (e.g. Fourier) case, and probably reducible to it via the Weierstrass approximation theorem.)

It can be noted that he also discusses Granger causality, which attempts to figure out what is a cause versus a correlation. But it really can’t. (If you see a shadow in Plato’s cave, does that cause the sun to come out? Maybe if you are born with 2 heads you’ll know.) (I am not even sure the Wikipedia entry is correct; Granger’s papers are clearer.) There are always possible ‘third causes’ (e.g. see Gerard ’t Hooft, Nobel laureate in physics, on arXiv, on his interpretation of quantum mechanics via a local hidden-variable model).
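The ‘third cause’ worry can be sketched with invented time series (lagged correlations only, not Granger’s actual lag-regression F-test): a hidden driver Z moves X immediately and Y one step later, so lagged X “predicts” Y even though X never causes Y.

```python
import random

random.seed(1)

# Hypothetical series: hidden Z drives both. X_t tracks Z_t,
# while Y_t tracks Z_{t-1}, so X leads Y by one step.
n = 50_000
z = [random.gauss(0, 1) for _ in range(n + 1)]
x = [z[t + 1] + 0.3 * random.gauss(0, 1) for t in range(n)]  # X_t ~ Z_t
y = [z[t] + 0.3 * random.gauss(0, 1) for t in range(n)]      # Y_t ~ Z_{t-1}

def corr(a, b):
    """Pearson correlation, computed by hand."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

fwd = corr(x[:-1], y[1:])   # lagged X vs current Y: strong
rev = corr(y[:-1], x[1:])   # lagged Y vs current X: near zero
print(round(fwd, 2), round(rev, 2))
```

The asymmetry makes X look like a Granger-style “cause” of Y, yet by construction the only driver is the hidden Z, which is exactly the objection above.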

As for politics, that was an aside. Interestingly, among the books I studied on Gödel, one was edited by Jean van Heijenoort, who was Trotsky’s secretary in Mexico (see Wikipedia). The other good one is by Martin Davis, who runs the FOM list.

Thanks so much for this, Lars. I thought I was reading RER as a distraction from my angst over trying to understand Bayes nets, and lo and behold you are referring here to exactly this, and to one of the godfathers of modern Bayesian application, Judea Pearl.