## Three suggestions to ‘save’ econometrics

from **Lars Syll**

Reading an applied econometrics paper could leave you with the impression that the economist (or any social science researcher) first formulated a theory, then built an empirical test based on the theory, then tested the theory. But in my experience what generally happens is more like the opposite: with some loose ideas in mind, the econometrician runs a lot of different regressions until they get something that looks plausible, then tries to fit it into a theory (existing or new) … Statistical theory itself tells us that if you do this for long enough, you will eventually find something plausible by pure chance! This is bad news because, as tempting as that final, pristine-looking causal effect is, readers have no way of knowing how it was arrived at. There are several ways I’ve seen to guard against this:
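The data-mining worry in the passage above (run enough regressions and something will look ‘significant’ by pure chance) is easy to demonstrate with a quick simulation. Everything below is illustrative: the outcome and all the candidate regressors are unrelated noise, yet a nominal 5% test keeps flagging ‘effects’:

```python
import random

random.seed(0)

def ols_t(x, y):
    """t-statistic of the slope in a no-intercept simple regression of y on x."""
    n = len(x)
    sxx = sum(v * v for v in x)
    beta = sum(a * b for a, b in zip(x, y)) / sxx
    rss = sum((b - beta * a) ** 2 for a, b in zip(x, y))
    se = (rss / (n - 1)) ** 0.5 / sxx ** 0.5
    return beta / se

n, k, reps = 100, 20, 100  # observations, candidate regressors, repetitions
hits = 0
for _ in range(reps):
    y = [random.gauss(0, 1) for _ in range(n)]      # outcome: pure noise
    for _ in range(k):
        x = [random.gauss(0, 1) for _ in range(n)]  # regressor: pure noise
        if abs(ols_t(x, y)) > 1.96:                 # nominal 5% two-sided test
            hits += 1

rate = hits / (reps * k)
print(f"share of pure-noise regressors that look 'significant': {rate:.3f}")
```

Roughly one in twenty unrelated regressors clears the bar, so a determined search across specifications is almost guaranteed to ‘find’ something, exactly as the quoted passage warns.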

(1) Use a multitude of empirical specifications to test the robustness of the causal links, and pick the one with the best predictive power …

(2) Have researchers submit their paper for peer review before they carry out the empirical work, detailing the theory they want to test, why it matters and how they’re going to do it. Reasons for inevitable deviations from the research plan should be explained clearly in an appendix by the authors and (re-)approved by referees.

(3) Insist that the paper be replicated. Firstly, by having the authors submit their data and code and seeing if referees can replicate it (think this is a low bar? Most empirical research in ‘top’ economics journals can’t even manage it). Secondly — in the truer sense of replication — wait until someone else, with another dataset or method, gets the same findings in at least a qualitative sense. The latter might be too much to ask of researchers for each paper, but it is a good thing to have in mind as a reader before you are convinced by a finding.
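Suggestion (1) can be made concrete: fit rival specifications on one half of the sample and compare them on held-out data. A minimal sketch with made-up data, where the true relation is quadratic and both candidate specifications are hypothetical:

```python
import random

random.seed(3)

# Hypothetical data: the true relation is quadratic, plus noise.
x = [random.uniform(0, 2) for _ in range(200)]
y = [1.5 * xi * xi + random.gauss(0, 0.3) for xi in x]

train = list(range(100))   # first half: estimation sample
test = list(range(100, 200))  # second half: held-out sample

def fit_predict(transform):
    """No-intercept least squares of y on transform(x); return held-out MSE."""
    xt = [transform(x[i]) for i in train]
    yt = [y[i] for i in train]
    beta = sum(a * b for a, b in zip(xt, yt)) / sum(a * a for a in xt)
    errs = [(y[i] - beta * transform(x[i])) ** 2 for i in test]
    return sum(errs) / len(errs)

linear = fit_predict(lambda v: v)        # specification A: y ~ b*x
quadratic = fit_predict(lambda v: v * v) # specification B: y ~ b*x^2
print(f"held-out MSE  linear: {linear:.3f}  quadratic: {quadratic:.3f}")
```

Because the comparison is made on data the models never saw, the specification that merely happened to fit the estimation sample gets no advantage; that is the sense in which predictive power disciplines specification search.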

All three of these should, in my opinion, be a prerequisite for research that uses econometrics …

Naturally, this would result in a lot more null findings and probably a lot less research. Perhaps it would also result in fewer papers that attempt to tell the entire story: that is, which go all the way from building a new model to finding (surprise!) that even the most rigorous empirical methods support it.

Good suggestions, but unfortunately there are many more deep problems with econometrics that have to be ‘solved.’

In econometrics one often gets the feeling that many of its practitioners think of it as a kind of automatic inferential machine: input data and out comes causal knowledge. This is like pulling a rabbit from a hat. Great — but first you have to put the rabbit in the hat. And this is where assumptions come into the picture. The assumption of imaginary ‘superpopulations’ is one of the many dubious assumptions used in modern econometrics.

Misapplication of inferential statistics to non-inferential situations is a non-starter for doing proper science. And when choosing which models to use in our analyses, we cannot get around the fact that the evaluation of our hypotheses, explanations, and predictions cannot be made without reference to a specific statistical model or framework. The probabilistic-statistical inferences we make from our samples depend decisively on what population we choose to refer to. The reference class problem shows that there usually are many such populations to choose from, and that the one we choose decides which probabilities we come up with and a fortiori which predictions we make. Not consciously contemplating the relativity effects that this choice of ‘nomological-statistical machines’ has is probably one of the reasons econometricians have a false sense of the amount of uncertainty that really afflicts their models.
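The reference class problem can be made concrete with a toy example. The survival probabilities below are invented purely for illustration; the only point is that the same individual gets a different ‘probability’ depending on which population we embed her in:

```python
import random

random.seed(1)

# Toy population: each record is (sex, smoker, lived_past_80).
# Subgroup survival odds are hypothetical numbers chosen to differ by group.
population = []
for _ in range(100_000):
    sex = random.choice(["F", "M"])
    smoker = random.random() < 0.3
    p = 0.6 + (0.1 if sex == "F" else -0.1) + (-0.2 if smoker else 0.0)
    population.append((sex, smoker, random.random() < p))

def prob(records):
    """Empirical probability of survival within a chosen reference class."""
    return sum(r[2] for r in records) / len(records)

# The same individual, a female smoker, gets three different 'probabilities'
# depending on the reference class:
everyone = prob(population)
women = prob([r for r in population if r[0] == "F"])
female_smokers = prob([r for r in population if r[0] == "F" and r[1]])
print(f"all: {everyone:.2f}  women: {women:.2f}  female smokers: {female_smokers:.2f}")
```

Nothing about the individual changes between the three lines; only the chosen population does, and with it the probability and every prediction built on it.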

As economists and econometricians we have to confront the all-important question of how to handle uncertainty and randomness. Should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and actually, to be strict, do not at all exist – without specifying such system-contexts. Accepting Haavelmo’s domain of probability theory and sample space of infinite populations – just as Fisher’s ‘hypothetical infinite population,’ von Mises’s ‘collective’ or Gibbs’s ‘ensemble’ – also implies that judgments are made on the basis of observations that are actually never made! Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.

Economists — and econometricians — have (uncritically and often without arguments) come to simply assume that one can apply probability distributions from statistical theory to their own area of research. However, fundamental problems arise when you try to apply statistical models outside overly simple nomological machines like coin tossing and roulette wheels.

Of course one could arguably treat our observational or experimental data as random samples from real populations. But probabilistic econometrics does not content itself with that kind of population. Instead it creates imaginary populations of ‘parallel universes’ and assumes that our data are random samples from them. But this is actually nothing but hand-waving! Doing econometrics it’s always wise to remember C. S. Peirce’s remark that universes are not as common as peanuts …

These three prescriptions will not work because they do not go to the root of the problem. [1] Econometricians are completely in the dark about causality, and a multitude of specifications within the current methodological framework will only yield a multitude of conflicting and contradictory results. This has already been demonstrated in my paper ‘Causal Relations via Econometrics,’ International Econometric Review, Vol. 2, No. 1, pp. 36-56, April 2010.

[2] The incentives of peers are not aligned with the search for truth. Any paper even mildly critical of mainstream ideology threatens the entrenched establishment — they have circled their wagons and are not letting in outsiders.

[3] Since the methodology in use is deeply flawed, replications cannot achieve the desired goal — using flawed methodology, one can replicate flawed results. See my paper ‘Methodological Mistakes and Econometric Consequences,’ International Econometric Review, Vol. 4, Issue 2, pp. 99-122, September 2012.

A Zaman — I find those are both interesting, good papers; they seem to review much of the econometrics literature (e.g. Granger causality).

At present, I don’t think anyone can ‘prove causality’ — I think the term is ‘overdetermined’.

You can get the same predictions using very different assumptions — e.g. deterministic vs. random fractals.

Or, you can take 100 bags of 100 pennies each and lay them out so they are 100% heads (or tails), or 50/50, or Gaussian. Or you can just flip them and get the same distributions. One can reproduce many distributions in economics (e.g. income) assuming either a deterministic ‘maximum utility’ approach or a random ‘entropy’ approach, and it may be difficult (if not impossible) to distinguish the true mechanism.
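The pennies example can be simulated. In the sketch below (parameters chosen only for illustration), one set of 100 bags has its head-counts laid out deterministically from the binomial quantile function, while the other is produced by actually flipping every coin; the summary statistics cannot tell the two mechanisms apart:

```python
import math
import random

random.seed(2)
bags, coins = 100, 100

# Deterministic layout: assign each bag a head-count from the binomial
# quantile function, so the whole profile of counts is fixed in advance.
pmf = [math.comb(coins, h) * 0.5 ** coins for h in range(coins + 1)]
cdf = []
acc = 0.0
for p in pmf:
    acc += p
    cdf.append(acc)

def quantile(u):
    return next(h for h, c in enumerate(cdf) if c >= u)

laid_out = [quantile((i + 0.5) / bags) for i in range(bags)]

# Random mechanism: actually flip every coin and count heads per bag.
flipped = [sum(random.random() < 0.5 for _ in range(coins)) for _ in range(bags)]

mean = lambda xs: sum(xs) / len(xs)
sd = lambda xs: (sum((v - mean(xs)) ** 2 for v in xs) / len(xs)) ** 0.5
print(f"laid out: mean={mean(laid_out):.1f} sd={sd(laid_out):.1f}")
print(f"flipped:  mean={mean(flipped):.1f} sd={sd(flipped):.1f}")
```

Both lists have a mean near 50 heads and a spread near the binomial standard deviation of 5, yet one was arranged by hand and the other generated by chance — which is the commenter’s point about utility versus entropy explanations of income distributions.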

There are other ‘complementary’ approaches from physics via modern ergodic theory (e.g. chaos theory, Poincaré sections, KAM theorems — all very technical, and you get lost in definitions and proofs) which deal with the same issues, and even more really basic, older ones using things like p-values, the Gini coefficient, the R² regression statistic, etc. — and likely hundreds or thousands of others, some of which I have seen.

Mathematically, I don’t see any way out of this — one can use words or equations or both.