## Econometrics and the art of putting the rabbit in the hat

from **Lars Syll **

In econometrics one often gets the feeling that many of its practitioners think of it as a kind of automatic inferential machine: input data and out comes casual knowledge. This is like pulling a rabbit from a hat. Great — but first you have to put the rabbit in the hat. And this is where assumptions come in to the picture.

The assumption of imaginary “superpopulations” is one of the many dubious assumptions used in modern econometrics, and as Clint Ballinger has highlighted, this is a particularly questionable rabbit pulling assumption:

Inferential statistics are based on taking a random sample from a larger population … and attempting to draw conclusions about a) the larger population from that data and b) the probability that the relations between measured variables are consistent or are artifacts of the sampling procedure.

However, in political science, economics, development studies and related fields the data often represents as complete an amount of data as can be measured from the real world (an ‘apparent population’). It is not the result of a random sampling from a larger population. Nevertheless, social scientists treat such data as the result of random sampling.

Because there is no source of further cases a fiction is propagated—the data is treated as if it were from a larger population, a ‘superpopulation’ where repeated realizations of the data are imagined. Imagine there could be more worlds with more cases and the problem is fixed …

What ‘draw’ from this imaginary superpopulation does the real-world set of cases we have in hand represent? This is simply an unanswerable question. The current set of cases could be representative of the superpopulation, and it could be an extremely unrepresentative sample, a one in a million chance selection from it …

The problem is not one of statistics that need to be fixed. Rather, it is a problem of the misapplication of inferential statistics to non-inferential situations.

As social scientists – and economists – we have to confront the all-important question of how to handle uncertainty and randomness. Should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and actually, to be strict, do not at all exist – without specifying such system-contexts. Accepting Haavelmo’s domain of probability theory and sample space of infinite populations – just as Fisher’s “hypothetical infinite population, of which the actual data are regarded as constituting a random sample”, von Mises’s “collective” or Gibbs’s ”ensemble” – also implies that judgments are made on the basis of observations that are actually never made!

Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.

As David Salsburg once noted – in his lovely *The Lady Tasting Tea *– on probability theory:

[W]e assume there is an abstract space of elementary things called ‘events’ … If a measure on the abstract space of events fulfills certain axioms, then it is a probability. To use probability in real life, we have to identify this space of events and do so with sufficient specificity to allow us to actually calculate probability measurements on that space … Unless we can identify [this] abstract space, the probability statements that emerge from statistical analyses will have many different and sometimes contrary meanings.

Just as e. g. John Maynard Keynes and Nicholas Georgescu-Roegen, Salsburg is very critical of the way social scientists – including economists and econometricians – uncritically and without arguments have come to simply assume that one can apply probability distributions from statistical theory on their own area of research:

Probability is a measure of sets in an abstract space of events. All the mathematical properties of probability can be derived from this definition. When we wish to apply probability to real life, we need to identify that abstract space of events for the particular problem at hand … It is not well established when statistical methods are used for observational studies … If we cannot identify the space of events that generate the probabilities being calculated, then one model is no more valid than another … As statistical models are used more and more for observational studies to assist in social decisions by government and advocacy groups, this fundamental failure to be able to derive probabilities without ambiguity will cast doubt on the usefulness of these methods.

This importantly also means that if you cannot show that data satisfies *all* the conditions of the probabilistic nomological machine – including e. g. the distribution of the deviations corresponding to a normal curve – then the statistical inferences used, lack sound foundations.

In his great book *Statistical Models and Causal Inference: A Dialogue with the Social Sciences* David Freedman also touched on these fundamental problems, arising when you try to apply statistical models outside overly simple nomological machines like coin tossing and roulette wheels (emphasis added):

Lurking behind the typical regression model will be found a host of such assumptions; without them, legitimate inferences cannot be drawn from the model. There are statistical procedures for testing some of these assumptions. However, the tests often lack the power to detect substantial failures. Furthermore, model testing may become circular; breakdowns in assumptions are detected, and the model is redefined to accommodate. In short,

hiding the problems can become a major goal of model building.Using models to make predictions of the future, or the results of interventions, would be a valuable corrective. Testing the model on a variety of data sets – rather than fitting refinements over and over again to the same data set – might be a good second-best … Built into the equation is a model for non-discriminatory behavior: the coefficient d vanishes. If the company discriminates, that part of the model cannot be validated at all.

Regression models are widely used by social scientists to make causal inferences; such models are now almost a routine way of demonstrating counterfactuals.

However, the “demonstrations” generally turn out to depend on a series of untested, even unarticulated, technical assumptions. Under the circumstances, reliance on model outputs may be quite unjustified. Making the ideas of validation somewhat more precise is a serious problem in the philosophy of science. That models should correspond to reality is, after all, a useful but not totally straightforward idea – with some history to it. Developing appropriate models is a serious problem in statistics; testing the connection to the phenomena is even more serious …In our days, serious arguments have been made from data. Beautiful, delicate theorems have been proved, although the connection with data analysis often remains to be established. And

an enormous amount of fiction has been produced, masquerading as rigorous science.

And as if this wasn’t enough, one could — as we’ve seen — also seriously wonder what kind of “populations” these statistical and econometric models ultimately are based on. Why should we as social scientists – and not as pure mathematicians working with formal-axiomatic systems without the urge to confront our models with real target systems – unquestioningly accept Haavelmo’s “infinite population”, Fisher’s “hypothetical infinite population”, von Mises’s “collective” or Gibbs’s ”ensemble”?

Of course one could treat our observational or experimental data as random samples from real populations. I have no problem with that. But probabilistic econometrics does not content itself with that kind of populations. Instead it creates imaginary populations of “parallel universes” and assume that our data are random samples from that kind of populations.

But this is actually nothing else but hand-waving! And it is inadequate for real science. As David Freedman writes in *Statistical Models and Causal Inference *(emphasis added):

With this approach, the investigator does not explicitly define a population that could in principle be studied, with unlimited resources of time and money. The investigator merely

assumesthat such a population exists in some ill-defined sense. And there is a further assumption, that the data set being analyzed can be treatedas ifit were based on a random sample from the assumed population.These are convenient fictions… Nevertheless, reliance on imaginary populations is widespread. Indeed regression models are commonly used to analyze convenience samples …The rhetoric of imaginary populations is seductive because it seems to free the investigator from the necessity of understanding how data were generated.

**In social sciences — including economics — it’s always wise to ponder C. S. Peirce’s remark that universes are not as common as peanuts …**

all the above may be true but are not necessary to make econometrics of macroeconomic event irrelevant. A I have writtn so many time, as long as the data reflect a time series collection , I,e,, observations at different dates in calendar time, then even if probability distribution I analysis is relevant, the system generating this macrodata will e nonergodic!

Hence any econometric analysis may have some descriptive revance as to indicating the past path of the system, it is not scientifically possible to use this study to forecast (predict) the future.

It is as if R.A. Fisher had used different ladies tasting tea during his frequency analysis.

for some micro demand and supply analysis econometric data may be relevant — if the demand (or supply) curve can be assumed constant over time and we have enough information on other variables that can affect the demand for the product–we can fnd, for example, demand or supply price elasticities.

I think the the term “econocubism” (or “econocubisme”) may be useful here. There may well be a Braque, a Picasso, a Metzinger or a Gleize of econometric analysis, but for most practitioners it is a mannerism that alludes, clumsily even, to a technique. The “joke” about cubist painting that circulated in popular satire in the pre-war (W.W. I) days was centred around “Maistre Cube”, a pun that simultaneously referred to the painter as “cube master” and as “cubic metre.”

One problem is that econometric analysis is so “incomprehensible” that it has never been subjected to the same degree of popular suspicion and ridicule as have fashions (along with alleged hoaxes and mystifications) in modern art.

You really should read the Arjo Klamer article about economics as a kind of modernism: http://books.google.nl/books?hl=nl&lr=&id=pJL7enP36l8C&oi=fnd&pg=PA223&dq=Arjo+Klamer+modernism+beyond+physics+1993&ots=k1cVA_BZRt&sig=bYfeYXm7Avs-_GrF309e61BjChs#v=onepage&q=Arjo%20Klamer%20modernism%20beyond%20physics%201993&f=true

Thanks! I will.

Follow up: J. M. Clark 1913 review of Pigou’s Wealth and Welfare:

“This is not the only passage, nor the most extreme, that suggests the caption, ‘Cubist portrait of an economic man,’ or where one feels that time has been spent in elaborating doubtful a priori explanations for undisputed and very interesting facts.”