
## Econometrics and the rabbits principle

from Lars Syll

In econometrics one often gets the feeling that many of its practitioners think of it as a kind of automatic inferential machine: input data and out comes causal knowledge. This is like pulling a rabbit from a hat. Great — but first you have to put the rabbit in the hat. And this is where assumptions come into the picture.

The assumption of imaginary ‘superpopulations’ is one of the many dubious assumptions used in modern econometrics, and as Clint Ballinger has highlighted, it is a particularly questionable rabbit-pulling assumption:

Inferential statistics are based on taking a random sample from a larger population … and attempting to draw conclusions about a) the larger population from that data and b) the probability that the relations between measured variables are consistent or are artifacts of the sampling procedure.

However, in political science, economics, development studies and related fields the data often represent as complete a set of observations as can be measured from the real world (an ‘apparent population’). They are not the result of random sampling from a larger population. Nevertheless, social scientists treat such data as if they were.

Because there is no source of further cases a fiction is propagated—the data is treated as if it were from a larger population, a ‘superpopulation’ where repeated realizations of the data are imagined. Imagine there could be more worlds with more cases and the problem is fixed …

What ‘draw’ from this imaginary superpopulation does the real-world set of cases we have in hand represent? This is simply an unanswerable question. The current set of cases could be representative of the superpopulation, and it could be an extremely unrepresentative sample, a one in a million chance selection from it …

The problem is not one of statistics that need to be fixed. Rather, it is a problem of the misapplication of inferential statistics to non-inferential situations.
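The point can be made concrete with a small sketch (the numbers are hypothetical, chosen only for illustration). Given a complete ‘apparent population’ — say, growth rates for every country in a region — the mean is a plain description of the cases at hand, while the conventional standard error only acquires meaning by imagining repeated draws from a superpopulation:

```python
import math

# Hypothetical complete 'apparent population': growth rates for every
# country in a region -- there is no larger pool these were drawn from.
growth = [1.2, 0.8, 2.5, -0.3, 1.9, 0.4, 3.1, 1.1]

n = len(growth)
mean = sum(growth) / n

# Conventional standard error, computed as if `growth` were a random sample:
sd = math.sqrt(sum((x - mean) ** 2 for x in growth) / (n - 1))
se = sd / math.sqrt(n)

# The mean simply describes the cases we actually have.
print(f"mean growth: {mean:.4f}")
# The standard error, by contrast, quantifies variation across imagined
# repeated realizations from a 'superpopulation' -- the fiction at issue.
print(f"standard error (superpopulation fiction): {se:.4f}")
```

Nothing in the arithmetic fails; the question is what the second number could possibly refer to when the data already exhaust the real-world cases.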

1. November 5, 2016 at 10:38 am

Correct on all counts. One more issue is the assumption of normal distribution of the “population” and the “sample.” Most times it’s impossible to accurately determine either. This means the very basis of the “probability analysis” underlying econometrics cannot be verified, which makes the entire effort impossible to assess. There is mathematics that can operate on non-normal distributions; the issue is that we’re not able to determine when the distributions are normal and when they are not.
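The difficulty of detecting non-normality from limited data can be sketched with a pure-Python Jarque–Bera-style statistic (an assumed illustration, not part of the comment above): with small samples the statistic frequently fails to flag even clearly non-normal data, while with very large samples it does.

```python
import math
import random

def jarque_bera(xs):
    """Jarque-Bera normality statistic: n/6 * (skew^2 + excess_kurtosis^2/4)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0  # excess kurtosis (0 for a normal)
    return n / 6.0 * (skew ** 2 + kurt ** 2 / 4.0)

random.seed(1)
# Data from a uniform distribution -- decidedly not normal.
small = [random.uniform(0, 1) for _ in range(20)]
large = [random.uniform(0, 1) for _ in range(20000)]

crit = 5.99  # approximate 5% critical value of chi-squared with 2 d.o.f.
print("small sample rejects normality:", jarque_bera(small) > crit)
print("large sample rejects normality:", jarque_bera(large) > crit)
```

With samples of the size typical in macroeconomics, the test simply lacks the power to tell a normal from a non-normal distribution — which is the commenter’s point.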

2. November 5, 2016 at 11:42 am

“The problem is not one of statistics that need to be fixed. Rather, it is a problem of the misapplication of inferential statistics to non-inferential situations.”

Yes. As I put it in light of my research into statistical reliability and object-oriented programming, the statistics are indicators rather than measures, much as an index indicates the content of a book.

The reliability investigations were interesting. Using the null hypothesis that the failure rate of a system was the sum of the average failure rates of the components, the statistics of actual component failures in the field were usually insignificant, but when not, usually highly significant: this being traceable to failures in the system design.

So when inferences consistently fail, this indicates failures of the system rather than the components: in economic jargon, that such failures are endogenous rather than exogenous.
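The logic of that reliability check can be sketched as follows (a minimal illustration with invented numbers — the rates, hours and counts are assumptions, not the commenter’s field data). Under the null hypothesis that the system’s failure rate is the sum of the component rates, observed failure counts can be compared to the expected count via a Poisson tail probability; a highly significant excess points at the system design rather than the components:

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam): chance of seeing k or more failures."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - cdf

# Hypothetical component failure rates (failures per 1000 hours) and an
# observation window -- illustrative numbers only.
component_rates = [0.4, 0.7, 0.2, 0.1]
hours = 5000
expected = sum(component_rates) * hours / 1000  # null: system = sum of parts

observed_ok = 8        # close to what the null predicts (expected = 7)
observed_design = 25   # far in excess -- traceable to the system design

for obs in (observed_ok, observed_design):
    p = poisson_sf(obs, expected)
    verdict = ("consistent with component rates" if p > 0.05
               else "excess failures: look at the system design")
    print(f"observed={obs}, p={p:.4f}: {verdict}")
```

The “usually insignificant, but when not, highly significant” pattern described above corresponds to the two branches here: endogenous (design-level) failure shows up as a tail probability far beyond anything the component rates can explain.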

3. November 6, 2016 at 2:33 am

Inferential statistics is a “closed system,” to use the term of art. It can legitimately be employed in research only within certain boundaries, and in no other way. Beyond those uses inferential statistics is just equations, and without the right setting equations are useless and time-wasting. Most social scientists exceed those boundaries in their use of inferential statistics. The result: they’re just playing with equations, and not much knowledge is acquired.