## Statistical inference and sampling assumptions

from **Lars Syll**

Real probability samples have two great benefits: (i) they allow unbiased extrapolation from the sample; (ii) with data internal to the sample, it is possible to estimate how much results are likely to change if another sample is taken. These benefits, of course, have a price: drawing probability samples is hard work. An investigator who assumes that a convenience sample is like a random sample seeks to obtain the benefits without the costs—just on the basis of assumptions. If scrutinized, few convenience samples would pass muster as the equivalent of probability samples. Indeed, probability sampling is a technique whose use is justified because it is so unlikely that social processes will generate representative samples. Decades of survey research have demonstrated that when a probability sample is desired, probability sampling must be done. Assumptions do not suffice. Hence, our first recommendation for research practice: whenever possible, use probability sampling.

If the data-generation mechanism is unexamined, statistical inference with convenience samples risks substantial error. Bias is to be expected and independence is problematic. When independence is lacking, the p-values produced by conventional formulas can be grossly misleading. In general, we think that reported p-values will be too small; in the social world, proximity seems to breed similarity. Thus, many research results are held to be statistically significant when they are the mere product of chance variation.
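The claim that conventional p-values come out too small when observations are not independent can be illustrated with a small simulation. This is a minimal sketch, not taken from the quoted text: the cluster sizes, the shared-shock standard deviation, and the function names are all assumptions chosen for illustration. There is no true difference between the two groups, yet a test that treats clustered observations as independent rejects far more often than the nominal 5 per cent.

```python
import math
import random
import statistics


def naive_p_value(a, b):
    """Two-sided p-value from a two-sample z-test that treats every observation as independent."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def draw_group(rng, n_clusters, cluster_size, cluster_sd):
    """Draw clustered data: members of a cluster share one common shock ("proximity breeds similarity")."""
    values = []
    for _ in range(n_clusters):
        shared = rng.gauss(0.0, cluster_sd) if cluster_sd > 0 else 0.0
        values.extend(shared + rng.gauss(0.0, 1.0) for _ in range(cluster_size))
    return values


def false_positive_rate(cluster_sd, n_sims=1000, seed=0):
    """Share of simulations with p < 0.05 although both groups come from the same process."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a = draw_group(rng, n_clusters=5, cluster_size=20, cluster_sd=cluster_sd)
        b = draw_group(rng, n_clusters=5, cluster_size=20, cluster_sd=cluster_sd)
        if naive_p_value(a, b) < 0.05:
            hits += 1
    return hits / n_sims


independent_rate = false_positive_rate(cluster_sd=0.0)  # no shared shock: observations independent
clustered_rate = false_positive_rate(cluster_sd=1.0)    # shared shock: observations dependent
print(f"false positives, independent data: {independent_rate:.3f}")
print(f"false positives, clustered data:   {clustered_rate:.3f}")
```

With independent data the rejection rate should sit close to the nominal level; with the shared shock it rises well above it, because the naive standard error ignores the within-cluster correlation.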

In econometrics one often gets the feeling that many of its practitioners think of it as a kind of automatic inferential machine: input data and out comes causal knowledge. This is like pulling a rabbit from a hat. Great — but first you have to put the rabbit in the hat. And this is where assumptions come into the picture.

The assumption of imaginary ‘super populations’ is one of many dubious assumptions used in modern econometrics and statistical analyses to handle uncertainty. As social scientists — and economists — we have to confront the all-important question of how to handle uncertainty and randomness. Should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and actually, to be strict, do not at all exist – without specifying such system-contexts. Accepting a domain of probability theory and sample space of infinite populations also implies that judgments are made on the basis of observations that are actually never made!

Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.

And as if this wasn’t enough, one could — as we’ve seen — also seriously wonder what kind of ‘populations’ these statistical and econometric models ultimately are based on. Why should we as social scientists — and not as pure mathematicians working with formal-axiomatic systems without the urge to confront our models with real target systems — unquestioningly accept models based on concepts like the ‘infinite super populations’ used in e.g. the ‘potential outcome’ framework that has become so popular lately in social sciences?

One could, of course, treat observational or experimental data as random samples from real populations. I have no problem with that (although it has to be noted that most ‘natural experiments’ are *not* based on random sampling from some underlying population — which, of course, means that the effect-estimators, strictly speaking, are unbiased only for the specific groups studied). But probabilistic econometrics does not content itself with that kind of population. Instead, it creates imaginary populations of ‘parallel universes’ and assumes that our data are random samples from that kind of ‘infinite super populations.’

But this is actually nothing else but hand-waving! And it is inadequate for real science. As David Freedman writes:

With this approach, the investigator does not explicitly define a population that could in principle be studied, with unlimited resources of time and money. The investigator merely *assumes* that such a population exists in some ill-defined sense. And there is a further assumption, that the data set being analyzed can be treated *as if* it were based on a random sample from the assumed population. These are convenient fictions… Nevertheless, reliance on imaginary populations is widespread. Indeed regression models are commonly used to analyze convenience samples… The rhetoric of imaginary populations is seductive because it seems to free the investigator from the necessity of understanding how data were generated.

In social sciences — including economics — it’s always wise to ponder C. S. Peirce’s remark that universes are not as common as peanuts …

Reality only happens once, as far as we know. Does that mean the probabilities are meaningless? Lars asserts that assigning probabilities to events implies we think real-world outcomes are a random drawing from a stable distribution of events in some metaphysical meta-universe. Because that is a weird if not nonsensical assumption, Lars has claimed it means statistical analysis of economic phenomena is not much (or no?) use.

In my conception, however, that is a complete misunderstanding of what is going on when we conduct econometrics. Errors or residuals (I use the terms interchangeably) are not a property of reality. Reality may be stochastic or entirely deterministic for all we know. (Philosophers have argued over that for millennia in the context of free will). The residuals are a property of the model in its relation to reality. We look at an open, evolving system and make a hypothesis about what has caused certain phenomena over a certain time period. We would not expect our hypothesis to fit each data observation perfectly. The system has apparently stochastic elements for two reasons: the agglomeration of idiosyncratic effects unaccounted for at the level of aggregation we are considering and the effect of variations in variables that can generally be ignored as unimportant but an occasional or extreme movement in which will disturb the system. Note we cannot generate a causal hypothesis from the data. The causal hypothesis is prior but if we are to continue to believe it, it must be compatible with the data. To determine that compatibility we require statistics.

We are not looking at reality as a sample of a larger population; we are comparing the implications of a causal hypothesis with data and seeing how well they fit. We are not using the central limit theorem to reassure ourselves that the sample (reality) is representative of a population of realities (!?) but in the knowledge that it means our errors will be normal if they are owing to a host of unrelated, idiosyncratic elements and no important systematic influence has been missed. A good fit proves nothing, but if the model does not account for a substantial proportion of the variation in the variables of interest it is not a useful model.

If we knew our model was correct as far as it goes, the errors or residuals in its fit would come from the two sources cited. But there is always the risk that we have got it wrong and the model is mis-specified even with respect to the variables it covers. Some insight into that possibility is afforded by the nature of the errors themselves. If systematic relevant factors have been omitted, if the functional form of our equations is wrong, there will be tell-tale signature patterns in the residuals. If the residuals look like white noise and pass tests for being such there is a better chance the model is an adequate representation of reality for the purposes for which it was designed. Note that there is still no guarantee. Moreover, applying the model outside the sample for which it was fitted and tested means we are assuming that reality has been sufficiently stable for the model to remain useful. That is not something that has been proven in any sense. It is a hypothesis we retain faute de mieux until it is rejected by events. As Popper observed, any scientific generalisation remains permanently conjectural, corroborated by successes but always open to the possibility of refutation in other circumstances.
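The idea that mis-specification leaves a signature in the residuals can be sketched with one common diagnostic, the Durbin-Watson statistic, which is close to 2 when successive residuals are uncorrelated and falls towards 0 under positive autocorrelation. The comment above does not say which tests are meant, so the choice of statistic, the simulated data, and the helper names below are assumptions made purely for illustration. A straight line is fitted twice: once to data that really are linear, and once to data with an omitted quadratic term.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Residuals from the fitted straight line, in the order of the observations."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def durbin_watson(res):
    """Durbin-Watson statistic: about 2 for white-noise residuals, near 0 for strong positive autocorrelation."""
    num = sum((res[i] - res[i - 1]) ** 2 for i in range(1, len(res)))
    den = sum(r * r for r in res)
    return num / den


rng = random.Random(1)
x = [i / 10 for i in range(100)]
noise = [rng.gauss(0.0, 1.0) for _ in x]

# Case 1: the linear model is the right functional form.
y_linear = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
# Case 2: a quadratic term is present in the data but omitted from the model.
y_curved = [1.0 + 2.0 * xi + 0.5 * xi ** 2 + e for xi, e in zip(x, noise)]

dw_ok = durbin_watson(residuals(x, y_linear))
dw_misspecified = durbin_watson(residuals(x, y_curved))
print(f"Durbin-Watson, correct form:   {dw_ok:.2f}")
print(f"Durbin-Watson, omitted term:   {dw_misspecified:.2f}")
```

A value near 2 in the first case is compatible with white-noise residuals but, as noted above, guarantees nothing; the low value in the second case is the kind of tell-tale pattern that points to a wrong functional form.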

When we quote probabilities around future events we are saying the following. We think we understand this system to a useful extent; we have a representation of it which accounts for x per cent of the variations we see. If (note the necessary condition) our model is indeed – and continues to be – a serviceable picture of reality, these variables over the coming period will fall in the following ranges – probabilities of different outcomes can then be quoted. Those outcomes also require that extraneous forces behave as they have in our sample period, i.e. no volcanic eruptions and no asteroid strikes. Forecasts are necessarily conditional on that assumption. That’s what the probabilities mean whether we are talking of economics or weather forecasting. They will not survive unexpected exogenous shocks. Probabilities express degrees of confidence in different outcomes on the retained hypothesis that the model is roughly correct. The warrant for its correctness is (a) that it makes some sort of intuitive sense and (b) it fits the historical sample on which it was tested in terms of the size and characteristics of the observed errors in fit. As John Kay has remarked, the real probability of any event is the probability that our best model assigns to it times the probability that our model is correct. Especially out of sample there is no way at all to measure the latter quantity. If that is what Lars’ assertions come down to, we can agree. It is an important warning but does not invalidate the citation of probabilities, properly understood.
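Kay's remark can be written out as a rough sketch. The symbols are notation introduced here, not in the remark itself: $E$ is the event and $M$ is the proposition that the model is correct.

$$
P(E) \approx P(E \mid M)\, P(M)
$$

The law of total probability would add a second term, $P(E \mid \neg M)\, P(\neg M)$, which the remark leaves out. Since neither $P(M)$ nor the behaviour of the system when the model is wrong can be measured out of sample, the quoted probability is best read as $P(E \mid M)$, that is, conditional on the retained hypothesis.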

Lars and I are certainly in agreement that much contemporary economic theorising and modelling fails completely on criterion (a). It defies common sense and common observation. But it also usually fails on (b) as well to a greater or lesser extent.

The nature of reality imposes limits on what we can know for sure but we must do the best we can. Conjecture, hypothesis and the careful examination of data to test the hypothesis is the best we can do. The biggest fault in contemporary economics is a refusal to jettison pretty models that repeatedly fail empirical tests. By denigrating statistical testing, Lars, no doubt unwittingly, serves the forces of prejudice.

After lamenting how mainstream economics allows “one … to start from assumptions that are flagrantly unrealistic” and how ME flouts experimental results (e.g., humans are not always rational utility calculation machines aka “rationality,” the “money illusion,” and “aggregation issues” used to “construct representative agents”), Gerald goes on to profess to be all about empirical testing:

Further:

Lars’s post is about “convenience samples” based upon “imaginary populations.” Hardly something one would call “empirical testing.” What is weird is how Gerald, right out of the gate, elides the core context and meaning of Lars’s post above and chooses to misrepresent the context and content of the post. Lars must have hit a hot button (perhaps Gerald likes using “convenience samples” and “imaginary populations”?) given the length of his verbose reply, full of irrelevant rhetoric, never once addressing the question Lars raises regarding “convenience samples” and “imaginary populations” and the dubious fictional nature of their substitution for the hard work of statistical sampling based upon empirical data.

What is weird AND nonsensical is the way Gerald elides the context and content of Lars’s post to engage in a long irrelevant rant that never once addresses the question of “convenience samples” and “imaginary populations” and whether these are valid assumptions in light of trying to make econometrics more empirical.

I don’t think Meta understands what I am saying. But then it is mutual. I don’t understand him either. When I do time-series econometrics I use data that is not a sample of anything. If I claim to have found an explanation for that data it applies to the data not to anything else. Lars seems to think that I am treating the data as a sample of something else and claiming to have found a universal law or relationship but that is not so. I may use my finding to forecast but that is to make an auxiliary assumption – that I do not claim to have tested – namely that the system is sufficiently stable that the data-generating process will remain similar out of sample. If someone can come up with a better hypothesis it ought to be considered and used.

Incidentally I deplore Meta’s habit of injecting unnecessary heat into discussions. I was attempting to elucidate a difference in the way Lars appears to view econometric testing from my conception of it. Far from “eliding” a discussion of imaginary populations I am denying the existence of imaginary populations and explaining why I think so. I hope I did so politely and there is no reason to describe my note as a “rant”.


The point of Lars’s post is that for economics to be a real science it must draw its experimental data as random samples from real populations (I learned this in statistics 101), not some investigator’s assumptions about imaginary “populations of ‘parallel universes’ and assume that our data are random samples from that kind of ‘infinite super populations.’” If that is not “out of sample”, which is exactly what imaginary population samples are, and which, by the way, Gerald feigns agreement with, then nothing is “out of sample” and statistics is anything anyone wants it to be. To merely assume some imaginary population exists in some “ill-defined” sense is pseudoscience. Gerald’s pedantic rant elides this simple fact (which he agrees with, sort of), only to later engage in innuendo about Lars’s intentions (which is hardly something Gerald can know, after all), asserting that by “denigrating [all] statistical testing, Lars, no doubt unwittingly, serves the forces of prejudice.” The only prejudice I see here comes from Gerald’s sophistry and attempt to read into Lars’s intentions rather than simply address the topic of the post (which he out of one side of his mouth agrees with). I understand perfectly well your style of rhetoric, Gerald. Elide the core topic of Lars’s post, insinuate meanings (aka put words in his mouth he didn’t say) and then rant on with an irrelevant statistics lecture that conflates, confuses, and misdirects away from what you already agreed regarding imaginary “out of sample” samples from imaginary populations.

BTW, I could play back (but I won’t, unless I need to) a boat load of quotes from you asserting that to make economics more “empirical” it needs to be more “experimental” and that it is all in the empirical data. Now you are conflating imaginary populations with empirical data, eh? Consistency counts.