## Statistical models and the assumptions on which they build

from Lars Syll

Every method of statistical inference depends on a complex web of assumptions about how data were collected and analyzed, and how the analysis results were selected for presentation. The full set of assumptions is embodied in a statistical model that underpins the method … Many problems arise however because this statistical model often incorporates unrealistic or at best unjustified assumptions …

The difficulty of understanding and assessing underlying assumptions is exacerbated by the fact that the statistical model is usually presented in a highly compressed and abstract form—if presented at all. As a result, many assumptions go unremarked and are often unrecognized by users as well as consumers of statistics. Nonetheless, all statistical methods and interpretations are premised on the model assumptions; that is, on an assumption that the model provides a valid representation of the variation we would expect to see across data sets, faithfully reflecting the circumstances surrounding the study and phenomena occurring within it.

Sander Greenland et al.

If anything, the common abuse of statistical tests underlines how important it is not to equate science with statistical calculation. All science entails human judgment, and using statistical models doesn’t relieve us of that necessity. When we work with misspecified models, the scientific value of statistics is actually zero — even though the statistical inferences we make within them are valid! Statistical models are no substitutes for doing real science. Or as a famous German philosopher once wrote:

There is no royal road to science, and only those who do not dread the fatiguing climb of its steep paths have a chance of gaining its luminous summits.

We should never forget that the underlying parameters we use when performing statistical tests are model constructions. And if the model is wrong, the value of our calculations is nil. As ‘shoe-leather researcher’ David Freedman wrote in Statistical Models and Causal Inference:

I believe model validation to be a central issue. Of course, many of my colleagues will be found to disagree. For them, fitting models to data, computing standard errors, and performing significance tests is “informative,” even though the basic statistical assumptions (linearity, independence of errors, etc.) cannot be validated. This position seems indefensible, nor are the consequences trivial. Perhaps it is time to reconsider.
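Freedman’s point about unvalidatable assumptions such as independence of errors can be made concrete with a small Monte Carlo sketch (my illustration, not from the post): when errors are in fact autocorrelated but we compute standard errors as if they were independent, a nominal 95% confidence interval covers the true parameter far less often than advertised — the inference is internally valid given the model, yet the model is wrong.

```python
import numpy as np

# Sketch: estimate a mean from AR(1)-correlated data, but compute the
# standard error under the (false) assumption of independent errors.
rng = np.random.default_rng(0)
n, rho, reps = 100, 0.7, 2000
true_mu = 0.0
covered = 0

for _ in range(reps):
    # AR(1) errors: serially correlated, violating independence
    e = np.empty(n)
    e[0] = rng.normal()
    for t in range(1, n):
        e[t] = rho * e[t - 1] + rng.normal() * np.sqrt(1 - rho**2)
    y = true_mu + e

    # Textbook i.i.d. inference: SE = s / sqrt(n), 95% CI = mean +/- 1.96*SE
    se = y.std(ddof=1) / np.sqrt(n)
    lo, hi = y.mean() - 1.96 * se, y.mean() + 1.96 * se
    covered += (lo <= true_mu <= hi)

coverage = covered / reps
# Actual coverage falls well below the nominal 95% because the assumed
# model (independent errors) does not match the data-generating process.
print(f"nominal 95% CI, actual coverage: {coverage:.2f}")
```

The calculations are all “correct” conditional on the model; it is the unexamined independence assumption that quietly invalidates the result.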

1. October 6, 2022 at 12:44 pm

“Much of the world’s financial markets are currently being dangerously overstretched through an exaggerated reliance on intrinsically weak financial models based on very short series of statistical evidence and very doubtful volatility assumptions”
http://subprimeregulations.blogspot.com/2004/10/my-statement-on-ibrds-liquidity.html

2. October 6, 2022 at 8:31 pm

Thank you Lars! This is such a fundamental problem with neoclassical economics (and throughout the social sciences). I’m afraid there is no way out, except abandonment, as Ed Leamer suggested long ago. My take on this was published in RWER a few years back. http://www.paecon.net/PAEReview/issue74/Klees74.pdf
Steve Klees
University of Maryland