If you only have time to read one statistics book — this is the one!
from Lars Syll
Mathematical statistician David A. Freedman’s Statistical Models and Causal Inference (Cambridge University Press, 2010) is a marvellous book. It ought to be mandatory reading for every serious social scientist – including economists and econometricians – who doesn’t want to succumb to ad hoc assumptions and unsupported statistical conclusions!
How do we calibrate the uncertainty introduced by data collection? Nowadays, this question has become quite salient, and it is routinely answered using well-known methods of statistical inference, with standard errors, t-tests, and P-values … These conventional answers, however, turn out to depend critically on certain rather restrictive assumptions, for instance, random sampling …
Thus, investigators who use conventional statistical technique turn out to be making, explicitly or implicitly, quite restrictive behavioral assumptions about their data collection process … More typically, perhaps, the data in hand are simply the data most readily available …
The moment that conventional statistical inferences are made from convenience samples, substantive assumptions are made about how the social world operates … When applied to convenience samples, the random sampling assumption is not a mere technicality or a minor revision on the periphery; the assumption becomes an integral part of the theory …
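To see how much is riding on that assumption, here is a small simulation sketch of my own (not from the book): the usual 95 per cent confidence interval for a mean is computed first on genuinely random samples and then on “convenience” samples whose selection happens to be tied to the outcome, and its actual coverage is checked against the nominal level.

```python
# A minimal sketch (mine, not Freedman's). Nominal 95% confidence
# intervals are computed with the usual random-sampling formula, first
# on genuinely random samples, then on "convenience" samples whose
# selection probability is tied to the outcome.
import numpy as np

rng = np.random.default_rng(0)
population = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)
true_mean = population.mean()

def ci_covers(sample):
    """Does the textbook 95% CI for the mean cover the true mean?"""
    m = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(len(sample))
    return abs(m - true_mean) <= 1.96 * se

n, reps = 500, 1_000
random_cov = np.mean([ci_covers(rng.choice(population, n))
                      for _ in range(reps)])

# "Convenience" sampling: units with larger outcomes are easier to reach,
# a stand-in for any selection mechanism related to the outcome.
p = population / population.sum()
conv_cov = np.mean([ci_covers(rng.choice(population, n, p=p))
                    for _ in range(reps)])

print(f"coverage with random sampling:      {random_cov:.2f}")  # close to 0.95
print(f"coverage with convenience sampling: {conv_cov:.2f}")    # far below 0.95
```

The formula itself is never wrong as arithmetic; what fails is the implicit behavioural claim that the data were collected as if by random sampling.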
In particular, regression and its elaborations … are now standard tools of the trade. Although rarely discussed, statistical assumptions have major impacts on analytic results obtained by such methods.
Consider the usual textbook exposition of least squares regression. We have n observational units, indexed by i = 1, . . . , n. There is a response variable yᵢ, conceptualized as μᵢ + εᵢ, where μᵢ is the theoretical mean of yᵢ while the disturbances or errors εᵢ represent the impact of random variation (sometimes of omitted variables). The errors are assumed to be drawn independently from a common (Gaussian) distribution with mean 0 and finite variance. Generally, the error distribution is not empirically identifiable outside the model; so it cannot be studied directly—even in principle—without the model. The error distribution is an imaginary population and the errors εᵢ are treated as if they were a random sample from this imaginary population—a research strategy whose frailty was discussed earlier.
Usually, explanatory variables are introduced and μᵢ is hypothesized to be a linear combination of such variables. The assumptions about the μᵢ and εᵢ are seldom justified or even made explicit—although minor correlations in the εᵢ can create major bias in estimated standard errors for coefficients …
Why do μᵢ and εᵢ behave as assumed? To answer this question, investigators would have to consider, much more closely than is commonly done, the connection between social processes and statistical assumptions …
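Freedman’s warning about correlations in the error terms can be made concrete with a short simulation of my own (not his): when both the regressor and the errors are only mildly autocorrelated, the conventional OLS standard error for the slope is noticeably smaller than the actual spread of the slope estimates across repeated samples.

```python
# A minimal sketch (not Freedman's): modest serial correlation in the
# errors, combined with a serially correlated regressor, makes the
# conventional OLS standard error for the slope too small.
import numpy as np

rng = np.random.default_rng(1)
n, reps, rho, beta = 200, 1_000, 0.5, 1.0

def ar1(rho, size):
    """Mean-zero AR(1) series with autocorrelation rho and unit variance."""
    x = np.empty(size)
    x[0] = rng.normal()
    for t in range(1, size):
        x[t] = rho * x[t - 1] + rng.normal() * np.sqrt(1 - rho**2)
    return x

slopes, reported_se = [], []
for _ in range(reps):
    x = ar1(rho, n)
    e = ar1(rho, n)                      # "minor" correlation in the errors
    y = beta * x + e
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    sigma2 = resid @ resid / (n - 2)     # textbook error-variance estimate
    var_b = sigma2 * np.linalg.inv(X.T @ X)
    slopes.append(b[1])
    reported_se.append(np.sqrt(var_b[1, 1]))

print(f"actual sd of slope estimates: {np.std(slopes):.3f}")
print(f"average conventional SE:      {np.mean(reported_se):.3f}")  # noticeably smaller
```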
We have tried to demonstrate that statistical inference with convenience samples is a risky business. While there are better and worse ways to proceed with the data at hand, real progress depends on deeper understanding of the data-generation mechanism. In practice, statistical issues and substantive issues overlap. No amount of statistical maneuvering will get very far without some understanding of how the data were produced.
More generally, we are highly suspicious of efforts to develop empirical generalizations from any single dataset. Rather than ask what would happen in principle if the study were repeated, it makes sense to actually repeat the study. Indeed, it is probably impossible to predict the changes attendant on replication without doing replications. Similarly, it may be impossible to predict changes resulting from interventions without actually intervening.
Ordinarily, I’d be turned off by a book with a title such as this, but Freedman wrote the text used for my undergraduate statistics course in political science. A great book for social scientists. I’ll definitely try to get my hands on a copy of this one.
I agree with Freedman’s comments, having read many econometrics papers.
From work experience, seeing how colleagues in university, business and government grab data, put it into spreadsheets, pass it to statistical packages, read the output analysis and then write their research papers, I can only conclude that they have little clue what they are doing. Does the much-cited recent paper by Reinhart and Rogoff ring a bell?
To begin with, there is generally no discussion or awareness of where the data come from, implying the ergodic assumption is universally applicable. Econometric studies generally ignore what Keynes said to Tinbergen back in 1939 (The Economic Journal, Vol 49, p.566):
Put broadly, the most important condition is that the environment in all relevant respects, other than the fluctuations in those factors of which we take particular account, should be uniform and homogeneous over a period of time.
Also, there is never any verification that the model errors are normally distributed. If there were, it would show that the statistical inferences based on t-values and p-values are usually invalid, because the models are typically mis-specified. Of course, they could not then write all those papers that waste others’ time. Econometrics as it is practised today is a gigantic induction fallacy.
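The check being asked for here is trivial to run. A minimal sketch (my own illustration, assuming numpy and scipy are available): fit a straight line to data whose true relationship is nonlinear, then actually test whether the residuals look anything like normal before trusting the printed t- and p-values.

```python
# A minimal sketch: a misspecified linear model fitted to nonlinear data,
# followed by the normality check on the residuals that is rarely done.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(0, 3, n)
y = np.exp(x) + rng.normal(scale=1.0, size=n)   # true relation is nonlinear

# Misspecified model: y = a + b*x + error
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b

stat, p = stats.normaltest(resid)   # D'Agostino-Pearson test of normality
print(f"normality test p-value: {p:.2g}")   # tiny: the error assumption fails
```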
I don’t want to put too fine a point on this, but what is this nonsense Freedman describes? I’ve been working with statistical analysis for nearly 40 years. I’ve always assumed that everything Freedman describes as an area of concern in this book and in several other publications was a standard part of how social scientists use statistical techniques. During the 1970s, issues such as the misinterpretation and misuse of the random sampling assumption, substituting statistical techniques for field work and reasonable research design, and elaborating models on nothing more than the application of statistical techniques were discussed and argued about. And I thought they were settled in just the way described by Freedman. If Freedman is correct, social scientists have once again run off the rails in their effort to convince the world and themselves that what they do is science. And they have once again chosen the path of mystifying the work they do as the means to accomplish that end. The only solace I can see here for social scientists in general is that economists mess it up even worse than their social scientist brothers and sisters.
So where did these dubious statistical methods of social “science” originate? As I keep saying, with a non-scientist, David Hume, in 1740. And why, Ken, do they keep reappearing? Because even if working scientists do find ways round them, teachers who haven’t found those ways train up a couple of generations of incautious replacements for when they’re gone. This is why curriculum reform is so important, if Freedman’s book is to do its job.
My contribution is from a different field, but the lessons to be learned from it are perhaps constructive. In the 1960s space travel was catching on. A big issue was whether spacecraft would be reliable enough to last out longer missions, and the military had an interest in whether assessments, based on the mean time between failure (MTBF) of components, could be useful in terrestrial conditions. In 1967-9 I was involved in an experiment attempting to evaluate such equipment MTBF predictions against reports of actual failures in the field. That was a failure, but the experiment was worth doing anyway, because it showed up (1) unexpected runs of data, usually pointing to manufacturing or even design problems rather than faulty MTBF data for the components, and (2) that most of the problems were due to the myriad of connections in the old-fashioned “wired” methods of construction. From the latter evolved first of all printed circuits and eventually integrated circuits. The reliability of the components had become so good that there was a statistical problem with small sample sizes, but again this didn’t matter, because the MTBF figures had been estimated by genuine Bayesian methods (as in comparing the results of die throws with the physical symmetry of the dice), whereby component reliability improved from generation to generation by elimination of physical failure modes, enabling upwards interpolation of the estimates.
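For readers unfamiliar with that kind of updating, here is a rough sketch (not the actual 1960s procedure, just the standard conjugate textbook version of it): a Gamma prior on an exponential failure rate is updated with observed failures over total operating hours, and the posterior from one component generation can serve as the prior for the next, so the estimates keep improving even when failure counts are tiny.

```python
# A minimal sketch of Bayesian MTBF updating (illustrative only).
# Assumed model: failure times exponential with rate lam; a Gamma(a, b)
# prior on lam (b in hours) is conjugate, so updating is just arithmetic.

def update(a, b, failures, hours):
    """Posterior Gamma parameters after observing `failures` in `hours`."""
    return a + failures, b + hours

def mtbf_estimate(a, b):
    """Posterior mean of 1/lam, i.e. the MTBF (finite for a > 1)."""
    return b / (a - 1)

# Generation 1: weakly informative prior, then sparse field data.
a, b = update(2.0, 1_000.0, failures=3, hours=20_000)
print(f"gen 1 MTBF estimate: {mtbf_estimate(a, b):,.0f} h")

# Generation 2: the previous posterior becomes the new prior; even fewer
# failures are observed because known failure modes have been removed.
a, b = update(a, b, failures=1, hours=50_000)
print(f"gen 2 MTBF estimate: {mtbf_estimate(a, b):,.0f} h")
```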
The economic equivalent of this seems to be the different types of institutions being the components of the system and people being the interconnections. Reliability of the system has been improved by specialisation within institutions, but at the cost of increased unreliability of the human interconnections as specialists tire or become unable to understand and learn from each other, hence the increasing movement to automation. Again, this is pointing towards the need for reform of educational curricula, to pass on the much more general scientific understanding now available, of language, logic, communication systems and time-sharing human control of control system servos.