## Statisticism — confusing statistics and research

from **Lars Syll**

Coupled with downright incompetence in statistics, we often find the syndrome that I have come to call statisticism: the notion that computing is synonymous with doing research, the naïve faith that statistics is a complete or sufficient basis for scientific methodology, the superstition that statistical formulas exist for evaluating such things as the relative merits of different substantive theories or the “importance” of the causes of a “dependent variable”; and the delusion that decomposing the covariations of some arbitrary and haphazardly assembled collection of variables can somehow justify not only a “causal model” but also, praise a mark, a “measurement model.” There would be no point in deploring such caricatures of the scientific enterprise if there were a clearly identifiable sector of social science research wherein such fallacies were clearly recognized and emphatically out of bounds.

Statistical reasoning certainly seems paradoxical to most people.

Take for example the well-known Simpson’s paradox.

From a theoretical perspective, Simpson’s paradox importantly shows that causality can never be reduced to a question of statistics or probabilities unless you are — miraculously — able to keep constant *all* other factors that influence the probability of the outcome studied.

To understand causality we always have to relate it to a specific causal *structure*. Statistical correlations are *never* enough. No structure, no causality.

Simpson’s paradox is an interesting paradox in itself, but it can also highlight a deficiency in the traditional econometric approach towards causality. Say you have 1000 observations on men and an equal number of observations on women applying for admission to university studies, and that 70% of men are admitted, but only 30% of women. Running a logistic regression to find out the odds ratios (and probabilities) for men and women on admission, females seem to be in a less favourable position (‘discriminated’ against) compared to males (male odds are 2.33, female odds are 0.43, giving a male-to-female odds ratio of 5.44). But once we find out that males and females apply to different departments, we may well get a Simpson’s paradox result where males turn out to be ‘discriminated’ against. Say 800 males apply for economics studies (680 admitted) and 200 for physics studies (20 admitted), while 100 females apply for economics studies (90 admitted) and 900 for physics studies (210 admitted). Within each department the male-to-female odds ratio is then below one: 0.63 in economics and 0.37 in physics.
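The arithmetic of the admissions example can be checked with a few lines of Python. This is only a sketch: the counts are the ones given in the text, while the helper names (`odds`, `odds_ratio`, `pooled`) are made up for the illustration. It computes the odds ratios directly from the counts rather than fitting a logistic regression; with sex as the only predictor, the two approaches give the same odds ratio.

```python
# Admissions counts from the example: (applicants, admitted).
data = {
    "economics": {"male": (800, 680), "female": (100, 90)},
    "physics": {"male": (200, 20), "female": (900, 210)},
}


def odds(applicants, admitted):
    """Odds of admission: number admitted divided by number rejected."""
    return admitted / (applicants - admitted)


def odds_ratio(male, female):
    """Male-to-female odds ratio; values below 1 favour females."""
    return odds(*male) / odds(*female)


def pooled(sex):
    """Sum applicants and admitted across departments for one sex."""
    applicants = sum(dept[sex][0] for dept in data.values())
    admitted = sum(dept[sex][1] for dept in data.values())
    return applicants, admitted


# Ignoring departments: males look strongly favoured.
aggregate_or = odds_ratio(pooled("male"), pooled("female"))

# Conditioning on department: the direction reverses in both.
department_or = {
    name: odds_ratio(dept["male"], dept["female"])
    for name, dept in data.items()
}

print(f"aggregate: {aggregate_or:.2f}")
for name, value in department_or.items():
    print(f"{name}: {value:.2f}")
```

The pooled ratio comes out at about 5.44, while the within-department ratios are about 0.63 for economics (0.6296 before rounding) and 0.37 for physics. The same data support opposite conclusions depending on whether department is taken into account, and nothing in the numbers themselves says which table is the causally relevant one.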

Statistical and econometric patterns should never be seen as anything other than possible clues to follow. Behind observable data there are real structures and mechanisms operating, and if we really want to understand, explain and (possibly) predict things in the real world, these are more important to get hold of than simply correlating and regressing observable variables.

Statistics cannot establish the truth value of a fact. Never has. Never will.

Friedrich August von Hayek (Nobel lecture: 11 Dec 1974) understood it perfectly:

“I still doubt whether their search for measurable magnitudes has made significant contributions to our theoretical understanding of economic phenomena – as distinct from their value as a description of particular situations.”

As one trained at the MS level in applied statistics: the use of statistics acknowledges the presence of unknown, or known but not yet measured, additional causal variables. Statistics normally accepts *complexity* in nature, at least at our human scale of measurement, whatever the ultimate findings will be at the sub-atomic or cosmic scale. Personally, as an economist focused upon multi-cultural behavior & cognitive/decision-making issues, I have found that this is the only way to proceed.

Agreed. Readers should read and recommend “Statistics, Theory and Practise”, written decades ago. It is no longer available in print but is obtainable from various second-hand sources, and it is extremely realistic, critical, and educative. Relevant then and now.

Mathematics has two major elements. First, equations. An equation is a statement of an equality containing one or more variables. Solving the equation consists of determining which values of the variables make the equality true. Variables are also called unknowns, and the values of the unknowns which satisfy the equality are called solutions of the equation. There are two kinds of equations: identities and conditional equations. An identity is true for all values of the variable. A conditional equation is true for only particular values of the variable. Second, definitions of variables. Each variable in an equation must be defined before the equation is solved, for two reasons: first, to assess whether the solutions make sense; second, to assess the usefulness of the solutions. These definitions may be quantitative, textual, or even other equations. Mathematics creates proofs only regarding equations. Mathematics cannot provide proof beyond this. Also, mathematics cannot prove causation. Identifying causation beyond general entanglement is a cultural choice.

Two of Lars’ comments are important here. “To understand causality we always have to relate it to a specific causal structure. Statistical correlations are never enough. No structure, no causality.” Once culture establishes a need to look for something called “causality,” culture also provides the structure within which that causality is to be found. Here we can note that only western cultures seem to believe in and create causation. All cultures seem to have some notion of things in the world being interconnected, being related in some ways. But only western cultures create a specific notion of a cause leading to an effect. This is one of the more interesting differences between Chinese and western science. If China comes to dominate the world not just economically but also culturally and scientifically, this could lead to world-wide changes much broader than now supposed by the prognosticators of such things.

Second statement. “Statistics cannot establish the truth value of a fact. Never has. Never will.” This is not correct. If truth is defined in terms that fit within the parameters of statistics, then statistical analysis can indeed establish truth. In western cultures this is not the most widely accepted understanding of truth, especially among scientists. In western cultures truth is usually related to some theory or philosophy (e.g., religious, naturalistic, mechanical). The history of why statistical thinking is not included in this list for most members of western cultures would make a revealing and useful PhD dissertation. It joins the other 500 or so dissertation topics I’ve recorded over the last 40 years. So far, no one has chosen this one.