## Mindless statistics and the null ritual

from **Lars Syll**

Knowing the contents of a toolbox, of course, requires statistical thinking, that is, the art of choosing a proper tool for a given problem. Instead, one single procedure that I call the “null ritual” tends to be featured in texts and practiced by researchers. Its essence can be summarized in a few lines:

The null ritual:

1. Set up a statistical null hypothesis of “no mean difference” or “zero correlation.” Don’t specify the predictions of your research hypothesis or of any alternative substantive hypotheses.

2. Use 5% as a convention for rejecting the null. If significant, accept your research hypothesis. Report the result as p < 0.05, p < 0.01, or p < 0.001 (whichever comes next to the obtained p-value).

3. Always perform this procedure … The routine reliance on the null ritual discourages not only statistical thinking but also theoretical thinking. One does not need to specify one’s hypothesis, nor any challenging alternative hypothesis … The sole requirement is to reject a null that is identified with “chance.” Statistical theories such as Neyman–Pearson theory and Wald’s theory, in contrast, begin with two or more statistical hypotheses.

In the absence of theory, the temptation is to look first at the data and then see what is significant. The physicist Richard Feynman … has taken notice of this misuse of hypothesis testing. I summarize his argument:

Feynman’s conjecture:

To report a significant result and reject the null in favor of an alternative hypothesis is meaningless unless the alternative hypothesis has been stated before the data was obtained.

Feynman’s conjecture is again and again violated by routine significance testing, where one looks at the data to see what is significant. Statistical packages allow every difference, interaction, or correlation against chance to be tested. They automatically deliver ratings of “significance” in terms of stars, double stars, and triple stars, encouraging the bad after-the-fact habit. The general problem Feynman addressed is known as overfitting … Fitting per se has the same problems as storytelling after the fact, which leads to a “hindsight bias.” The true test of a model is to fix its parameters on one sample, and to test it in a new sample. Then it turns out that predictions based on simple heuristics can be more accurate than routine multiple regressions … Less can be more. The routine use of linear multiple regression exemplifies another mindless use of statistics … We know but often forget that the problem of inductive inference has no single solution. There is no uniformly most powerful test, that is, no method that is best for every problem. Statistical theory has provided us with a toolbox with effective instruments, which require judgment about when it is right to use them … Judgment is part of the art of statistics.
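A minimal sketch of that “true test” (the model, numbers, and data here are my own toy example, not Gigerenzer’s): a model flexible enough to fit one sample perfectly can still predict a fresh sample worse than a simple heuristic whose single parameter was fixed on the first sample.

```python
# Toy example (my own assumptions): a "memorizing" model vs. a simple
# heuristic whose one parameter is fixed on the first sample.
import random

random.seed(1)

def draw(n):
    # true relation: y = 2*x plus noise
    return [(x, 2 * x + random.gauss(0, 2))
            for x in (random.uniform(0, 10) for _ in range(n))]

train, test = draw(40), draw(40)

def knn1(x):
    # flexible model: 1-nearest-neighbour lookup, memorizes the training data
    return min(train, key=lambda p: abs(p[0] - x))[1]

# simple heuristic: one slope, fixed once on the training sample
slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)

def heuristic(x):
    return slope * x

def mse(model, data):
    return sum((y - model(x)) ** 2 for x, y in data) / len(data)

print(mse(knn1, train))       # exactly 0: a perfect in-sample fit
print(mse(knn1, test))        # the honest test, on a new sample
print(mse(heuristic, test))   # the simple heuristic, tested the same way
```

The in-sample fit of the memorizing model is perfect by construction, which is exactly why it says nothing; only the new sample distinguishes genuine predictive accuracy from after-the-fact fitting.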

To stop the ritual, we also need more guts and nerves. We need some pounds of courage to cease playing along in this embarrassing game. This may cause friction with editors and colleagues, but it will in the end help them to enter the dawn of statistical thinking.

Significant and well said, Lars, but I’m not with you on this:

“The true test of a model is to fix its parameters on one sample, and to test it in a new sample”.

The true test of a model is to fix its parameters one way and to test them another way. Thus the probability of a die throwing a six is 1/6 on the basis of symmetry and the null hypothesis of no bias, while the probability of no bias is estimated from the distribution of a sample of actual throws. I’ve been saying for forty years that this is what Bayes’ theorem is all about – the real and the imaginary parts of complex probability, Shannon’s relative frequency of logically available choices, and Chesterton’s concept of triangulation to fix otherwise indeterminate distances – not the one-dimensional interpretation of probability suggested above and by economists as eminent as Kenneth Arrow.
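A sketch of that two-way fixing (the counts, the prior, and the specific biased alternative are my own assumptions, chosen for illustration): symmetry fixes P(six) = 1/6 under the null of no bias, while the probability of “no bias” itself is estimated the other way, from a sample of actual throws, via Bayes’ theorem.

```python
# Sketch with assumed numbers: P(six) = 1/6 is fixed by symmetry under the
# null of no bias; the probability of "no bias" itself is then estimated
# from observed throws via Bayes' theorem.
from math import comb

def binom_lik(k, n, p):
    # probability of exactly k sixes in n throws when P(six) = p
    return comb(n, k) * p**k * (1 - p)**(n - k)

k, n = 30, 120                       # assumed data: 30 sixes in 120 throws

lik_fair = binom_lik(k, n, 1 / 6)    # null: unbiased die
lik_biased = binom_lik(k, n, 1 / 3)  # one specific biased alternative

prior_fair = 0.5                     # even prior odds, for the sketch
post_fair = (lik_fair * prior_fair) / (
    lik_fair * prior_fair + lik_biased * (1 - prior_fair))
print(post_fair)                     # posterior probability of "no bias"
```

Note that the posterior only makes sense against the specific alternative named here; with a different alternative, or a different prior, the estimated probability of no bias changes.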

“To report a significant result and reject the null in favor of an alternative hypothesis is meaningless unless the alternative hypothesis has been stated before the data was obtained.”

It does not matter whether the alternative hypothesis has been stated or not. All that a test of the null hypothesis tests is the null hypothesis. It says nothing about any other particular hypothesis. It cannot, because it tests no other hypothesis.

Testing the null hypothesis turns falsification on its head. The hypothesis of interest is not subjected to testing. To be sure, evidence against the null hypothesis is confirmatory of other hypotheses, but confirmatory evidence is extremely weak.

Bayesianism is making a comeback, but there are well-known problems with Bayesian priors, which is why Fisher looked for a better way. However, there is no problem whatsoever with comparing two specific hypotheses in a Bayesian manner, to see which one the evidence lends more support to, and to what degree. You can do that with the null hypothesis and an alternative hypothesis.
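A sketch of such a comparison (the coin, the counts, and the specific alternative are my own assumptions): the Bayes factor is simply the likelihood ratio of two specific hypotheses, so it says which one the evidence supports more, and by how much, without needing a prior over the hypotheses themselves.

```python
# Sketch with assumed numbers: comparing a null and one specific alternative
# by their likelihood ratio (the Bayes factor) -- no prior needed.
from math import comb, log10

def binom_lik(k, n, p):
    # probability of exactly k successes in n trials at success rate p
    return comb(n, k) * p**k * (1 - p)**(n - k)

k, n = 60, 100              # assumed data: 60 heads in 100 tosses
p_null, p_alt = 0.5, 0.6    # null: fair coin; alternative: a specific bias

bf = binom_lik(k, n, p_alt) / binom_lik(k, n, p_null)
print(bf)                   # Bayes factor in favour of the alternative
print(log10(bf))            # the "degree" of support, on a log scale
```

The crucial point is that both hypotheses are fully specified in advance; the ratio then grades the support the data lend to one over the other, which is exactly what a bare rejection of the null cannot do.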