
Mindless statistics and the null ritual

from Lars Syll

Knowing the contents of a toolbox, of course, requires statistical thinking, that is, the art of choosing a proper tool for a given problem. Instead, one single procedure that I call the “null ritual” tends to be featured in texts and practiced by researchers. Its essence can be summarized in a few lines:

The null ritual:
1. Set up a statistical null hypothesis of “no mean difference” or “zero correlation.” Don’t specify the predictions of your research hypothesis or of any alternative substantive hypotheses.
2. Use 5% as a convention for rejecting the null. If significant, accept your research hypothesis. Report the result as p < 0.05, p < 0.01, or p < 0.001 (whichever comes next to the obtained p-value).
3. Always perform this procedure …  
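
To make the ritual concrete, here is a minimal sketch in Python (the simulated groups and all numbers are my own illustration, not part of the quoted text):

```python
# A sketch of the null ritual itself: two simulated groups, a t-test,
# the 5% convention, and the nearest star level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=30)   # "control"
group_b = rng.normal(loc=0.5, scale=1.0, size=30)   # "treatment"

# Step 1: null of "no mean difference"; no alternative is spelled out.
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Step 2: reject at 5% and report whichever conventional level comes
# next to the obtained p-value.
for threshold, stars in [(0.001, "***"), (0.01, "**"), (0.05, "*")]:
    if p_value < threshold:
        print(f"p < {threshold} {stars} -- 'significant'; ritual complete")
        break
else:
    print(f"p = {p_value:.3f} -- not significant")
```

Note what the sketch never does: state what the research hypothesis predicts, or what any rival hypothesis would predict instead.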


The routine reliance on the null ritual discourages not only statistical thinking but also theoretical thinking. One does not need to specify one’s hypothesis, nor any challenging alternative hypothesis … The sole requirement is to reject a null that is identified with “chance.” Statistical theories such as Neyman–Pearson theory and Wald’s theory, in contrast, begin with two or more statistical hypotheses.

In the absence of theory, the temptation is to look first at the data and then see what is significant. The physicist Richard Feynman … has taken notice of this misuse of hypothesis testing. I summarize his argument:

Feynman’s conjecture:
To report a significant result and reject the null in favor of an alternative hypothesis is meaningless unless the alternative hypothesis has been stated before the data was obtained.

Feynman’s conjecture is again and again violated by routine significance testing, where one looks at the data to see what is significant. Statistical packages allow every difference, interaction, or correlation to be tested against chance. They automatically deliver ratings of “significance” in terms of stars, double stars, and triple stars, encouraging the bad after-the-fact habit. The general problem Feynman addressed is known as overfitting … Fitting per se has the same problems as storytelling after the fact, which leads to a “hindsight bias.” The true test of a model is to fix its parameters on one sample, and to test it in a new sample. Then it turns out that predictions based on simple heuristics can be more accurate than routine multiple regressions … Less can be more. The routine use of linear multiple regression exemplifies another mindless use of statistics …
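
Gigerenzer’s “true test” is easy to simulate. The sketch below (my own illustration, not from the quoted text) regresses pure noise on twenty pure-noise predictors: the in-sample fit looks respectable, and only a fresh sample exposes it as overfitting.

```python
# Overfitting, sketched: fit a regression of pure noise on 20 pure-noise
# predictors, then fix the parameters and evaluate them on a fresh sample.
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 20
X_fit, y_fit = rng.normal(size=(n, k)), rng.normal(size=n)   # fitting sample
X_new, y_new = rng.normal(size=(n, k)), rng.normal(size=n)   # fresh sample

def design(X):
    # Add an intercept column to the predictor matrix.
    return np.column_stack([np.ones(len(X)), X])

# Least-squares parameters, fixed on the first sample only.
beta, *_ = np.linalg.lstsq(design(X_fit), y_fit, rcond=None)

def r_squared(X, y):
    resid = y - design(X) @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

print(f"in-sample R^2:  {r_squared(X_fit, y_fit):.2f}")   # flatteringly high
print(f"new-sample R^2: {r_squared(X_new, y_new):.2f}")   # near zero or negative
```

With nothing but noise to work with, the fitted model still “explains” a sizeable share of variance in the sample it was fitted on; on the new sample the same fixed parameters do no better than guessing the mean.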

We know but often forget that the problem of inductive inference has no single solution. There is no uniformly most powerful test, that is, no method that is best for every problem. Statistical theory has provided us with a toolbox with effective instruments, which require judgment about when it is right to use them … Judgment is part of the art of statistics.

To stop the ritual, we also need more guts and nerves. We need some pounds of courage to cease playing along in this embarrassing game. This may cause friction with editors and colleagues, but it will in the end help them to enter the dawn of statistical thinking.

  1. January 25, 2013 at 10:46 pm

    Significant and well said, Lars, but I’m not with you on this:

    “The true test of a model is to fix its parameters on one sample, and to test it in a new sample”.

The true test of a model is to fix its parameters one way and to test them another way. Thus the probability of a die showing a six is 1/6 on the basis of symmetry and the null hypothesis of no bias, while the probability of no bias is estimated from the distribution of a sample of actual throws. I’ve been saying for forty years that this is what Bayes’ theorem is all about: the real and the imaginary of complex probability, Shannon’s relative frequency of logically available choices, and Chesterton’s concept of triangulation to fix otherwise indeterminate distances, not the one-dimensional interpretation of probability suggested above and by economists as eminent as Kenneth Arrow.
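
    A minimal sketch of the die example (the throw counts are hypothetical, my own illustration): symmetry fixes the parameter at 1/6, and a Beta-Binomial update then lets a sample of actual throws test it another way.

    ```python
    # Estimate P(six) from observed throws with a conjugate Beta prior,
    # and compare the posterior against the symmetry value of 1/6.
    from scipy import stats

    throws, sixes = 120, 31            # hypothetical data: 31 sixes in 120 throws
    prior_a, prior_b = 1, 1            # flat Beta(1, 1) prior on P(six)

    posterior = stats.beta(prior_a + sixes, prior_b + throws - sixes)
    print(f"posterior mean P(six): {posterior.mean():.3f}")   # vs. 1/6 = 0.167
    lo, hi = posterior.ppf(0.025), posterior.ppf(0.975)
    print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
    print(f"P(six) > 1/6 with posterior probability {1 - posterior.cdf(1/6):.2f}")
    ```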

  2. Min
    January 28, 2013 at 5:22 am

    “To report a significant result and reject the null in favor of an alternative hypothesis is meaningless unless the alternative hypothesis has been stated before the data was obtained.”

It does not matter whether the alternative hypothesis has been stated or not. All that a test of the null hypothesis tests is the null hypothesis. It says nothing about any other particular hypothesis. It cannot, because it tests no other hypothesis.

    Testing the null hypothesis turns falsification on its head. The hypothesis of interest is not subjected to testing. To be sure, evidence against the null hypothesis is confirmatory of other hypotheses, but confirmatory evidence is extremely weak.

Bayesianism is making a comeback, but there are well-known problems with Bayesian priors, which is why Fisher looked for a better way. However, there is no problem whatsoever with comparing two specific hypotheses in a Bayesian manner, to see which one the evidence lends more support to, and to what degree. You can do that with the null hypothesis and an alternative hypothesis.
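
    A minimal sketch of that comparison (the hypotheses and data are my own illustration, reusing the hypothetical throws above): two specific hypotheses about a die, with the Bayes factor reporting the degree of support.

    ```python
    # Compare two specific hypotheses directly, instead of testing the
    # null alone: the Bayes factor is the ratio of their likelihoods.
    from scipy import stats

    throws, sixes = 120, 31
    p_null, p_alt = 1/6, 0.25          # H0: unbiased die; H1: a specific loading

    like_null = stats.binom.pmf(sixes, throws, p_null)
    like_alt = stats.binom.pmf(sixes, throws, p_alt)

    bayes_factor = like_alt / like_null
    print(f"Bayes factor (alt vs null): {bayes_factor:.1f}")
    # A factor well above 1 means the data lend that much more support to
    # the loaded-die hypothesis; with equal priors it is also the
    # posterior odds.
    ```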
