
## The biggest problem in science

from Lars Syll

In 2016, Vox sent out a survey to more than 200 scientists, asking, “If you could change one thing about how science works today, what would it be and why?” One of the clear themes in the responses: The institutions of science need to get better at rewarding failure.

One young scientist told us, “I feel torn between asking questions that I know will lead to statistical significance and asking questions that matter.”

The biggest problem in science isn’t statistical significance. It’s the culture. She felt torn because young scientists need publications to get jobs. Under the status quo, in order to get publications, you need statistically significant results. Statistical significance alone didn’t lead to the replication crisis. The institutions of science incentivized the behaviors that allowed it to fester.

Brian Resnick

As shown over and over again when significance tests are applied, people have a tendency to read ‘not disconfirmed’ as ‘probably confirmed.’ Standard scientific methodology tells us that when there is only, say, a 5% probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more ‘reasonable’ to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give about the same 5% result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.
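The flip side of that 5% logic can be made concrete with a small simulation (a sketch of my own, not from the post): when the null hypothesis is in fact true, pure sampling error alone produces a ‘significant’ result at the 5% level in about 5% of independent tests — which is exactly why a single small p-value should not be read as ‘probably confirmed.’

```python
import numpy as np

# Sketch: the null hypothesis (population mean = 0) is TRUE in every
# simulated experiment, yet sampling error alone makes about 5% of the
# z-tests come out "significant" at the 5% level.
rng = np.random.default_rng(0)

n_tests, n_obs = 10_000, 100
samples = rng.normal(loc=0.0, scale=1.0, size=(n_tests, n_obs))  # null true
z = samples.mean(axis=1) * np.sqrt(n_obs)   # z-statistic for mean = 0
false_positive_rate = np.mean(np.abs(z) > 1.96)
print(false_positive_rate)  # close to 0.05 by construction
```

Every one of those ‘significant’ results is a fluke by construction; nothing about the data distinguishes them from a genuine effect.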

We should never forget that the underlying parameters we use when performing significance tests are model constructions. Our p-values mean nothing if the model is wrong. And most importantly — statistical significance tests DO NOT validate models!
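The point that p-values mean nothing when the model is wrong can also be shown by simulation (an assumed example of my own, not from the post): generate two series that are independent by construction but serially correlated, and the textbook OLS t-test — whose 5% level is only valid under the model's i.i.d.-error assumption — ‘finds’ a relation far more than 5% of the time.

```python
import numpy as np

# Sketch (assumed example): x and y are INDEPENDENT AR(1) series, so the
# true regression slope is zero.  The nominal 5% t-test assumes i.i.d.
# errors; with autocorrelated data that assumption is false, and the test
# rejects the true null far more often than 5% of the time.
rng = np.random.default_rng(1)

def ar1(n, rho, rng):
    """AR(1) series with unit-variance innovations, started at zero."""
    e = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + e[t]
    return x

n, rho, n_sims = 100, 0.9, 2_000
rejections = 0
for _ in range(n_sims):
    x, y = ar1(n, rho, rng), ar1(n, rho, rng)  # independent by construction
    x_c, y_c = x - x.mean(), y - y.mean()
    beta = (x_c @ y_c) / (x_c @ x_c)           # OLS slope
    resid = y_c - beta * x_c
    se = np.sqrt((resid @ resid) / (n - 2) / (x_c @ x_c))
    if abs(beta / se) > 1.96:                  # nominal 5% two-sided test
        rejections += 1

rejection_rate = rejections / n_sims
print(rejection_rate)  # far above the nominal 0.05
```

The p-values here are computed exactly by the book; they are meaningless because the book's model is not the one that generated the data.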

In journal articles a typical regression equation will have an intercept and several explanatory variables. The regression output will usually include an F-test, with p – 1 degrees of freedom in the numerator and n – p in the denominator (where n is the number of observations and p the number of coefficients, intercept included). The null hypothesis will not be stated. The missing null hypothesis is that all the coefficients vanish, except the intercept.

If F is significant, that is often thought to validate the model. Mistake. The F-test takes the model as given. Significance only means this: if the model is right and all the coefficients except the intercept are 0, it is very unlikely to get such a big F-statistic. Logically, there are three possibilities on the table:
i) an unlikely event occurred;
ii) the model is right and some of the coefficients differ from 0;
iii) the model is wrong.
So?
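Possibility (iii) is easy to exhibit by simulation (an assumed example of my own, not from the post): generate data from a quadratic relation, fit a straight line, and the F-statistic comes out enormous even though the fitted model is flatly wrong — the misspecification sits in the residuals, where the F-test never looks.

```python
import numpy as np

# Sketch (assumed example): the true relation is y = x**2 + noise, but we
# fit the WRONG model, a straight line.  The F-test takes the line as
# given, and the F-statistic is hugely "significant" -- possibility (iii)
# in action, not a validation of the model.
rng = np.random.default_rng(2)

n = 200
x = rng.uniform(0.0, 2.0, size=n)
y = x**2 + rng.normal(scale=0.1, size=n)    # true model is nonlinear

# OLS fit of the (wrong) linear model y = a + b*x
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef

ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

p = 2                                       # coefficients, intercept included
F = (r2 / (p - 1)) / ((1 - r2) / (n - p))   # F with p-1 and n-p df
print(F)                                    # enormous, yet the model is wrong

# The misspecification shows up where the F-test never looks: the
# residuals still carry the curvature the straight line cannot absorb.
resid = y - fitted
misfit = np.corrcoef(resid, x**2)[0, 1]
print(misfit)                               # clearly positive
```

A big F here only rules out (i)-with-zero-coefficients; it cannot choose between (ii) and (iii), and in this construction (iii) is the truth.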

Comment, April 17, 2021 at 5:32 pm

Confidence levels for the rejection of a hypothesis are always arbitrary. The question is where do you want to put the burden of proof? Do I have reason to believe this hypothesis and wish to retain it if possible? Or do I think it is pernicious and I’m out to nail it? That’s a prior decision and one should be open about it.
If I want to retain the model in Lars’ example above, I can say the F-test permits me to do so. Option (iii) is always a possibility, but the data don’t force me to that conclusion. As for (i), if you go through life wondering whether everything you observe is a fluke, you will end up in an asylum suffering from terminal indecision. Either that or you will become a celebrated philosopher of universal scepticism.
Of course the result does not compel me to believe the hypothesis if I don’t want to. In that case, the fluke theory is quite appealing. Note there is no reason to believe that applying the hypothesis to different data sets should reduce the error margin. If the hypothesis is less than a complete theory of a social situation (which in practice it is sure to be), it will always generate errors when applied. In fact, the more it is applied to different situations and the more often it is not rejected, the less attractive the fluke alternative becomes. While repeated independent tests never prove a hypothesis, we do get more confident about it and regard it as increasingly corroborated.
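The commenter's corroboration point can be put in rough numbers (an illustrative sketch with an assumed power figure, not from the comment): if each independent test would reject the hypothesis with some probability were it false, then surviving k such tests purely by fluke has a probability that shrinks geometrically in k.

```python
# Illustrative arithmetic (the power value 0.5 is assumed, purely for
# illustration): if each independent test rejects a false hypothesis with
# probability 0.5, the chance a false hypothesis survives k tests by
# fluke alone is (1 - 0.5)**k, which shrinks geometrically.
power = 0.5                       # assumed per-test probability of rejection
for k in (1, 5, 10):
    survival = (1 - power) ** k
    print(k, survival)
# at k = 10 the fluke explanation has probability about 0.001 --
# repeated non-rejection never proves the hypothesis, but it makes
# the fluke alternative increasingly unattractive
```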

