
Why the p-value is a poor substitute for scientific reasoning

from Lars Syll

A non-trivial part of teaching statistics consists of teaching students to perform significance testing. A problem I have noticed repeatedly over the years, however, is that no matter how careful you try to be in explicating what the probabilities generated by these statistical tests really are, most students still misinterpret them.

The blame for this lies not with students’ ignorance but with significance testing itself, which is not particularly transparent (conditional probability inference is difficult even for those of us who teach and practice it). A lot of researchers fall prey to the same mistakes.

If anything, the above video underlines how important it is not to equate science with statistical calculation. All science entails human judgement, and using statistical models doesn’t relieve us of that necessity. When we work with misspecified models, the scientific value of significance testing is actually zero, even though we’re making valid statistical inferences! Statistical models and concomitant significance tests are no substitutes for doing real science.

In its standard form, a significance test is not the kind of ‘severe test’ that we are looking for when we try to confirm or disconfirm empirical scientific hypotheses. This is problematic for many reasons, one being that there is a strong tendency to accept null hypotheses simply because they can’t be rejected at the standard 5% significance level. In their standard form, significance tests bias against new hypotheses by making it hard to disconfirm the null hypothesis.

And as shown over and over again when significance testing is applied, people have a tendency to read “not disconfirmed” as “probably confirmed.” Standard scientific methodology tells us that when there is only, say, a 10% probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more “reasonable” to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give the same 10% result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.
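The intuition about repeated 10% results can be made concrete with a small sketch. It assumes Fisher's method for combining independent p-values, a technique the text above does not name, and it assumes the tests really are independent and the underlying model is right. The function name `fisher_combined_p` is hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Under the null, X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom. For even degrees of freedom the upper tail
    has a closed form, so no statistics library is needed.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    # P(chi2 with 2k df >= X) = exp(-X/2) * sum over i < k of (X/2)^i / i!
    tail_sum = sum(half_x ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_x) * tail_sum

# Assumed scenario: k independent tests, each reporting p = 0.10.
for k in (1, 2, 5, 10):
    print(k, fisher_combined_p([0.10] * k))
```

Under these assumptions a single p = 0.10 stays at 0.10, while five independent results of 0.10 combine to a p-value of roughly 0.01: none of the tests is “significant” on its own, yet together they count quite strongly against the null.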

Most importantly, we should never forget that the underlying parameters we use when performing significance tests are model constructions. Our p-values mean next to nothing if the model is wrong. Statistical significance tests DO NOT validate models!

In journal articles a typical regression equation will have an intercept and several explanatory variables. The regression output will usually include an F-test, with p-1 degrees of freedom in the numerator and n-p in the denominator. The null hypothesis will not be stated. The missing null hypothesis is that all the coefficients vanish, except the intercept.

If F is significant, that is often thought to validate the model. Mistake. The F-test takes the model as given. Significance only means this: if the model is right and the coefficients are 0, it is very unlikely to get such a big F-statistic. Logically, there are three possibilities on the table:
i) An unlikely event occurred.
ii) Or the model is right and some of the coefficients differ from 0.
iii) Or the model is wrong.
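Possibility iii) can be illustrated with a minimal sketch. The data are hypothetical: the true relation is assumed to be quadratic, yet a straight line is fitted. The seed, sample size, noise level and the hard-coded 5% critical value (about 4.18 for F with 1 and 29 degrees of freedom, taken from standard tables) are assumptions made for illustration only.

```python
import random

def ols_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical data: the true relation is quadratic, but we fit a line.
rng = random.Random(0)
x = [i / 10 for i in range(31)]                 # n = 31 points on [0, 3]
y = [xi ** 2 + rng.gauss(0, 0.2) for xi in x]

a, b = ols_line(x, y)
fitted = [a + b * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

n, p = len(x), 2                                # p counts intercept and slope
mean_y = sum(y) / n
ss_model = sum((fi - mean_y) ** 2 for fi in fitted)
ss_resid = sum(r ** 2 for r in residuals)
f_stat = (ss_model / (p - 1)) / (ss_resid / (n - p))

F_CRIT_5PCT = 4.18  # assumed: approximate 5% critical value of F(1, 29)

# A correctly specified line would leave residuals with no pattern in x.
middle = [r for xi, r in zip(x, residuals) if 1.0 <= xi <= 2.0]
outer = [r for xi, r in zip(x, residuals) if xi < 1.0 or xi > 2.0]
mean_middle = sum(middle) / len(middle)
mean_outer = sum(outer) / len(outer)

print("F statistic:", f_stat, "significant at 5%:", f_stat > F_CRIT_5PCT)
print("mean residual, middle of x range:", mean_middle)
print("mean residual, ends of x range:", mean_outer)
```

The F-statistic comes out far above the critical value, so the test is “significant”, but the residuals are systematically negative in the middle of the range and positive at the ends. The significance says nothing about whether the straight-line model is right; here it is wrong by construction.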

  1. Helen Sakho
    May 24, 2018 at 9:00 pm

    One’s fondest memory of “P” in Micro/Macro Econometrics (theory and practice) has to be the incredibly “uneducated?!” assumption that the demand and supply relating to any good/service depends – first and foremost – on its Price, all other things being equal!

  2. May 25, 2018 at 12:13 am

    Right! This is a good warning against over-dependence on statistical reasoning. It suggests two things to us:

    (1) The necessity to rehabilitate theoretical reasoning and examination.

    (2) The importance to distinguish (ad hoc) models and a theory.

  3. Rhonda Kovac
    May 26, 2018 at 1:44 am

    Excellent point. However, instead of being satisfied with merely vaguely encouraging ourselves to ‘be scientific’, we should be looking to develop specific procedures, or protocols (‘meta-protocols’??), for doing so: procedures which have rigorous, measurable criteria and parameters.

    • May 26, 2018 at 5:08 am

      Reasonable request, but it would be a rather difficult request for a philosopher of economics like Lars Syll to answer. It must be the task of theorists to propose such a research program and the core theory. There must be various opinions and standpoints.

      From my point of view, new economics should be founded on the four research orientations:

      (1) Refuse equilibrium framework and try to construct theories on process analyses.
      (2) Abandon demand and supply price theory and change over to another line of price theory called classical theory of value.
      (3) Construct an evolutionary analysis of behaviors, institutions, techniques and products on the basis of the classical theory of value.
      (4) Do not try to construct at once a theory that comprises all aspects of economy. Try to find sure theories which describe specific aspects or domains of economic activities.

      If you are interested in any of these clauses, I am happy to explain in more detail.

      • May 26, 2018 at 11:59 am

        Reasonable responses from the perspective of economics, but from my point of view the root problem is in our dehumanised philosophy of science, which gives free rein to our self-serving philosophy of economics. From the perspective of what is now known about personality differences and how the architecture of the brain influences human perception (hence the insanity of only using the linguistic half of it, i.e. accepting assertions without the observational testing which motivates action), let me sympathetically reinterpret Yoshinori’s earlier propositions:

        “(1) The necessity to rehabilitate theoretical reasoning and examination”. Yes, with the understanding that theories are conditional and merely tell you where to look.

        “(2) The importance to distinguish (ad hoc) models and a theory”. Yes; theories are general, models specific.

        and thus his later ones:

        “(1) Refuse equilibrium framework and try to construct theories on process analyses”. Yes, because the equilibrium framework is a model devoid of motivation: at best a special case of PID process control theory, evolving to make possible its aims.

        “(2) Abandon demand and supply price theory and change over to another line of price theory called classical theory of value”. Yes and no. Add production and consumption to supply and demand and consider the order in which particulars occur in a process setting. The classical theory is mass production oriented. I would emphasise reproducing what has already been consumed: telling us to look at renewing the face of the earth rather than making a worthless monetary profit.

        “(3) Construct an evolutionary analysis of behaviors, institutions, techniques and products on the basis of the classical theory of value”. Yes; but no: not on the classical theory of value. The basis has to be what has already been found to be useful, not what some promoter claims will be.

        “(4) Do not try to construct at once a theory that comprises all aspects of economy. Try to find sure theories which describe specific aspects or domains of economic activities”.

        A big NO to this: it assumes theory is descriptive and therefore linguistic, whereas an activity or process is a program explained by its aims and the capabilities of its processor. Who needs a complete description, anyway, when one can look at the reality? What one does need is a theory of everything which enables one to see the purpose of economics, and this I find actually portrays the evolution of the PID control system: models of which provide instructive analogies to the economic system. The purpose of this is “feeding the kids”, and “micro” theories of how we go about doing that should direct our attention more to actual failures that need addressing (not least laws enforcing fraudulent rents and usury) than to apparent opportunities for short-term monetary profit in the as yet uncertain future.

      • May 26, 2018 at 6:16 pm

        Dear davetaylor1
        It would be difficult to reconcile your philosophy of science and mine. What I can say is only this: yours seems more philosophical and even ethical, whereas mine is more strictly in the tradition of the sciences.

      • May 27, 2018 at 8:28 am

        Dear Yoshinori
        Nothing worth doing is easy. Talking about the philosophies of science rather than our views, it seems to be impossible to reconcile them, for “the” tradition of science you refer to has arisen from the unwise philosophical arguments of David Hume, which created an abstraction from real science by excluding the philosophical and ethical roots of Francis Bacon’s original concept of science: of an encyclopedia of knowledge “for the glory of God and the relief of Man’s estate”. Hence the current emphasis on observation and the discussion here of “why-the-p-value-is-a-poor-substitute-for-scientific-reasoning”. Tony Lawson is right: reorientation is needed.

        Our own views could be reconciled by your considering the history rather than the current practice of science and allowing it to change your mind. We have gone from “taking things to bits to see how they work” to basing important decisions on changes in graphs of observations, missing the significance of blood [and money] circulating in complex arterial systems that Bacon’s method revealed to his physician William Harvey.

      • Frank Salter
        May 27, 2018 at 2:48 pm

        On Dave Taylor May 27, 2018 at 8:28 am:

        Dave seems to be drawing an analogy with “ontogeny recapitulates phylogeny”. While I accept that current understanding is informed by knowledge of the historical development of most subject areas, I fail to understand much of his reasoning.

        The practice of chemistry originated in alchemy. It has now become a very different discipline. Mathematics is now a very important part of chemistry. Should one still consider alchemy?

        For tens of thousands of years, man made tools from flint, from bronze after c.3300 and from iron after c.700. There seems little point for modern engineers to know more than this. Iron is now made in blast furnaces or by direct reduction.

        The tradition of science, I would respectfully suggest, is “If it works use it”. Economic theorising is redolent of the opposite thinking, which might be described as: if someone deemed to be eminent said it, then keep on saying it and keep on ignoring empirical falsification, though falsification is the scientific way. I look forward to any other explanations which might be forthcoming.

      • Frank Salter
        May 27, 2018 at 2:50 pm

        please add B.C after 3300 and 700.

      • Craig
        May 27, 2018 at 7:23 pm

        The philosophy of Science is increasingly morphing into the religiosity it reacted to 500 years ago. As I said here before “Science like food is beautiful, wonderful, necessary and delicious…and it resides entirely within the digestive tract of Wisdom.”

        I’ve seen this truth at work in the places I’ve posted to many times over the last 10 years in as disparate forums as the Austrian Mish Shedlock’s blog to Steve Keen’s Debtwatch when using the traditionally religious word grace in trying to communicate its philosophical significance to economic theory and its conceptual opposition to the current paradigm of Debt As In Burden Only. The resistance to any possibility of integrating the word into thinking was so strong you could almost hear minds clamp shut the moment they read the word. And this despite my repeatedly saying I was not using the term in a religious way. Never mind in the case of Shedlock’s blog that I was showing him how his deflationary dreams could become reality with policies aligned with the concept. Never mind in the case of the protesting posters on Keen’s blog that I was in 100% agreement with his critiques of neo-liberal economics as well as his disequilibrium and instability hypotheses and how policies based on the concept of grace would bring an end to DSGE and its dominance….I was accused of being a religious zealot.

        Science is a derivative form of the greater mental set that is Wisdom.

  4. Edward K Ross
    May 28, 2018 at 6:26 am

    While I find Lars Syll’s post and the comments that follow very thought provoking, I want to comment on Dave Taylor’s first paragraph of May 26, 2018 at 11:15 AM.

    In particular “from my point of view the root problem is in our dehumanised philosophy of science, which gives free rein to our self-serving philosophy of economics”.

    Certainly I may be a jack of all trades and master of none, but the way I see it, if economics is about people, then observing and learning how to understand people’s concerns and what affects their way of thinking should logically be the first step in economic conversation. Thus, as Dave Taylor argues, we need to use both sides of our brain; failure to take this notion into account leaves us in the position where we could be accused of being a half wit, regardless of our philosophical or scientific preferences.

    Then there is Dave Taylor’s response to Yoshinori Shiozawa and his reference to “the philosophical and ethical roots of Francis Bacon’s original concept of science: of an encyclopedia of knowledge ‘for the glory of God and the relief of Man’s estate’”, as well as his support of Tony Lawson’s call for the reorientation of economics. Finally, what I would like to say about Dave Taylor is that from my perspective he is not afraid to express his concern for his fellow man based on his religious ethic and concern for the disadvantaged. Furthermore, he is not against science but balances science with thinking using both sides of his brain. Ted.

  5. May 28, 2018 at 11:55 am

    Frank, I have shared Craig’s experience of being different: “The resistance to any possibility of integrating the word [‘grace’] into thinking was so strong you could almost hear minds clamp shut the moment they read the word”. Craig, as a Christian well aware of the accounts of the judicial murder of Christ, I understand where you are coming from, though logically ‘grace’ cannot be spoken of as a ‘thing’. As I see it, it is a phase in a dynamic cycle of graceful giving, due gratitude and thanksgiving, spawning a new cycle by turning the mind of the grateful recipient to giving gracefully in turn. The greatest compliment is copying.

    So, Frank, you fail to understand much of my reasoning, and I am not surprised, but again there is a dynamic in this. Understanding anything unfamiliar takes time and involves seeking a way of interpreting it that makes sense. It takes two to tango. But fair comment. I thought I had sent you a copy of the paper in which I tried to explain myself at some length, but looking back through my email, apparently I didn’t – presumably because you expressed commitment to your own views so forcefully that I anticipated rejection of mine. I’ll send it now.

    Let me here try to respond to your comments now. My position is that different people have different interests, but these develop in a specific order. Thus (to echo Craig again), childish self-centred emotions (consumption) will not necessarily mature via skilling (production) and caring (distribution) into wisdom (gratefully seeking improvement). “Ontogeny recapitulates phylogeny” is insightful but not quite right here. That would be Normal Science, as about the evolution of species and thought. The science I’ve been involved with has been Kuhn’s Revolutionary Science, the evolution of radically different forms instanced by the transition from chemistry to four basic forms of life and then from thinking humans to four basic forms of thought, each using three of the four parts of the human brain. To take the example of electronic engineering, the basic theory is not about how television sets work, it is about Ohm’s law and inductance and capacitance and the variability of valves such as transistors. In the same way the basic theory of everything is not about living things realising their genetic programming; it is more about the possibility of scientific language accounting for everything, in the way the same algorithmic procedures apply in the Arabic numbering system whether one is talking about 1’s or trillions. All that needs to be postulated initially is energetic motion and the orthogonally defined language of Cartesian coordinates. From this one can develop the language of the analog clock with four quarters, enabling one to express the observation of unconstrained motion becoming constrained first in one dimension, then two, then three, at which point the motion has been completely controlled and it has turned into a thing, i.e. something different from motion, yet with motion inside. At the other end of the argument the development of derivatives has turned monetarily controlled economics into money making.

    Turning then to the historic transition from alchemy to chemistry, one could look back further, to the cooking of food. That may have originated accidentally, e.g. from eating food burned in forest fires. Nevertheless, recall what I wrote above: “David Hume … created an abstraction from real science by excluding the philosophical and ethical roots of Francis Bacon’s original concept”. The philosophical choice was between trying out and avoiding such food, and the ethical choice was between burning forests to provide it or using constrained camp fires to cook. Finding that gold delighted their women and was convenient for bulk trade, the alchemists tried to produce it (as they had done copper and iron) by cooking, but given subsequent analysis by molecular weights, electrolysis and atomic structure, that turned out not to be feasible. So what? The basic motivation or purpose of cooking, alchemy and chemistry was the same: to transform the gifts of nature into more humanly satisfactory forms. Only when one is taking the logic of this for granted does your quantitative mathematics have any point.

    Agreed, engineers as such need understand only modern production methods. My point is that we are not all engineers, and engineering is merely instrumental to larger purposes, as a bridge is to crossing a river which could be forded, swum, ferried or tunnelled. When I was introduced to philosophy of science I found that my fellow would-be engineers were satisfied with their engineering and unconcerned by the uses to which it could be put.

    On “the” tradition of science, I agree that “If it works use it” is the prevailing tradition, but it is the Humean one, the amoral positivism which has been empirically refuted; Normal (applied) rather than Revolutionary (basic) science, more akin to engineering fault-finding than discovery and evaluation of unconsidered options. So we know how to make nuclear bombs. Should we use that knowledge to make and use nuclear bombs? Engineers and political bomb-consumers have done so. Having seen the results, wiser men would not do so again.

    Your characterisation of the current state of economics is pretty well spot on. The economy isn’t working as an economy, and its working as a money-making machine is what is wrecking it. Not just fault-finding but elimination of the centralised money-making machine is what we need to be about. Hence Craig’s advocacy of giving rather than renting out money; mine of the credit card relationships as a model of what money really is and truly does. This is not about detailed science; it is more about the perception of motion being ambiguous and the negative of a photo, despite appearances, containing the same information as its print. The difference between the incoherence of a waveform and its coherence in a spectrum analysis suggests to me the possibility of a new paradigm for economics in which the perceived direction of credit has been inverted, while time inverted has become frequency or timescales of circulation, this focussing attention (as in information processors) on the timesharing and synchronisation of diverse processes, including storage as well as transmission.

    • May 28, 2018 at 12:54 pm

      A small but significant omission in this from the first paragraph above:

      “As I see it, it is a phase in a dynamic cycle of graceful giving, due gratitude and thanksgiving, spawning a new cycle by turning the mind of the grateful recipient to giving gracefully in turn”.

      Before “spawning” insert “the thanks motivating more giving and” spawning a new cycle etc.

    • Frank Salter
      May 29, 2018 at 9:05 am

      I applaud your intentions. However, pious hope will not change reality. There has to be significant understanding of what is valid to justify any significant change! There are many papers showing the correlation between reduced union power, diminished state provisions etc and poor economic performance. But they have not reversed the trends. Only with greater proof of what can be achieved by different policies will any real action come about. I believe that a real theoretical understanding of the economy, which disproves mainstream analysis, is necessary to justify the policies which are required to be implemented.

  6. Helen Sakho
    May 30, 2018 at 2:41 am

    So, back to the beginning of this no doubt wonderful dialogue. I would only double my Price, if I may, into PxP = Problematise the Price. Should anyone require a third or even fourth “P”, we can simply add Politicise and Publicise it.

  7. June 4, 2018 at 4:35 am

    Our culture tells us to believe that each event is separable and identifiable and has a cause or causes that can not only be identified, but in many instances converted to mathematical expressions. This culture also tells us that science is the most rigorous and precise (mathematically speaking) means for describing and understanding causation. Per this culture, science is the decisive means of knowing, of acquiring knowledge. Science must be performed scientifically. Currently, that performance consists of verifying (via mathematics if possible) observational guesses about relationships in the world around us, with these results tested through further observation. In this way networks of scientific knowledge are created. Statistics is one of the tools that’s been chosen to pursue the scientific process. The use and abuse of the “P” value is part of the normal and usual process of evolving effective uses for statistics. Comparing observations is the name of the game in science. Is one event like another? How to decide? Statistics has been used to answer these questions. Historical comparisons have also been used, as have various forms of experimentation. Which is better is difficult to say, but generally depends on the questions the scientist wants to answer. There are many issues associated with statistical comparisons. Among these are the number and forms of probability assumed to exist, the relationship of computed probability under varying initial conditions, and the applicability of the probability form assumed in standard probability theory to that in the observations under study. While some scientists may be aware of such problems in statistical practice, few scientists take time to assess them in performing the comparisons on which the scientist focuses. In some instances, such assessment isn’t possible. But scientists should be aware of the need for it in all their statistical work.
