## What — if anything — do p-values test?

from **Lars Syll**

Unless enforced by study design and execution, statistical assumptions usually have no external justification; they may even be quite implausible. As a result, one often sees attempts to justify specific assumptions with statistical tests, in the belief that a high p-value or ‘‘nonsignificance’’ licenses the assumption and a low p-value refutes it. Such focused testing is based on a meta-assumption that every other assumption used to derive the p-value is correct, which is a poor judgment when some of those other assumptions are uncertain. In that case (and in general) one should recognize that the p-value is simultaneously testing all the assumptions used to compute it – in particular, a null p-value actually tests the entire model, not just the stated hypothesis or assumption it is presumed to test.
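A small simulation can make the point concrete. The sketch below is only an illustration under assumptions that are not in the quoted text: the data follow a zero-mean AR(1) process (so the stated null hypothesis, mean = 0, is true), the test is a large-sample z-approximation to the one-sample t-test, and the function names are made up for the example. When the independence assumption holds, the test rejects at roughly the nominal 5% rate; when it fails, the same test rejects a true null far more often, because the p-value is reacting to the broken assumption rather than to the hypothesis it is presumed to test.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def z_test_p_value(sample):
    """Two-sided p-value for H0: mean = 0, computed as if the
    observations were independent (large-sample normal approximation)."""
    n = len(sample)
    z = mean(sample) / (stdev(sample) / math.sqrt(n))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def ar1_sample(n, rho, rng):
    """Zero-mean AR(1) series: x[t] = rho * x[t-1] + noise.
    rho = 0 gives independent draws; rho > 0 violates independence."""
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def rejection_rate(rho, n=100, trials=2000, alpha=0.05, seed=0):
    """Share of simulated samples in which the (true) null is rejected."""
    rng = random.Random(seed)
    hits = sum(
        z_test_p_value(ar1_sample(n, rho, rng)) < alpha
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    print("independent data :", rejection_rate(rho=0.0))
    print("autocorrelated   :", rejection_rate(rho=0.7))
```

In both runs the stated hypothesis is true; only the auxiliary assumption differs, yet the rejection rate changes dramatically.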

All science entails human judgement, and using statistical models doesn’t relieve us of that necessity. Working with misspecified models, the scientific value of significance testing is actually zero — even though you’re making valid statistical inferences! Statistical models and concomitant significance tests are no substitutes for doing real science.

In its standard form, a significance test is not the kind of ‘severe test’ that we are looking for in our search for being able to confirm or disconfirm empirical scientific hypotheses. This is problematic for many reasons, one being that there is a strong tendency to accept the null hypothesis simply because it can’t be rejected at the standard 5% significance level. In their standard form, significance tests bias against new hypotheses by making it hard to disconfirm the null hypothesis.

And as shown over and over again when it is applied, people have a tendency to read “not disconfirmed” as “probably confirmed.” Standard scientific methodology tells us that when there is only, say, a 10% probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more “reasonable” to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give the same 10% result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.
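One way to put a number on that intuition is Fisher's method for combining independent p-values. The post does not name this technique; it is brought in here purely as an illustration, and it inherits every assumption of the individual tests plus the assumption that they are independent. Under those assumptions, the statistic −2 Σ ln pᵢ follows a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form when the degrees of freedom are even.

```python
import math


def fisher_combined_p(p_values):
    """Combined p-value for k independent tests (Fisher's method).

    Uses the closed-form upper tail of a chi-square distribution
    with 2k degrees of freedom:
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is x / 2
    tail_sum = sum(half_x ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_x) * tail_sum


if __name__ == "__main__":
    for k in (1, 3, 5):
        print(k, "tests at p = 0.10 ->", round(fisher_combined_p([0.10] * k), 4))
```

A single test at p = 0.10 stays at 0.10, but five independent tests that each return 0.10 combine to roughly 0.01, which is well below the conventional 5% cutoff even though none of them crossed it individually.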

Statistics is no substitute for thinking. We should never forget that the underlying parameters we use when performing significance tests are *model constructions*. Our p-values mean next to nothing if the model is wrong. Statistical significance tests *do not* validate models!

In my understanding, a theory lies behind every model.

Lars Syll>> Our p-values mean next to nothing if the model is wrong. Statistical significance tests do not validate models!

Then we should talk about models and about the theories that lie behind them. Please start to argue how mainstream economics (not econometrics) is wrong and how to rebuild an economic theory that can replace it. As I have stated in another post, if we do not get an alternative theory, we remain trapped by the old theory and cannot escape from thinking within the realm of new classical economics.

Here is a model, invented by the French cliometricians, to prove that France’s entrepreneurial performance in the 19th century was good. Assume that development of high-tech industries shows entrepreneurial prowess. But wait a minute: the econometricians cite the statistical growth of industries in old technologies to “prove” French technological performance. What a lot of crap this science is. See Robert R. Locke, The End of the Practical Man (1984), Introduction, “The Revisionists and their Theses.”

Robert, your account is too short, and it is difficult to know what you really want to say. My impression is that cliometrics is much better than macroeconometrics. Am I wrong?

According to the “best” self-help statistics manual ‘in the world,’ ‘Statistics for Dummies,’ “All hypothesis tests ultimately use a p-value to weigh the strength of the evidence (what the data are telling you about the population). The p-value is a number between 0 and 1 and interpreted in the following way:

• A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so you reject the null hypothesis.

• A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis.

• p-values very close to the cutoff (0.05) are considered marginal (could go either way). Always report the p-value so your readers can draw their own conclusions.”

Put non-mathematically, we want to answer the question, how similar is the sample we’ve drawn from a group to the entire group (population)? From within science, many believe there is a reproducibility crisis. What proportion of published research is likely to be false? Low sample size, small effect sizes, data dredging (also known as P-hacking), conflicts of interest, large numbers of scientists working competitively in silos without combining their efforts, and so on, may conspire to dramatically increase the probability that a published finding is incorrect. The field of science studies — the study of science itself — is flourishing and has generated substantial empirical evidence for the existence and prevalence of threats to efficiency in knowledge accumulation. Data from many fields suggests reproducibility is lower than is desirable; one analysis estimates that 85% of biomedical research efforts are wasted, while 90% of respondents to a recent survey in Nature agreed that there is a ‘reproducibility crisis.’
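The arithmetic behind "what proportion of published findings is false?" can be sketched with the standard positive-predictive-value calculation. The numbers below (share of tested hypotheses that are true, statistical power, significance level) are illustrative assumptions, not estimates from any study, and the sketch ignores p-hacking and conflicts of interest, which would push the figure lower still.

```python
def positive_predictive_value(prior, power, alpha=0.05):
    """Share of 'significant' findings that reflect a true effect.

    prior: assumed fraction of tested hypotheses that are actually true
    power: probability of detecting a true effect
    alpha: probability of a false positive when there is no effect
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical field where 1 in 10 tested hypotheses is true.
    print("well-powered  (0.8):", round(positive_predictive_value(0.10, 0.80), 3))
    print("under-powered (0.2):", round(positive_predictive_value(0.10, 0.20), 3))
```

Under these assumed inputs, roughly two in three significant findings are true when studies are well powered, but fewer than one in three when power is low, which is how small samples and small effects alone can make most published positives wrong.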

From the perspective of everyday life, current and historical, which is the study area of anthropology and history, much of social science research has become detached from that life. Sociologist Jack D. Douglas helped invent this perspective in the social sciences. He made clear that this perspective (sometimes called the existenz perspective) is mostly driven by the concrete experience of embodiment in everyday life and not by theoretical doctrine, which allows it to incorporate a wide range of intellectual ingredients into its ever-evolving form. These ingredients have included history, economics, biomedical science, theoretical physics, clinical medicine, Christian and Eastern mysticism, and fictional literature, among many others. One example of statistics’ failure is the study of suicide. Using field research and the method of participant observation, Douglas realized that official statistics and bureaucratic reports hide the genuine meanings that arise in the context of suicide as a real action. Mathematics also fails us in the study of deviance. In most sociological systems, the moral meanings that rule-breakers give to their actions are accepted uncritically and rarely subjected to reflection. Thus, morality is treated as something absolute, external, and unproblematic. In contrast to this understanding, the existential perspective focuses on the processes of meaning construction in specific situations. Finally, economists’ analysis of derivatives, credit default swaps, and other new financial instruments, carried out mostly via statistical and other mathematical methods, was so useless that it did not reveal their corrosive influence on banks and financial markets until it was too late. Just a bit of field research would have revealed what the most effective financial journalists knew but could not prove.


Roi (2017, 6) has a nice little vignette on Option Pricing, mathematics, and the illusion behind the pseudo-precision. Dunbar (2011, 36-52) does a nice little historical summation of the so-called “market makers.”


When I read Chapter 7 in Paul Davidson’s Who’s Afraid of John Maynard Keynes? and view the La-La-Land of textbook theory in light of reality, and then further observe the remonstrations above attempting to bend reality to textbook theoretical fantasy, I can only ask myself: is it any wonder parasitic finance continues to cycle through one crisis after another? Reality trumps theory every time in ivory tower economics, it seems.

Rob, the markets don’t provide ‘liquidity.’ What there is of order (assuming this is stability and predictability) in financial transactions is provided by the history of relationships among traders, bankers, etc. And in these, trust is still a major factor. Markets are the cultural arrangements where people are given an opportunity to create such relationships. As to ‘market makers,’ that’s every participant in the ‘markets,’ to lesser or greater degrees. Market participants themselves don’t accept the notion that a market participant or participants control or ‘make’ a market. That implies a lack of opportunity to use that market to make money. As to the emotions that most often accompany markets, particularly financial markets, these are fear and optimism, usually oscillating between the two depending on success or failure in market transactions. As to the forms of markets themselves, even within a capitalist structure, these are always emerging based on current and historical circumstances and the make-up of market participants. For all these reasons, the only way to ‘find out’ about markets is observing them ‘in creation and operation.’


Well aware, Ken, per Nelson’s Economics as Religion. But thanks for the reminder ;-) Few average Americans are in the stock market, so this is pretty much a game of craps for the elite.


Stating the obvious. But thanks. NOVA did a nice documentary called “Mind Over Money” that shows how fear and irrational emotion drive much of the market.


Nice delusion when one is playing the market, but that isn’t true at all. History tells a different story when it comes to the Subprime Crisis. ‘Market participants’ and ‘market makers’ are different things both in theory and in reality. It is just that the reality is more a case of market creation and then manipulation for a one-sided win-win situation for the Goldman Sachs market maker ilk.


Yep, totally agree. And it is important to observe the way theory hides and/or obfuscates the reality revealed in observation (historical or otherwise).

Rob, I did say to lesser or greater degrees. Even the lowliest participant can change the direction of the market. In the credit default swaps crisis, don’t forget Lehman Brothers.

The global banking giant HSBC bought its own subprime lender, Household International, in 2003 for $14 billion. Investment banks Credit Suisse, Bear Stearns, Lehman Brothers, Merrill Lynch, Morgan Stanley, and Goldman Sachs all bought or founded nonbank subprime lenders to feed their securitization machines. (“The Subprime Virus: Reckless Credit, Regulatory Failure, and Next Steps” by Kathleen C. Engel, Patricia A. McCoy – http://a.co/5EDpo30)

Indeed