## Bayesianism — preposterous mumbo jumbo

from **Lars Syll**

Neoclassical economics nowadays usually assumes that agents that have to make choices under conditions of uncertainty behave according to Bayesian rules (preferably the ones axiomatized by **Ramsey** (1931), **de Finetti** (1937) or **Savage** (1954)) – that is, they maximize expected utility with respect to some subjective probability measure that is continually updated according to Bayes’ theorem. If not, they are supposed to be irrational, and ultimately – via some “Dutch book” or “money pump” argument – susceptible to being ruined by some clever “bookie”.
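The “Dutch book” argument can be made concrete with a toy example. The sketch below assumes a hypothetical agent that treats its degree of belief in an event as the fair price of a bet paying 1 if the event occurs, and whose prices for an event A and for not-A sum to more than 1. The function name and the numbers are illustrative only.

```python
from fractions import Fraction

def bookie_profit(price_a, price_not_a, a_occurs):
    """Bookie's net gain from selling the agent one bet on A and one on not-A.

    Each bet costs its price and pays 1 if its event occurs, so the price
    plays the role of the agent's degree of belief in that event.
    """
    received = price_a + price_not_a                              # agent buys both bets
    paid_out = (1 if a_occurs else 0) + (0 if a_occurs else 1)    # exactly one bet pays 1
    return received - paid_out

# Hypothetical incoherent beliefs: P(A) = 0.6 and P(not A) = 0.5 sum to 1.1
p_a, p_not_a = Fraction(6, 10), Fraction(1, 2)
for a_occurs in (True, False):
    print(f"A occurs: {a_occurs}, bookie gains {bookie_profit(p_a, p_not_a, a_occurs)}")
```

Whichever way A turns out, the bookie gains the same 1/10; with prices that sum to 1 the guaranteed gain disappears. (If the prices summed to less than 1, the bookie would instead buy both bets from the agent.)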

Bayesianism reduces questions of rationality to questions of internal consistency (coherence) of beliefs, but – even granted this questionable reductionism – do rational agents really have to be Bayesian? As I have been arguing elsewhere (e.g. here and here) there is no strong warrant for believing so.

In many of the situations that are relevant to economics one could argue that there is simply not enough adequate and relevant information to ground beliefs of a probabilistic kind, and that in those situations it is not really possible, in any relevant way, to represent an individual’s beliefs in a single probability measure.

The view that Bayesian decision theory is only genuinely valid in a *small world* was asserted very firmly by Leonard Savage when laying down the principles of the theory in his path-breaking *Foundations of Statistics*. He makes the distinction between small and large worlds in a folksy way by quoting the proverbs “Look before you leap” and “Cross that bridge when you come to it”. You are in a small world if it is feasible always to look before you leap. You are in a large world if there are some bridges that you cannot cross before you come to them. As Savage comments, when proverbs conflict, it is proverbially true that there is some truth in both – that they apply in different contexts. He then argues that some decision situations are best modeled in terms of a small world, but others are not. He explicitly rejects the idea that all worlds can be treated as small as both “ridiculous” and “preposterous” … Frank Knight draws a similar distinction between making decisions under risk or uncertainty …

Bayesianism is understood [here] to be the philosophical principle that Bayesian methods are always appropriate in all decision problems, regardless of whether the relevant set of states in the relevant world is large or small. For example, the world in which financial economics is set is obviously large in Savage’s sense, but the suggestion that there might be something questionable about the standard use of Bayesian updating in financial models is commonly greeted with incredulity or laughter.

Someone who acts as if Bayesianism were correct will be said to be a Bayesianite. It is important to distinguish a Bayesian like myself—someone convinced by Savage’s arguments that Bayesian decision theory makes sense in small worlds—from a Bayesianite. In particular, a Bayesian need not join the more extreme Bayesianites in proceeding as though:

• All worlds are small.

• Rationality endows agents with prior probabilities.

• Rational learning consists simply in using Bayes’ rule to convert a set of prior probabilities into posterior probabilities after registering some new data.

Bayesianites are often understandably reluctant to make an explicit commitment to these principles when they are stated so baldly, because it then becomes evident that they are implicitly claiming that David Hume was wrong to argue that the principle of scientific induction cannot be justified by rational argument …

Bayesianites believe that the subjective probabilities of Bayesian decision theory can be reinterpreted as logical probabilities without any hassle. Its adherents therefore hold that Bayes’ rule is the solution to the problem of scientific induction. No support for such a view is to be found in Savage’s theory—nor in the earlier theories of Ramsey, de Finetti, or von Neumann and Morgenstern. Savage’s theory is entirely and exclusively a consistency theory. It says nothing about how decision-makers come to have the beliefs ascribed to them; it asserts only that, if the decisions taken are consistent (in a sense made precise by a list of axioms), then they act as though maximizing expected utility relative to a subjective probability distribution …

A reasonable decision-maker will presumably wish to avoid inconsistencies. A Bayesianite therefore assumes that it is enough to assign prior beliefs to a decision-maker, and then forget the problem of where beliefs come from. Consistency then forces any new data that may appear to be incorporated into the system via Bayesian updating. That is, a posterior distribution is obtained from the prior distribution using Bayes’ rule.

The naiveté of this approach doesn’t consist in using Bayes’ rule, whose validity as a piece of algebra isn’t in question. It lies in supposing that the problem of where the priors came from can be quietly shelved.

Savage did argue that his descriptive theory of rational decision-making could be of practical assistance in helping decision-makers form their beliefs, but he didn’t argue that the decision-maker’s problem was simply that of selecting a prior from a limited stock of standard distributions with little or nothing in the way of soul-searching. His position was rather that one comes to a decision problem with a whole set of subjective beliefs derived from one’s previous experience that may or may not be consistent …

But why should we wish to adjust our gut-feelings using Savage’s methodology? In particular, why should a rational decision-maker wish to be consistent? After all, scientists aren’t consistent, on the grounds that it isn’t clever to be consistently wrong. When surprised by data that shows current theories to be in error, they seek new theories that are inconsistent with the old theories. Consistency, from this point of view, is only a virtue if the possibility of being surprised can somehow be eliminated. This is the reason for distinguishing between large and small worlds. Only in the latter is consistency an unqualified virtue.
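What “consistency forces new data to be incorporated via Bayesian updating” means mechanically can be shown in a short sketch over a finite set of hypotheses. The coin-bias hypotheses, the uniform prior and the data below are all hypothetical, and the helper names are illustrative. Note that the prior is simply handed to the code, which is exactly the step the passage says cannot be quietly shelved.

```python
from fractions import Fraction

def update(prior, likelihood, datum):
    """One application of Bayes' rule over a finite set of hypotheses."""
    unnormalised = {h: p * likelihood(datum, h) for h, p in prior.items()}
    evidence = sum(unnormalised.values())            # P(datum)
    return {h: w / evidence for h, w in unnormalised.items()}

def update_all(prior, likelihood, data):
    """Condition on all the data at once (observations assumed independent given h)."""
    unnormalised = {}
    for h, p in prior.items():
        weight = p
        for d in data:
            weight *= likelihood(d, h)
        unnormalised[h] = weight
    evidence = sum(unnormalised.values())
    return {h: w / evidence for h, w in unnormalised.items()}

def coin_likelihood(datum, bias):
    """P(datum | a coin that lands heads with probability `bias`)."""
    return bias if datum == "H" else 1 - bias

# Hypothetical prior: three possible biases, equally weighted
prior = {Fraction(1, 4): Fraction(1, 3),
         Fraction(1, 2): Fraction(1, 3),
         Fraction(3, 4): Fraction(1, 3)}
data = "HHT"

posterior = prior
for d in data:                                       # update one observation at a time
    posterior = update(posterior, coin_likelihood, d)

print(posterior == update_all(prior, coin_likelihood, data))
print(posterior)
```

Updating observation by observation gives exactly the same posterior as updating on the whole data set at once; that internal coherence is what the Bayesian machinery guarantees, and it says nothing about whether the prior was a sensible place to start.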

Say you have come to learn (based on your own experience and tons of data) that the probability of you becoming unemployed in the US is 10%. Having moved to another country (where you have no experience of your own and no data), you have no information on unemployment and a fortiori nothing on which to base any probability estimate. A Bayesian would, however, argue that, if you are rational, you would have to assign probabilities to the mutually exclusive alternative outcomes and that these have to add up to 1. That is, in this case – and based on symmetry – a rational individual would have to assign probability 50% to becoming unemployed and 50% to becoming employed.

That feels intuitively wrong though, and I guess most people would agree. Bayesianism cannot distinguish between symmetry-based probabilities from information and symmetry-based probabilities from an absence of information. In these kinds of situations most of us would rather say that it is simply irrational to be a Bayesian and better instead to admit that we “simply do not know” or that we feel ambiguous and undecided. Arbitrary and ungrounded probability claims are more irrational than being undecided in the face of genuine uncertainty, so if there is not sufficient information to ground a probability distribution it is better to acknowledge that simpliciter, rather than pretending to possess a certitude that we simply do not possess.
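The point about symmetry can be put in a few lines. This toy sketch, with hypothetical counts, builds one probability measure from the principle of indifference (no data at all) and one from observed frequencies that happen to split evenly. As single probability measures they come out identical, and nothing in either one records how much evidence stands behind it.

```python
from fractions import Fraction

def indifference(outcomes):
    """Principle of indifference: spread probability equally over the outcomes."""
    return {o: Fraction(1, len(outcomes)) for o in outcomes}

def relative_frequencies(counts):
    """Probabilities estimated from observed counts."""
    total = sum(counts.values())
    return {o: Fraction(c, total) for o, c in counts.items()}

outcomes = ("employed", "unemployed")
from_ignorance = indifference(outcomes)                       # based on no observations
# Hypothetical data: 10,000 observations that happen to split evenly
from_evidence = relative_frequencies({"employed": 5000, "unemployed": 5000})

print(from_ignorance == from_evidence)
```

The comparison prints `True`: the difference between knowing and not knowing survives only in information kept outside the probability measure, such as the sample size.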

I think this critique of Bayesianism is in accordance with the views of **Keynes**’ *A Treatise on Probability* (1921) and *General Theory* (1937). According to Keynes we live in a world permeated by unmeasurable uncertainty – not quantifiable stochastic risk – which often forces us to make decisions based on anything but rational expectations. Sometimes we “simply do not know.” Keynes would not have accepted the view of Bayesian economists, according to whom expectations “tend to be distributed, for the same information set, about the prediction of the theory.” Keynes, rather, thinks that we base our expectations on the confidence or “weight” we put on different events and alternatives. To Keynes, expectations are a question of weighing probabilities by “degrees of belief”, beliefs that have precious little to do with the kind of stochastic probabilistic calculations made by the rational agents modeled by Bayesian economists.

The bias toward the superficial and the response to extraneous influences on research are both examples of real harm done in contemporary social science by a roughly Bayesian paradigm of statistical inference as the epitome of empirical argument. For instance, the dominant attitude toward the sources of black-white differential in United States unemployment rates (routinely the rates are in a two to one ratio) is “phenomenological.” The employment differences are traced to correlates in education, locale, occupational structure, and family background. The attitude toward further, underlying causes of those correlations is agnostic … Yet on reflection, common sense dictates that racist attitudes and institutional racism *must* play an important causal role. People do have beliefs that blacks are inferior in intelligence and morality, and they are surely influenced by these beliefs in hiring decisions … Thus, an overemphasis on Bayesian success in statistical inference discourages the elaboration of a type of account of racial disadvantages that almost certainly provides a large part of their explanation.


We’re still not getting the point of Bayesianism, “the Pythagoras’ Theorem of Probability”, i.e. the mutual correction of two different ways of measuring something. See this from a simple account summarising a long one: http://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/.

• “Even science is a test. At a philosophical level, scientific experiments can be considered ‘potentially flawed tests’ and need to be treated accordingly. There is a test for a chemical, or a phenomenon, and there is the event of the phenomenon itself. Our tests and measuring equipment have some inherent rate of error.

Bayes’ theorem finds the actual probability of an event from the results of your tests. For example, you can:

• Correct for measurement errors. If you know the real probabilities and the chance of a false positive and false negative, you can correct for measurement errors.

• Relate the actual probability to the measured test probability. Bayes’ theorem lets you relate Pr(A|X), the chance that an event A happened given the indicator X, and Pr(X|A), the chance the indicator X happened given that event A occurred. Given mammogram test results and known error rates, you can predict the actual chance of having cancer.”
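The mammogram case quoted above is the standard worked example of Bayes’ theorem. A minimal sketch follows; the prevalence, sensitivity and false-positive rate are illustrative numbers chosen for the arithmetic, not clinical data, and the function name is hypothetical.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem.

    prior:               P(condition) before testing
    sensitivity:         P(positive | condition)
    false_positive_rate: P(positive | no condition)
    """
    true_pos = sensitivity * prior                  # P(positive and condition)
    false_pos = false_positive_rate * (1 - prior)   # P(positive and no condition)
    return true_pos / (true_pos + false_pos)

# Illustrative numbers only
p = posterior_given_positive(prior=0.01, sensitivity=0.80, false_positive_rate=0.096)
print(f"{p:.3f}")
```

With these inputs a positive result raises the probability from 1% to roughly 7.8%, far below the 80% sensitivity, because false positives from the much larger healthy group dominate.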

So change the example to a die: the event is the die having six symmetrical sides, and the indicator is that any particular face comes up 1/6 of the time. Bayes’ theorem then relates the probability that the die has six symmetrical sides, given that indicator, to the probability of throwing a 6, given that the die has six symmetrical sides.
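The die version can be sketched the same way. Purely for illustration, the code assumes a 90% prior that the die is fair and a single hypothetical loaded alternative that shows a six half the time; it runs Bayes’ theorem from P(six | fair) = 1/6 to P(fair | a run of sixes).

```python
from fractions import Fraction

def p_fair_given_sixes(n_sixes, prior_fair, p_six_loaded):
    """P(die is fair | n sixes in a row), against one hypothetical loaded die."""
    like_fair = Fraction(1, 6) ** n_sixes       # P(data | fair)
    like_loaded = p_six_loaded ** n_sixes       # P(data | loaded)
    num = prior_fair * like_fair
    return num / (num + (1 - prior_fair) * like_loaded)

for n in range(4):
    print(n, p_fair_given_sixes(n, Fraction(9, 10), Fraction(1, 2)))
```

Under these assumptions the probability that the die is fair falls from 9/10 to 3/4, 1/2 and 1/4 after one, two and three sixes; a different prior or a different loaded alternative would give different numbers.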

There are many practical cases in which a Bayes approach performs much better than the frequentist statistical approach. I did some forest surveys in the Mississippi Delta of Arkansas, Louisiana and Tennessee in the 1980s. Frequentist stats were unrealistic and unreliable. What I actually used was a Stein rule estimator, which turns out to have pretty good Bayes characteristics and is somewhat easier to apply to large problems. Multiple tests and harvest products etc. proved the practical advantage of the Bayes probabilistic approach.

Your reference to the Stein rule estimator was most helpful and interesting, Charlie, not least because http://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator provides a context for it (i.e. obtaining an estimate based on a single observation vector y). Presumably ‘dimension’ and ‘vector’ are there to be understood in terms of matrix algebra, my point being that this was not around in Bayes’s time (1701-1761).
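For readers curious about the estimator mentioned above, here is a sketch of the textbook James–Stein estimator: shrink a single observation vector y toward zero, assuming unit noise variance known in advance. It is not necessarily the “Stein rule” variant used in the forest survey, and the true means `theta`, the dimension and the number of trials are hypothetical choices for the simulation.

```python
import random

def james_stein(y, sigma2=1.0):
    """James-Stein estimate of a mean vector from one observation y ~ N(theta, sigma2*I).

    Shrinks y toward the origin; for len(y) >= 3 it has lower expected total
    squared error than using y itself.
    """
    p = len(y)
    norm2 = sum(v * v for v in y)
    shrink = 1 - (p - 2) * sigma2 / norm2
    return [shrink * v for v in y]

def average_losses(theta, trials=2000, seed=0):
    """Monte Carlo average squared error of the raw observation vs. James-Stein."""
    rng = random.Random(seed)
    raw_loss = js_loss = 0.0
    for _ in range(trials):
        y = [t + rng.gauss(0.0, 1.0) for t in theta]      # one noisy observation vector
        js = james_stein(y)
        raw_loss += sum((a - t) ** 2 for a, t in zip(y, theta))
        js_loss += sum((a - t) ** 2 for a, t in zip(js, theta))
    return raw_loss / trials, js_loss / trials

raw, js = average_losses(theta=[0.5] * 10)
print(f"raw observation: {raw:.2f}   James-Stein: {js:.2f}")
```

The raw observation’s average loss sits near the dimension (10 here), while the shrunken estimate does markedly better. The shrinkage factor can also be read as an empirical-Bayes estimate of a prior, which is one way to see its “pretty good Bayes characteristics”; the positive-part variant, which clips the factor at zero, does slightly better still.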

What were around then, and the source of interest in probability theory, were games in which there was a risk of dice being loaded, Cartesian coordinate geometry and Newton’s method of successive approximation (http://en.wikipedia.org/wiki/Newton's_method#History).

I never really dealt with ‘Bayesianism’ when reading a paper by Bialek (Princeton) in physics (I think he considers himself one). But his paper said what I had thought when I’d seen ‘Bayes’ theorem’: all it is is a sort of rearrangement of the standard definition of conditional probability. Actually it comes down to (when you do the ‘math’) the assumption or definition that P(a,b) = P(b,a).
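The rearrangement described in this comment can be written out in two lines, using the standard definition of conditional probability and assuming P(A) > 0 and P(B) > 0 (the commenter’s joint probability P(a,b) is written here as P(A ∩ B)):

$$
P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B \mid A) = \frac{P(B \cap A)}{P(A)}.
$$

Since $A \cap B = B \cap A$, both numerators are the same quantity, so $P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$, and dividing by $P(B)$ gives Bayes’ theorem:

$$
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}.
$$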

So I just consider Bayesianism to be a derivative or deformation of the most general case, which is frequentism; i.e. the ‘prior’ is like the ‘equal a priori probability’ assumed to compute the probability distributions and entropy of equilibrium statistical mechanics (yielding Boltzmann-Gibbs).

So, to me, frequentism is the ‘linear theory’; but as was discovered recently (maybe on 9/11 or even before) there are also ‘nonlinear theories’ (or equations). You can think of your system as ‘flat’ and then gradually deform it, or assume there is a (Dave) ‘Taylor’s series’ which, if truncated, gives you the linear case first, and then if you consider further terms in the series you get a nightmare. (One can check out J. Statistical Mechanics from about 1940 (or ’20) through the ’70s etc. to see what happens when you try to include nonlinear terms: the FPU (Fermi-Pasta-Ulam) problem, the 2D and 3D Ising model (e.g. Onsager), the renormalization group of Kadanoff and Wilson, Prigogine, etc. are some of the scenery.)

(For another analogy, consider Diophantine equations: essentially polynomials with integers. One of my own discoveries is that cases like x = constant or x = ay and even x = ay + b are pretty easy; but if you go to x² = y² + z², it turns out that unless you can remember on which page you wrote the solution in the margin, the marginal utility of your theorems goes to 0. You can write the general case of Diophantine equations (see Hilbert’s 10th problem and work by Jones in Canada): the so-called universal Diophantine set (Julia Robinson, Martin Davis, Matiyasevich). So that would be like Bayesianism.)

All Bayes says is that you update your probabilities (that’s why they are conditional). I think the issue is ‘ideological’. I for one cannot stand libertarians, but I actually agree with them on a huge number of issues; it’s really just personality types. (I also hate, to a large extent, classical and folk music, and only like some of the most politically incorrect music available (e.g. NWA, Notorious B.I.G., the Sex Pistols, etc.), but usually I agree with the politics of all ‘progressive’ musicians.)

It is interesting to consider the case when P(a,b) ≠ P(b,a). If you add another variable, one might have P(a(t′),b(t)) ≠ P(b(t′),a(t)), i.e. you have ‘time’. If you were born and then died, that may not be the same as having died and then being born (except possibly in singular cases). (That’s also one way I solve the ‘conjunction fallacy’ of Kahneman and Tversky: it turns out it’s more probable that Linda is a feminist and a fortune (or bank) teller than just a fortune teller (or seller). One can also use paraconsistent logic to show this.)

Another interesting thing here, which illustrates my point about ideology, is that I see K. Binmore has apparently been responded to by Gelman and Shalizi, but with a lot of nuances (http://www.arxiv.org/abs/1006.3868).

(For related ideological issues you can read Shalizi’s blog *Three-Toed Sloth* on things like Tsallis entropy, Fisher information and IQ, or look up ‘missing heritability’.) Although very esoteric, other debates exist, such as whether transfinite numbers beyond aleph-1 exist (the continuum hypothesis), whether math needs new axioms, whether the US should have invaded Iraq, and whether cannabis and gay marriage should be legalized (e.g. you could do a joint probability and ask if P(weed, gay) = P(gay, weed), and then use a time variable to get a different phase space so you can dis/confirm Spengler’s *The Decline of the West* or the Club of Rome’s ‘Limits to Growth’ projections). I even heard, since I’m literate and informed, on Fox News that there are even debates in economics, e.g. whether money is neutral like the photon or neutrino, or charged (colored, charmed, strange, etc.) like quarks; whether gross substitution exists (is one man’s trash another womyn’s money?); whether neoclassical economics is named after, and the property of, Neo of the movie *The Matrix*; and should the red states take the blue pill like CodePink?

The ‘Chicago School’ (the ‘Windy City’: 81 shootings one recent weekend, 19 dead; my parents went there, and one now goes to the same church Obama went to (in Honolulu; I did go once since I got a free trip; Hawaii is the most dangerous hiking I’ve ever been to, beats the Himalayas, Alaska, and Mexico; when I got back it turned out my apartment, like others, had been broken into, but they ain’t get nothing)) solved this problem: ‘free to choose’. This was made clear by the President of UC (Sonnenschein), though the mathematical appendix to his famous ’70s paper I never really could completely understand.

‘Cuz I’m happy’ (Pharrell Williams), except I think it’s going to rain…

Thanks for the reference, Izzy! I’d been writing my latest comment while you were posting yours, but left off Newton’s use of polynomial series as “too much information”.

“So [you] just consider Bayesianism to be a derivative or deformation of the most general case, which is frequentism”? I suggest to you it is not just about mathematical relationships but about their relationship to a real-world situation, on the null hypothesis that there is no difference between the information they generate in their different ways. The situation is more like Chesterton’s triangulation of distance, because otherwise (in the days before laser time-based range finders) one has no criteria by which to test one’s estimate of a position. If one’s mental measure is bent, all one’s estimates will be bent too. As Charlie’s account illustrates, real scientists test mathematical simulation by comparing it with experimental results.