## Bayesian probability theory banned by English court

from **Lars Syll**

In a recent judgement the English Court of Appeal has denied that probability can be used as an expression of uncertainty for events that have either happened or not.

The case was a civil dispute about the cause of a fire, and concerned an appeal against a decision in the High Court by Judge Edwards-Stuart. Edwards-Stuart had essentially concluded that the fire had been started by a discarded cigarette, even though this seemed an unlikely event in itself, because the other two explanations were even more implausible. The Court of Appeal rejected this approach, although it still supported the overall judgement and disallowed the appeal …

But it’s the quotations from the judgement that are so interesting:

“Sometimes the ‘balance of probability’ standard is expressed mathematically as ’50 + % probability’, but this can carry with it a danger of pseudo-mathematics, as the argument in this case demonstrated. When judging whether a case for believing that an event was caused in a particular way is stronger than the case for not so believing, the process is not scientific (although it may obviously include evaluation of scientific evidence) and to express the probability of some event having happened in percentage terms is illusory.”

The idea that you can assign probabilities to events that have already occurred, but where we are ignorant of the result, forms the basis for the Bayesian view of probability. Put very broadly, the ‘classical’ view of probability is in terms of genuine unpredictability about future events, popularly known as ‘chance’ or ‘aleatory uncertainty’. The Bayesian interpretation allows probability also to be used to express our uncertainty due to our ignorance, known as ‘epistemic uncertainty’ …

The judges went on to say:

“The chances of something happening in the future may be expressed in terms of percentage. Epidemiological evidence may enable doctors to say that on average smokers increase their risk of lung cancer by X%. But you cannot properly say that there is a 25 per cent chance that something has happened … Either it has or it has not” …

Anyway, I teach the Bayesian approach to post-graduate students attending my ‘Applied Bayesian Statistics’ course at Cambridge, and so I must now tell them that the entire philosophy behind their course has been declared illegal in the Court of Appeal. I hope they don’t mind.

David Spiegelhalter should of course go on with his course, but maybe he also ought to contemplate the rather common fact that people — including scientists — often find it possible to believe things although they can’t always warrant or justify their beliefs. And probabilistic nomological machines do not exist “out there”, so it is extremely difficult to apply them properly to idiosyncratic real-world events (such as fires).

As I see it, Bayesian probabilistic reasoning in science reduces questions of rationality to questions of internal consistency (coherence) of beliefs, but — even granted this questionable reductionism — it’s not self-evident that rational agents really have to be probabilistically consistent. There is no strong warrant for believing so. Rather, there is strong evidence that we encounter huge problems if we let probabilistic reasoning become the dominant method for doing research in social sciences on problems that involve risk and uncertainty.

In many situations one could argue that there is simply not enough adequate and relevant information to ground beliefs of a probabilistic kind, and that in those situations it is not really possible, in any relevant way, to represent an individual’s beliefs in a single probability measure.

Say you have come to learn (based on your own experience and tons of data) that the probability of your becoming the next president of the US is 1%. Having moved to Italy (where you have no experience of your own and no data), you have no information on the event and a fortiori nothing on which to construct any probability estimate. A Bayesian would, however, argue that you would have to assign probabilities to the mutually exclusive alternative outcomes and that these have to add up to 1 — if you are rational. That is, in this case — and based on symmetry — a rational individual would have to assign a probability of 1% to becoming the next Italian president and 99% to not doing so.

That feels intuitively wrong though, and I guess most people would agree. Bayesianism cannot distinguish symmetry-based probabilities grounded in information from symmetry-based probabilities grounded in an absence of information. In these kinds of situations most of us would rather say that it is simply irrational to be a Bayesian, and better instead to admit that we “simply do not know” or that we feel ambiguous and undecided. Arbitrary and ungrounded probability claims are more irrational than being undecided in the face of genuine uncertainty, so if there is not sufficient information to ground a probability distribution it is better to acknowledge that simpliciter, rather than pretending to possess a certitude that we simply do not possess.

I think this critique of Bayesianism is in accordance with the views of **Keynes**’s *A Treatise on Probability* (1921) and *General Theory* (1937). According to Keynes we live in a world permeated by unmeasurable uncertainty – not quantifiable stochastic risk – which often forces us to make decisions based on anything but rational expectations. Sometimes we “simply do not know.” Keynes would not have accepted the view of Bayesian economists, according to whom expectations “tend to be distributed, for the same information set, about the prediction of the theory.” Keynes, rather, thinks that we base our expectations on the confidence or “weight” we put on different events and alternatives. To Keynes expectations are a question of weighing probabilities by “degrees of belief”, beliefs that have precious little to do with the kind of stochastic probabilistic calculations made by the rational agents modeled by probabilistically reasoning Bayesians.

I don’t think the court declared Bayesian approaches to probability “illegal”: they don’t have the power to do that. They (merely) decided it was not appropriate for determining liability in the type of case before them, which is a different thing. So, perhaps their Lordships should have paid better attention in maths and understood better what Bayesian analysis does, but Spiegelhalter too could spend a little more time learning about the legal system.

Wonderful!

This is almost as silly and ill-informed as those Italian judges who damned some of their best geologists for not being able to precisely predict the future – because most people don’t understand probability and don’t go to the trouble of trying:

http://en.wikipedia.org/wiki/2009_L%27Aquila_earthquake#Prosecutions

One counter was that the latter was a failure of science communication. But you can only dumb down the science to a point before the intriguing logic of Bayes is lost. Unfortunately, even that is still not enough to allow the public and innumerate lawyers to understand the logic of probability.

What is also amusing here is that lawyers love to ascribe mathematical precision to themselves and their judgements by frequently using the phrase “on the balance of probabilities”, when in fact they don’t actually understand probability at all, as this article illustrates.

Of course economics students, being rational, fortunately understand Bayes (not).

SLEMBECK, T. & TYRAN, J.-R. 2004. Do institutions promote rationality? An experimental study of the three-door anomaly. Journal of Economic Behavior & Organization, 54, 337–350.

Having read the wiki article I’m inclined to agree there was and is a failure of science communication. We are tending to use one word where two are needed. The evidence pointed to the POSSIBILITY of an earthquake, with a BEST ESTIMATE of the LIKELIHOOD of that being realised implying a corresponding POSSIBILITY of it NOT happening.

Certainly there is a failure in science communication. But this is not for want of trying on the part of scientists, using whatever language or format you care to choose. Let me illustrate with another, more familiar example.

Let me take an example comparable to the ‘partially’ predictable earthquake: the reported probabilities of heavy rainfall and consequent river flow. These are routinely/popularly presented as the ‘1 in 100 year’ or ‘1 in 20 year’ event, and these estimates, which are solidly grounded in statistics, are used to size stormwater drains and to identify land acceptable for building on. These numbers have enormous health-and-safety and economic implications — e.g. climate change generating more frequent extreme events changes the statistics even before any icebergs melt. They are incredibly useful.

But try as hydrologists and meteorologists do, they cannot explain to seemingly ‘intelligent’ people that these stats are about what is termed ‘exceedance probability’ (‘exceedance’ even just came up in my spellcheck as a non-word). These numbers tell you that such events can be EXCEEDED with the frequency specified: you can get two such events without much interval if you are unlucky, and even if nothing happens locally, somewhere a bit further away may get hit by a big storm in the near future (depending on covariance).
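The arithmetic behind why a ‘1 in 100 year’ label gives no guarantee of a quiet century is easy to check. Here is a minimal sketch (my own illustration, assuming independent years with a constant 1% annual exceedance probability, a simplification that real hydrology complicates):

```python
def prob_at_least_one_exceedance(annual_p, years):
    """Probability the event is exceeded at least once over the horizon,
    assuming independent years with a constant annual exceedance
    probability (an idealisation of real flood statistics)."""
    return 1 - (1 - annual_p) ** years

# Over a 30-year mortgage, the "100-year" flood is far from negligible:
risk_30 = prob_at_least_one_exceedance(0.01, 30)    # ≈ 0.26
# And even within the 100 years themselves it is only ~63%, not certain:
risk_100 = prob_at_least_one_exceedance(0.01, 100)  # ≈ 0.63
```

The second line is the point most people miss: ‘1 in 100 years’ does not mean the event is due, or ruled out, on any particular timetable.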

Unfortunately, most people seeing these numbers persist in interpreting these statistics as THEY WANT TO, not for what they actually mean — usually that a big event won’t happen for 100 years, or that they are safe as long as they are just above the 100-year mark. The Queensland, Australia floods a few years ago were an example of this: the government had fudged the interpretation, people got killed, but it was the dam operators, who didn’t have perfect foresight, who got blamed.

The underlying problem in the communication, as with the Italian or these English judges, is that you are dealing with an audience who don’t want to, or haven’t the time or the skills to, understand probability. This block, as it relates to Bayes, is shown no better than by the three-doors problem/story (http://en.wikipedia.org/wiki/Monty_Hall_problem). Indeed, until 10 years ago I didn’t get it either. The paper I identified shows how deep it runs in us. It also shows how extensive the self-delusion is, and our refusal to change our minds in the face of contradictory evidence. I think it illustrates nicely how we are NOT, as a species, inherently rational when it comes to Bayes calculations.

There is a solution we know of: to learn the language of mathematics. But that is hard, and so it often doesn’t happen even though the financial implications are so great. I wonder to what extent this failure can also account for the problems in economics, when the latter seems to use very simplistic models which lack an appreciation of subtleties like Bayes, or use the normal distribution curve like it’s a mantra rather than one possible model.
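For readers who, like me, needed convincing about the three doors, the result can be checked by brute force. A small simulation sketch (the function and its names are my own illustration, not from the cited paper):

```python
import random

def monty_hall(switch, trials=100_000, seed=1):
    """Simulate the three-door game; return the contestant's win rate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)   # door hiding the car
        pick = rng.randrange(3)  # contestant's initial choice
        # Host opens a door that is neither the pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Switch to the one remaining closed door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

# Switching wins about 2/3 of the time; sticking only about 1/3.
```

Running both variants side by side makes the counter-intuitive answer hard to argue with, which is more than can be said for most verbal explanations.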

I’ve just come across an argument demonstrating the need for Bayes’s theorem. In the absence of any information about bias in a coin, the best estimate of the probability of a ‘head’ one can make is 0.5, and likewise for a ‘tail’. Suppose the first throw turns up ‘head’. What is the best estimate of the probability that the next throw will also be a head? Does it stay at 0.5 however much information one has suggesting bias, or a tendency to sit on edge? (Without Bayes’s theorem, it would seem the answer is yes.)
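This is exactly the gap Bayes’s theorem fills. With a uniform Beta(1, 1) prior on the coin’s unknown bias, the posterior-predictive probability of a head follows Laplace’s rule of succession; a minimal sketch (the function name is my own):

```python
from fractions import Fraction

def predictive_head(heads, tails, a=1, b=1):
    """Posterior-predictive P(next toss is a head) for a coin whose
    unknown bias has a Beta(a, b) prior; with a = b = 1 this reduces
    to Laplace's rule of succession: (heads + 1) / (tosses + 2)."""
    return Fraction(a + heads, a + b + heads + tails)

print(predictive_head(0, 0))  # 1/2 : no data, symmetric prior
print(predictive_head(1, 0))  # 2/3 : one observed head shifts the estimate
```

So the estimate does not stay at 0.5: a single head nudges it to 2/3, and further evidence of bias keeps moving it, which is the behaviour the commenter’s question asks for.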

David Spiegelhalter’s judges say “Epidemiological evidence may enable doctors to say that on average smokers increase their risk of lung cancer by X%. But you cannot properly say that there is a 25 per cent chance that something HAS happened …”. Surely they are right? But couldn’t we properly say “something WILL HAVE happened”?

Dave, I need to read the Bayes philosophy stuff more, but I think you can actually say this, after a fashion, if there are multiple possible/likely causes.

The language may be key. The full name for Bayes nets is Bayesian Belief Nets, some of which I am in the process of constructing for a water-treatment project. The data is great, but the relationships and the net are more slippery and subtle when you try to account for accumulating uncertainty. Think about the difference between a regression line with an r² of 0.99 and another with 0.4. They are both informative, but the former suggests certainty for all practical purposes, while the latter illustrates that there is a lot we don’t know or can’t predict.

A lot comes down to what you believe the cause and effect relationship net is. The subtlety is highlighted by attempts to combine what you thought were all objective data. The process forces you to specify what your cause and effect beliefs are.
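That “specify your cause-and-effect beliefs” step can be made concrete with the smallest possible belief net: one cause node and one effect node. The numbers below are purely illustrative assumptions of mine, not data from any real water-treatment project:

```python
# Two-node belief net: Contamination -> TestPositive.
# All numbers are illustrative assumptions, not real data.
p_contam = 0.01            # prior belief P(contamination)
p_pos_if_contam = 0.95     # P(test positive | contamination)
p_pos_if_clean = 0.05      # P(test positive | no contamination)

# Marginal probability of a positive test (summing over the cause node):
p_pos = p_pos_if_contam * p_contam + p_pos_if_clean * (1 - p_contam)

# Bayes' rule then inverts the arrow: belief in the cause given the effect.
p_contam_if_pos = p_pos_if_contam * p_contam / p_pos
print(round(p_contam_if_pos, 3))  # a positive test lifts 1% to roughly 16%
```

Even this toy net shows the subtlety: a “95% accurate” test on a rare cause leaves the posterior well below 50%, because the prior you were forced to write down does most of the work. Tools like Netica do the same arithmetic over much larger nets.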

It’s delicious.

If you are of a mind to explore this area and haven’t, can I recommend Netica, which is free for constructing small but powerful Bayes nets and has lots of examples and back-up.

Your comments make me wonder if this is part of the great Bayes v. frequentist debate which I confess to not fully understanding.

See the Swedish comedian-philosopher Tage Danielsson’s famous lecture on probability, with English subtitles, at https://www.youtube.com/watch?v=s79_V4NXiUg