## On probabilism and statistics

from **Lars Syll**

‘Mr Brown has exactly two children. At least one of them is a boy. What is the probability that the other is a girl?’ What could be simpler than that? After all, the other child either is or is not a girl. I regularly use this example on the statistics courses I give to life scientists working in the pharmaceutical industry. They all agree that the probability is one-half.

So they are all wrong. I haven’t said that the *older* child is a boy. The child I mentioned, the boy, could be the older or the younger child. This means that Mr Brown can have one of three possible combinations of two children: both boys, elder boy and younger girl, elder girl and younger boy, the fourth combination of two girls being excluded by what I have stated. But of the three combinations, in two cases the other child is a girl so that the requisite probability is 2/3 … This example is typical of many simple paradoxes in probability: the answer is easy to explain but nobody believes the explanation. However, the solution I have given *is* correct.

Or is it? That was spoken like a probabilist. A probabilist is a sort of mathematician. He or she deals with artificial examples and logical connections but feels no obligation to say anything about the real world. My demonstration, however, relied on the assumption that the three combinations boy–boy, boy–girl and girl–boy are equally likely and this may not be true. The difference between a statistician and a probabilist is that the latter will define the problem so that this is true, whereas the former will consider *whether* it is true and obtain data to test its truth.
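The probabilist's calculation in the quote can be reproduced by enumerating the sample space. The following is only a minimal sketch: it assumes, as the quoted demonstration does, that the four elder/younger combinations are equally likely, and the helper name `conditional_probability` is made up for this illustration.

```python
from fractions import Fraction
from itertools import product

# All (elder, younger) combinations for a two-child family.
# ASSUMPTION: the four combinations are equally likely. This is exactly
# the premise a statistician would want to check against data.
families = list(product("BG", repeat=2))


def conditional_probability(event, given):
    """P(event | given) under a uniform distribution over `families`."""
    conditioned = [f for f in families if given(f)]
    favourable = [f for f in conditioned if event(f)]
    return Fraction(len(favourable), len(conditioned))


# "At least one of them is a boy": the other child is a girl
# exactly when the family also contains a girl.
p_at_least_one_boy = conditional_probability(
    event=lambda f: "G" in f,
    given=lambda f: "B" in f,
)

# A different statement: "the elder child is a boy".
p_elder_boy = conditional_probability(
    event=lambda f: f[1] == "G",
    given=lambda f: f[0] == "B",
)

print(p_at_least_one_boy)  # 2/3
print(p_elder_boy)         # 1/2
```

The two conditioning statements give different answers, which is the point of the puzzle. Whether the uniform assumption holds for real families is a separate, empirical question that the enumeration cannot settle.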

Statistical reasoning certainly seems paradoxical to most people.

Take for example the well-known Simpson’s paradox.

From a theoretical perspective, Simpson’s paradox importantly shows that causality can never be reduced to a question of statistics or probabilities, unless you are — miraculously — able to keep constant *all* other factors that influence the probability of the outcome studied.

To understand causality we always have to relate it to a specific causal *structure*. Statistical correlations are *never* enough. No structure, no causality.

Simpson’s paradox is an interesting paradox in itself, but it can also highlight a deficiency in the traditional econometric approach towards causality. Say you have 1000 observations on men and an equal number of observations on women applying for admission to university studies, and that 70% of men are admitted, but only 30% of women. Running a logistic regression to find out the odds ratios (and probabilities) for men and women on admission, females seem to be in a less favourable position (‘discriminated’ against) compared to males (male odds are 2.33, female odds are 0.43, giving an odds ratio of 5.44). But once we find out that males and females apply to different departments we may well get a Simpson’s paradox result where males turn out to be ‘discriminated’ against: say 800 males apply for economics studies (680 admitted) and 200 for physics studies (20 admitted), and 100 females apply for economics studies (90 admitted) and 900 for physics studies (210 admitted), giving male-to-female odds ratios of 0.63 and 0.37.
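The arithmetic of the admissions example can be checked with a few lines of Python. This is a sketch that uses only the counts given in the text; the names `data`, `odds`, `odds_ratio` and `pooled` are invented for the illustration, and plain odds ratios are computed directly rather than through a fitted logistic regression.

```python
# Admission counts from the example: (applicants, admitted).
data = {
    "economics": {"male": (800, 680), "female": (100, 90)},
    "physics": {"male": (200, 20), "female": (900, 210)},
}


def odds(applicants, admitted):
    """Odds of admission: admitted divided by rejected."""
    return admitted / (applicants - admitted)


def odds_ratio(male, female):
    """Male odds divided by female odds."""
    return odds(*male) / odds(*female)


def pooled(sex):
    """Sum applicants and admissions over all departments for one sex."""
    applicants = sum(dept[sex][0] for dept in data.values())
    admitted = sum(dept[sex][1] for dept in data.values())
    return applicants, admitted


# Ignoring departments: males look strongly favoured.
aggregate_or = odds_ratio(pooled("male"), pooled("female"))

# Within each department: the ratio falls below 1, so females are favoured.
department_or = {
    name: odds_ratio(dept["male"], dept["female"]) for name, dept in data.items()
}

print(round(aggregate_or, 2))                                # 5.44
print({k: round(v, 2) for k, v in department_or.items()})   # {'economics': 0.63, 'physics': 0.37}
```

The pooled ratio is above 1 while both departmental ratios are below 1 (the economics figure is 0.6296 before rounding). Nothing in the numbers themselves says which of the two comparisons is the causally relevant one.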

Econometric patterns should never be seen as anything else than possible clues to follow. From a critical realist perspective it is obvious that behind observable data there are real structures and mechanisms operating, things that are — if we really want to understand, explain and (possibly) predict things in the real world — more important to get hold of than to simply correlate and regress observable variables.

Math cannot establish the truth value of a fact. Never has. Never will.

OK, I’ll bite. The original boy-girl question did not specify that the ages mattered, only the gender. Therefore the elder-boy/younger-girl vs. elder-girl/younger-boy are the same as far as the question is concerned, so the probability is in fact 0.5.

Is the question out of context?

Words usually bring in ambiguity. Let me try to turn it into a math equation (I hope I do it correctly):

P(G=1|B=1) = P(G=1,B=1)/P(B=1) = (1-1/4-1/4)/(1-1/4) = (2/4) / (3/4) = 2/3.

The first term is the probability of a girl conditional on a boy. The second term is the probability of a girl and a boy together, divided by the probability of a boy, in a two-child family. Thus, we have a 1/2 chance of having one girl and one boy. The key assumption is that the sex of each child is independent of the other's, which may or may not be true for a particular couple/family.

We already know there are two children and that one of them is a boy, so the two-girl case is ruled out. Here B=1 stands for ‘at least one boy’, and its probability is 1-1/4, that is, one minus the probability of two girls.

No, it’s an old chestnut. Making the usual assumptions, which are realistic, 2/3 girl is correct, meaning it is a realistic answer that will usually be close to what is seen in practice. The book being quoted, by Stephen Senn, gives a real-world experiment that gave 70%, even further from 50%.

Ages have nothing to do with it except that it is a convenient way to differentiate the two children. Hair color or anything else would work as well. As you note, there is no differentiation in the question, only in that answer. But your third sentence “Therefore …” is basically a non sequitur.

Imagine yourself as a dictator who puts all the two-child families in People’s Probability & Statistics Square. Then say to the two-girl families: “go home”. Then ask the remaining parents: is your family boy and girl or boy and boy? Mirabile dictu, the statistics will be that 2/3 will be boy-girl. So 2/3 of these parents would answer the question “Is your other child a girl?” with “Yes”. Nothing about ages or hair color. One thinks that knowing one child is a boy cannot change sex ratios, and this is correct, but the reasoning from this principle must be careful. If one said that the first child is a boy, then 1/2 for the other would in fact be correct.
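The thought experiment in the square can be run as a small simulation. This is a sketch under the usual assumptions, which are not guaranteed for real families: each child is a boy or a girl with probability 1/2, independently. The function name `simulate_square`, the family count and the seed are arbitrary choices for the illustration.

```python
import random


def simulate_square(n_families=100_000, seed=0):
    """Simulate the square and return two shares:
    (share with a girl | at least one boy), (share with a girl second | boy first)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable

    # ASSUMPTION: each child is 'B' or 'G' with probability 1/2, independently.
    families = [(rng.choice("BG"), rng.choice("BG")) for _ in range(n_families)]

    # The dictator sends the two-girl families home.
    remaining = [f for f in families if "B" in f]
    share_with_girl = sum("G" in f for f in remaining) / len(remaining)

    # A different filter: keep only families whose first child is a boy.
    first_boy = [f for f in families if f[0] == "B"]
    share_second_girl = sum(f[1] == "G" for f in first_boy) / len(first_boy)

    return share_with_girl, share_second_girl


share_with_girl, share_second_girl = simulate_square()
print(f"boy-girl share among remaining families: {share_with_girl:.3f}")
print(f"girl share when the first child is a boy: {share_second_girl:.3f}")
```

With these assumptions the first share should come out close to 2/3 and the second close to 1/2, which matches the distinction drawn above between "one child is a boy" and "the first child is a boy".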

Anyways, that’s how the original question is meant to be interpreted and it is probably the most natural interpretation of the question, though statistics may differ about that. :-)

The example is very similar to one I discuss at https://djmarsay.wordpress.com/notes/puzzles/the-two-daughter-problem/. We both make an important distinction between what he calls Probabilists and statisticians. But, whereas the above quote claims that Probabilists keep quiet about reality, I can only wish it were so. Probabilists are often not only steeped in probability theory, but also human.

For example, at popular maths lectures dealing with probability it is very common for the lecturer to get out a coin and ask for the probabilities of various combinations of ‘Heads’ and ‘Tails’. It seems to me that, far from keeping quiet on the grounds that their mathematics is silent on such questions, those who have (dangerously) studied a little probability theory will often be the first to shout out, particularly for the more obscure combinations. So far, they have always been very wrong (the coin is generally double-headed).

The problem is that the word ‘probability’ is used in different mathematical theories, and (as in other areas) the unwary can easily conflate the meanings, at best confusing everyone. In the lectures there is a case to be made that the lecturer seemed to be asking for the Probabilists’ probability and that it was reasonable for the Probabilists to be taken by surprise by how things turned out, as such things were outside their domain of expertise. So the lesson for practical people is that when offered a probability estimate they should enquire whether it is just a Probabilists’ estimate or more soundly based. A discussion of the Principle of Indifference might be a good place to start.

As always, you enlighten me. Have you read the following book and, if so, could you comment on it?

The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives (Economics, Cognition, and Society)

by Stephen T. Ziliak et al.

Link: http://a.co/ai807Jv

The ‘probability’ of the occurrence of an event is a function of what we know and don’t know about the situation, of what we presume is true about the situation, and of the specifications used in determining that probability. Changes in these produce different figures. That is, a probability does not properly ‘exist’. It’s not a phenomenon nor a feature of phenomena, or even a ‘fact’ about the world. It’s a contrivance, a heuristic device.

The probability of a newborn being a boy could be 49.999%. The probability for Asians may be 49.9985% and of African-Americans 50.0005%. These might ‘average out’ to a single figure, but only under some presumption re the distribution of race. Too, just plain 50% could be acceptable across all of these differences, but only if we agree on the range of error criterion.

Of course, the variances in this example are niggling. But when we get to more complex large-scale, messy real-world situations, such as in an actual economy, they loom large.

A probability figure for an economic outcome, calculated under the presumption of one economic model being true, can be radically different from one calculated under another. And the less surety about which model is correct, the weaker the figure concluded. Too, a presumed model, even if correct, must predict with a sufficiently narrow range of error to yield a result of practical value. Unfortunately, these kinds of surety and accuracy simply do not exist in the economic world. In addition, how we frame our question, and what features and elements of the economy we decide to include and exclude (geographic boundaries, time frame, players, ‘external’ interactive processes, etc.), also affect what we end up with.

What we have, then, in economics is a hodge-podge of conditional circumstances, and of such deep uncertainty about what is known, that it renders almost any ‘probability’ determination of little meaning or utility.

Clarity is what I came away with reading your comment. Thank you.