## What is randomness?

from **Lars Syll**

Modern probabilistic econometrics relies on the notion of probability. To be amenable at all to econometric analysis, economic observations allegedly have to be conceived of as random events.

But is it really necessary to model the economic system as a system where randomness can only be analyzed and understood when based on an *a priori* notion of probability?

In probabilistic econometrics, events and observations are as a rule interpreted as random variables, *as if* generated by an underlying probability density function, and *a fortiori* – since probability density functions are only definable in a probability context – consistent with a probability measure. As Haavelmo (1944:iii) has it:

For no tool developed in the theory of statistics has any meaning – except, perhaps, for descriptive purposes – without being referred to some stochastic scheme.

When attempting to convince us of the necessity of founding empirical economic analysis on probability models, Haavelmo – building largely on the earlier Fisherian paradigm – actually forces econometrics to (implicitly) interpret events as random variables generated by an underlying probability density function.

This is at odds with reality. Randomness obviously is a fact of the real world. Probability, on the other hand, attaches to the world via intellectually constructed models, and *a fortiori* is only a fact of a probability-generating machine or a well-constructed experimental arrangement or “chance set-up”.

Just as there is no such thing as a “free lunch,” there is no such thing as a “free probability.” To be able to talk about probabilities at all, you have to specify a model. In statistics, any process you observe or measure is called an *experiment* (rolling a die) and the results obtained are the *outcomes* or *events* of the experiment (the number of points rolled with the die, e.g. 3 or 5). If there is no chance set-up or model that generates the probabilistic outcomes or events, there is, strictly seen, no event at all.

Probability is a relational element. It must always come with a specification of the model from which it is calculated. And to be of any empirical scientific value it then has to be *shown* to coincide with (or at least converge to) real data-generating processes or structures – something seldom or never done!

And this is the basic problem with economic data. If you have a fair roulette wheel, you can arguably specify probabilities and probability density distributions. But how do you conceive of the analogous nomological machines for prices, gross domestic product, income distribution, etc.? Only by a leap of faith. And that does not suffice. You have to come up with some really good arguments if you want to persuade people into believing in the existence of socio-economic structures that generate data with characteristics conceivable as stochastic events portrayed by probabilistic density distributions!

From a realistic point of view we really have to admit that the socio-economic states of nature that we talk of in most social sciences – and certainly in econometrics – are not amenable to analysis in terms of probabilities, simply because in the real-world open systems that the social sciences – including econometrics – analyze, there are no probabilities to be had!

The processes that generate socio-economic data in the real world cannot just be assumed to always be adequately captured by a probability measure. And, so, it cannot really be maintained – as in the Haavelmo paradigm of probabilistic econometrics – that it even should be mandatory to treat observations and data – whether cross-section, time series or panel data – as events generated by some probability model. The important activities of most economic agents do not usually include throwing dice or spinning roulette-wheels. Data generating processes – at least outside of nomological machines like dice and roulette-wheels – are not self-evidently best modeled with probability measures.

If we agree on this, we also have to admit that probabilistic econometrics lacks a sound justification. I would even go further and argue that there really is no justifiable rationale at all for this belief that all economically relevant data can be adequately captured by a probability measure. In most real world contexts one has to *argue* one’s case. And that is obviously something seldom or never done by practitioners of probabilistic econometrics.

Econometrics and probability are intermingled with randomness. But what is randomness?

In probabilistic econometrics it is often defined with the help of independent trials – two events are said to be independent if the occurrence or nonoccurrence of either one has no effect on the probability of the occurrence of the other – such as drawing cards from a deck, picking balls from an urn, spinning a roulette wheel or tossing coins – trials which are only definable if somehow set in a probabilistic context.
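
As a purely illustrative sketch – a hypothetical chance set-up, not anything from the econometric literature – the definition can be checked inside a fully specified model. Here two independent rolls of a fair die are simulated in Python, and the product rule P(A ∩ B) = P(A)·P(B) that characterizes independence is verified empirically:

```python
import random

random.seed(0)
N = 100_000

# Two chance set-ups: independent rolls of a fair six-sided die.
a_count = b_count = ab_count = 0
for _ in range(N):
    first = random.randint(1, 6)
    second = random.randint(1, 6)
    A = first % 2 == 0   # event A: first roll is even
    B = second > 4       # event B: second roll is 5 or 6
    a_count += A
    b_count += B
    ab_count += A and B

p_a, p_b, p_ab = a_count / N, b_count / N, ab_count / N
print(f"P(A)       = {p_a:.3f}  (theory 0.500)")
print(f"P(B)       = {p_b:.3f}  (theory 0.333)")
print(f"P(A and B) = {p_ab:.3f}  (P(A)*P(B) = {p_a * p_b:.3f})")
```

Note that the check only makes sense because the chance set-up (`random.randint` standing in for a fair die) was specified in advance – which is precisely the point at issue.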

But if we pick a sequence of prices – say 2, 4, 3, 8, 5, 6, 6 – that we want to use in an econometric regression analysis, how do we know the sequence of prices is random and, *a fortiori*, can be treated as generated by an underlying probability density function? How can we argue that the sequence is a sequence of probabilistically independent random prices? And are they really random in the sense most often applied in probabilistic econometrics – where X is called a random variable only if there is a sample space S with a probability measure and X is a real-valued function over the elements of S?
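
The formal definition just cited can be made explicit in a few lines of Python – a hypothetical toy example, with the die standing in for the sample space that the price data conspicuously lack:

```python
from fractions import Fraction

# Sample space S for one roll of a fair die, with an explicit probability measure.
S = [1, 2, 3, 4, 5, 6]
P = {s: Fraction(1, 6) for s in S}   # the measure: each outcome gets weight 1/6

# A random variable is nothing but a real-valued function over the elements of S.
def X(s):
    return 2 * s - 1                 # e.g. "twice the points rolled, minus one"

# Its expectation follows from the measure alone, before any data are observed:
E_X = sum(X(s) * P[s] for s in S)
print(E_X)                           # prints 6
```

For the price sequence 2, 4, 3, 8, 5, 6, 6 no such S and P are given – they would have to be argued for.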

Bypassing the scientific challenge of going from describable randomness to calculable probability by just assuming it is, of course, not an acceptable procedure. Since a probability density function is a “Gedanken” object that does not exist in a natural sense, it has to come with an export license to our real target system if it is to be considered usable.

Among those who at least honestly try to face the problem, the usual procedure is to refer to some artificial mechanism operating in “games of chance” of the kind mentioned above and generating the sequence. But then we still have to show that the real sequence somehow coincides with the ideal sequence that defines independence and randomness within our – to speak with philosopher of science Nancy Cartwright (1999) – “nomological machine”, our chance set-up, our probabilistic model.

As the originator of the Kalman filter, Rudolf Kalman (1994:143), notes:

Not being able to test a sequence for ‘independent randomness’ (without being told how it was generated) is the same thing as accepting that reasoning about an “independent random sequence” is not operationally useful.
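
Kalman's point can be illustrated with one common attempt at such a test – a permutation test of the lag-1 autocorrelation of the price sequence above. The sketch is mine, not a procedure endorsed by Kalman, and notice that the test itself presupposes a chance set-up (all reorderings equally likely): it compares the data with an assumed nomological machine rather than establishing randomness from nothing.

```python
import itertools
import statistics

prices = [2, 4, 3, 8, 5, 6, 6]

def lag1(xs):
    """Lag-1 autocorrelation of a sequence."""
    m = statistics.mean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den

observed = lag1(prices)

# The "null" is itself an assumed chance set-up: every ordering of the
# same seven numbers is taken to be equally likely.
perms = set(itertools.permutations(prices))
extreme = sum(abs(lag1(list(p))) >= abs(observed) for p in perms)
p_value = extreme / len(perms)
print(f"lag-1 autocorrelation = {observed:.3f}, permutation p-value = {p_value:.3f}")
```

Whatever p-value comes out, it is a statement *conditional on* the assumed set-up – exactly the circularity the quote describes.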

So why should we define randomness in terms of probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and, to be strict, do not exist at all – without specifying such system-contexts (how many sides the dice have, whether the cards are unmarked, etc.).

If we do adhere to the Fisher–Haavelmo paradigm of probabilistic econometrics, we also have to assume that all noise in our data is probabilistic and that errors are well-behaved – something that is hard to justifiably argue for as a real phenomenon rather than just an operationally and pragmatically tractable assumption.

Maybe Kalman’s (1994:147) verdict that

Haavelmo’s error that randomness = (conventional) probability is just another example of scientific prejudice

is, seen from this perspective, not far-fetched.

Accepting Haavelmo’s domain of probability theory and sample space of infinite populations – just as Fisher’s (1922:311) “hypothetical infinite population, of which the actual data are regarded as constituting a random sample”, von Mises’ “collective” or Gibbs’ “ensemble” – also implies that judgments are made on the basis of observations that are actually never made!

Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.

As David Salsburg (2001:146) notes on probability theory:

[W]e assume there is an abstract space of elementary things called ‘events’ … If a measure on the abstract space of events fulfills certain axioms, then it is a probability. To use probability in real life, we have to identify this space of events and do so with sufficient specificity to allow us to actually calculate probability measurements on that space … Unless we can identify [this] abstract space, the probability statements that emerge from statistical analyses will have many different and sometimes contrary meanings.

Just as e.g. Keynes (1921) and Georgescu-Roegen (1971), Salsburg (2001:301f) is very critical of the way social scientists – including economists and econometricians – uncritically and without argument have come to simply assume that one can apply probability distributions from statistical theory to their own area of research:

Probability is a measure of sets in an abstract space of events. All the mathematical properties of probability can be derived from this definition. When we wish to apply probability to real life, we need to identify that abstract space of events for the particular problem at hand … It is not well established when statistical methods are used for observational studies … If we cannot identify the space of events that generate the probabilities being calculated, then one model is no more valid than another … As statistical models are used more and more for observational studies to assist in social decisions by government and advocacy groups, this fundamental failure to be able to derive probabilities without ambiguity will cast doubt on the usefulness of these methods.

Some wise words that ought to be taken seriously by probabilistic econometricians are also given by mathematical statistician Gunnar Blom (2004:389):

If the demands of randomness are not fulfilled at all, you only do damage to your analysis by using statistical methods. The analysis acquires an air of science around it that it does not at all deserve.

Richard von Mises (1957:103) noted that

Probabilities exist only in collectives … This idea, which is a deliberate restriction of the calculus of probabilities to the investigation of relations between distributions, has not been clearly carried through in any of the former theories of probability.

And obviously not in Haavelmo’s paradigm of probabilistic econometrics either. It would have been better had one heeded von Mises’ (1957:172) warning that

the field of application of the theory of errors should not be extended too far.

**This importantly also means that if you cannot show that the data satisfy all the conditions of the probabilistic nomological machine – including randomness – then the statistical inferences used lack sound foundations!**

*References*

Blom, Gunnar et al. (2004), *Sannolikhetsteori och statistikteori med tillämpningar*. Lund: Studentlitteratur.

Cartwright, Nancy (1999), *The Dappled World*. Cambridge: Cambridge University Press.

Fisher, Ronald (1922), On the mathematical foundations of theoretical statistics. *Philosophical Transactions of The Royal Society A*, 222.

Georgescu-Roegen, Nicholas (1971), *The Entropy Law and the Economic Process*. Harvard University Press.

Haavelmo, Trygve (1944), The probability approach in econometrics. *Supplement to Econometrica* 12:1-115.

Kalman, Rudolf (1994), Randomness Reexamined. *Modeling, Identification and Control* 3:141-151.

Keynes, John Maynard (1973 (1921)), *A Treatise on Probability*. Volume VIII of *The Collected Writings of John Maynard Keynes*, London: Macmillan.

Pålsson Syll, Lars (2007), *John Maynard Keynes*. Stockholm: SNS Förlag.

Salsburg, David (2001), *The Lady Tasting Tea*. Henry Holt.

von Mises, Richard (1957), *Probability, Statistics and Truth*. New York: Dover Publications.

This sounds like Gödelian Platonism.

In the real ideal world, there is randomness (though nowadays this depends on how you view quantum mechanics and the continuum hypothesis; one can also look at Chaitin–Solovay on random reality).

Then in the fictional fantasy of economics and econometrics, there is ‘probability’.

You can just call a spade a spade, updated with Kantian ‘critical realism’ (not the same term used nowadays). Or remember Wittgenstein—whereof one cannot speak, thereof one must remain silent.

Yes. Exactly. As two apparently separate information streams intersect, new possibilities are born. Some of us observe such an event and logically deduce ever accelerating evolution has some sort of random character. Those of us who are glib can write textbooks explaining how to predict likely expectations.

Failing governments spend more and more energy attempting to control and direct what economic propagandists call the free market. Quantum physicists control and direct from educated experience but stop short before the impossibility of a plan for controlled direction of cosmic evolution itself.

The information age has reached a point where even democracy no longer works based on numeric sums to choose representatives. Democracy generates specie focus of distributed intelligence, a quality. Economists have become lost in numbers and will soon leave that aspect of study to accountants or become irrelevant to qualitative growth analysis.

Having looked up Gödelian Platonism, Ishi, it doesn’t look as if Plato, Gottlob Frege, Brouwer and Gödel have quite got the point. The fact that one is discussing an intuitive statement is evidence that someone has constructed a pattern he has seen by embedding it in an object such as an ambiguous figure. The result is a gestalt experience: now you see it; then you see what you hadn’t. Now you (like a computer) see an objective numeral, then the significance of its position in an algorithmic number format. (More likely, as an adult human, you will see a number, then its algorithmic format, then one of a small set of numerals). Looking up Lars’ reference to Cartwright’s nomological machine, that’s it.

Kant was near the mark, whether or not his critique was realist in the neurological sense, but let’s not forget that it, like Bhaskar’s critical realism, is addressing Hume’s mis-takes.

Lars, I found it difficult to pin down in the above a sentence which captures what is at issue here. Let me try “Data generating processes – at least outside of nomological machines like dice and roulette-wheels – are not self-evidently best modeled with probability measures”.

In previous comment on so-called Bayesian merging of so-called independent data streams I’ve tried to make the point that it is not the dice but its apparent symmetries which generate probabilities, and measuring these by the independent (dynamic as against static) method of throwing the dice merely tests whether the dice is near enough symmetrical. Bayes was about independent methods, not independent data.

In a research project I did measuring the reliability of probabilistic reliability theory, the trial did not confirm the hypothesis that the overall failure rate was the sum of the probable failure rates of its component parts. What it did show (with the evidence sticking out like a sore thumb) was that if the hypothesis didn’t work out it was usually because of errors in the system design. I later realised the idea of using redundant equipment to improve reliability (as in using three computers and majority voting in a spaceship) came from Shannon’s Mathematical Theory of Communication.

Not having a dice for his nomological machine, Shannon uses (for one) the probabilities of particular letters following others. He tests the relevance of this by using it to generate data sufficiently like English for him to use in demonstrating his revolutionary techniques for the detection and correction of errors induced by random noise.
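
A toy version of Shannon's procedure is easy to sketch in Python (the sample text and every detail here are invented for illustration): estimate which letters follow which in a sample, then generate new "English-like" text from those conditional frequencies.

```python
import random
from collections import defaultdict

sample = "the theory of probability is the theory of chance set ups"

# Empirical table of "which letters follow this letter" in the sample.
follow = defaultdict(list)
for a, b in zip(sample, sample[1:]):
    follow[a].append(b)

random.seed(1)
c = "t"
out = [c]
for _ in range(40):
    # Draw the next letter with the conditional frequency observed in the sample;
    # fall back to a uniform draw from the sample if a letter has no successor.
    c = random.choice(follow.get(c, list(sample)))
    out.append(c)
print("".join(out))
```

The generated gibberish is "statistically like" the sample only relative to the frequency table used – the nomological machine here is the table, not the English language.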

It doesn’t seem to have occurred to social statisticians that truth is not achieved by staticising dynamic and diverse situations, but by the detection and correction of errors before they do too much damage. Looking at Meryn’s output, perhaps they do, expecting us to be able to distinguish lies from damn lies and statistics sufficiently confidently to put things right. But wouldn’t it be better if their methods were directed to righting known wrongs rather than predicting the unknowable?

Yes, in fact, so much better that the statistics would have a high probability of being useful.

This is one of the most interesting threads (or maybe the most interesting) I’ve seen on RWER, though I guess I’d have to say I might only give it an 85% ‘truth value’ score.

(In analogy, if one is solving the TSP (traveling salesman problem) to find the shortest route, effectively one only gets within 15% of the optimal (shortest) route – like this truck driver who used his GPS to find the shortest route to a town in WV. He should have stayed on the highway until the interstate, but took the shortest route over the mountain, which turned into a dirt road with hairpin turns, and he could neither get around them nor back up. Sometimes you must ‘satisfice’ (Herbert Simon) and say ‘good enough is good enough’; Willy Loman and Bill Clinton’s father solved the traveling salesman problem in their own ways.)

The issue is ‘when is a pattern a pattern?’ If you see something when looking in a rain puddle, have you discovered a new race of people, and where do they go when you put your foot in it? My view (which I can back up with many references, though interestingly it seems only one economist has ever cited the main one, though there are plenty of precedents in older papers on detecting chaos in economic time series (e.g. Grandmont), but I may also be only 85% ‘lesswrong.com’ (my ‘karma score’ there is now about -19, and I may be banned by the people of the book (‘superintelligence’))) is you cannot distinguish randomness from determinism. So you can model everything as either a stochastic process or as a deterministic one. Sense = nonsense (reality is its own self-inverse, like Taoism: ‘men see only what they look at, and they only look at what they already have in mind’ (F. Céline), or Nico and the Velvet Underground – ‘I’ll be your mirror, reflect what you are, in case you don’t know’).
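
A standard illustration of this indistinguishability claim (my own sketch, not taken from the references alluded to): the logistic map at r = 4 is fully deterministic, yet the bit sequence it generates passes a simple frequency check just as pseudo-random coin flips do.

```python
import random

def logistic_bits(x0, n):
    """Deterministic bit sequence from the chaotic logistic map x -> 4x(1-x)."""
    x, bits = x0, []
    for _ in range(n):
        x = 4 * x * (1 - x)
        bits.append(1 if x > 0.5 else 0)
    return bits

det = logistic_bits(0.1234, 10_000)                    # deterministic "coin flips"
random.seed(2)
rnd = [random.randint(0, 1) for _ in range(10_000)]    # pseudo-random ones

# Both sequences show roughly equal frequencies of 0s and 1s:
print(sum(det) / len(det), sum(rnd) / len(rnd))
```

A frequency count of course cannot tell them apart; neither, as argued above, can any test that is not told how the sequence was generated.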

Consider ‘random fractals’ or ‘deterministic diffusion’. Or Verlinde’s ‘entropic gravity’ (a sort of Kantian-type resolution of Plato/Hume, except for Boltzmann/Newton). A deterministic ‘Newtonian’ force is the same thing as an entropic ‘Boltzmannian’ force. (I decided this was true when I found old papers from the ’70s, like the MSR formalism, showing you can write diffusion as a Hamilton–Jacobi equation (with some caveats).) There’s no difference between a model and reality, except that they don’t commute, except in some infinite limit (like quantum mechanics to classical).

Bruce Edmonds is also correct when he says that with enough ‘agents’ – if you admit the possibility that there is more than ‘one electron in the world’ (Feynman), that not everyone can ‘represent’ everyone else (though the ‘mean field’ approximation (as in economics) is effective), and that not every product is substitutable for every other one or exactly divisible into transcendentally numbered parts – ‘anything goes’. (Cellular automata are deterministic, but are also ‘universal computers’; I think this was shown in the ’60s, if von Neumann didn’t do it.)

Regarding Garret Connoly’s prediction that economists may be an endangered species (unlike the ‘ishi’ crew, who live in wonderland (different f/laws of physics) and can fall in the rain puddle, swim through the water, and still not get wet – ask Alice), interestingly I see there is a discussion among set theorists/mathematical logicians on whether they are going to be put out of business, since they still can’t figure out how many angels are on the head of a pin. (I don’t really know what economists do apart from collecting data – like someone I know, now retired, who graduated from U Chicago; his wife was a Holocaust survivor, and her daughter used to make jokes about JY princesses, etc., but now I think is somewhat right-wing politically.)

Regarding Shannon and switching circuits, there is related work by Kron from the ’40s/’50s which shows all the equations of physics can be modeled by electrical circuits. I think maybe Turing’s papers more or less showed everything can be modeled with circuits. (One person I studied with (whom I also told not to put my name on a joint paper we wrote, since he cut out some of my references to renormalization group theory and mathematical logic, though that was a very pragmatic decision) studied with Aaron Katchalsky and G. Oster of UCB in mathematical biology; they showed many biochemical processes (i.e. all of them) could be modeled with electrical networks (I think in Q Rev Biol ’74), but the formalism is bulky or clumsy (though it’s still around).) It seemed easier just to write in a standard dialect, like machine code. (This works when asking directions. Or my own preferred foliation is to code the entire Library of Congress and World Wide Web into one Gödel number, which I then save into my cell phone (my mom bought me one 2 days ago since ‘talking drum’ communication is less fashionable); unfortunately I forgot my password, and if you can’t pass, you go directly to jail, as in a Monopoly game.)

(Katchalsky was killed in a notorious bombing by the PLO. ‘What a Wonderful World’ (Louis Armstrong).)

P.S. Actually it’s ‘Aharon Katchalsky’ on ‘network thermodynamics’, and its date is 1971. Also, I use the ‘85%’ figure since someone into ‘cognitive dissonance’ told me (like Robin Dunbar’s stuff on groups) that’s the critical point. (In the USA we also have 350.org, another critical point – scaling ideas (Kadanoff, Wilson), renormalization group. (Or ‘size of the critical component’ – Erdős and Rényi, and Bollobás.))

“My view … is you cannot distinguish randomness from determinism. So you can model everything as either a stochastic process or as a deterministic one.”

That’s the point of encryption, of course. You can only model the data in a deterministic way if you already know the key. Again, a computer input routine expecting a number in a certain human format reads could-be-random bits as ASCII equivalents until it gets an ASCII code which does not represent a number, at which point it has to decide whether that is the end of the number, a typo/transmission error, or a programming error (e.g. expecting a quantitative number where the data is actually a serial number, which in some variants has letters in it).

What is the probability of being sure that I don’t know?

Statistics in macroeconomics can only repeat past behaviour without “knowing” why a particular behaviour happened. Mathematicians have to rely on tables of pseudo-random data because they cannot generate perfectly random kinds. So as soon as one tries to introduce statistics into the logical and almost exact science of macroeconomics one is heading for disaster.
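
The point about pseudo-random tables can be made concrete with a minimal linear congruential generator (the classic ‘minstd’ parameters; a sketch for illustration only): the stream is entirely determined by the seed, which is why such tables are repeatable rather than random.

```python
def lcg(seed, n, a=16807, m=2**31 - 1):
    """Minimal linear congruential generator: x_{k+1} = a * x_k mod m."""
    x, out = seed, []
    for _ in range(n):
        x = (a * x) % m
        out.append(x / m)          # rescale to [0, 1)
    return out

print(lcg(42, 3))
print(lcg(42, 3) == lcg(42, 3))    # True: the "randomness" is fixed by the seed
```

Every "random" table a computer produces is generated by some such deterministic recurrence.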

It is most significant that I wrote “almost exact” above. My analysis shows a cause and effect nature of the technical parts of the subject, but since there is also a decision-making part which is based on human choice, we should accept that this science is not perfectly exact and can be badly affected by the wrong decisions when they are not logical. Most people have not (yet) accepted that macroeconomics is a logical science, and for this idea to make better sense we need a whole new set of assumptions based on exact definitions. I can supply them but today most people do not seem to consider (let alone respect) the fact that one cannot build this scientific subject on intuition and poor (past) guesses.

Yes. Newton chose the gravitational hypothesis that the apple attracted the earth as well as vice versa, and found that it worked out near enough. Likewise Shannon chose the logical hypothesis that symbolic logic could be both performed and symbolised by switching circuits, and found computers with error correction circuits can work out near enough. You’ve chosen to consider macroeconomics a logical science rather than an empirical one. Given the realist hypothesis that God created the world with a Big Bang, when there was no-one around to tell us what went on and no numbers to measure it with, that works out. However, it leaves macroeconomics defined as a mere framework: a structure like that of an arabic number representation, in which the most significant digit changes when complete. Likewise the economy of man is built on the ecology of life (built ultimately on cosmic energy), and monetary control of economy (or not) on that, and control (or not) of money making on that. The structure is logically true of all, whether or not the empirical micro-economic data hung on it is complete or true.

Key here is the idea of “noise” – that component of the data that is not part of your focus ‘signal’ and is attributed to processes that are unimportant to the focus phenomena. What this post seems to be about is the strategy of dealing with noise as some kind of random distribution. As the post rightly points out, this can be a questionable step – and cannot (usually) be justified on firm *foundations*. Rather it is a modelling step, whose wisdom can only be judged by how well this strategy worked – when the whole model (including the randomness) is used – either in terms of its effectiveness or its validation. In this way it should be thought about a LOT more than it usually is, on a par with other, fallible, modelling hypotheses.

For example, a lot of the unexplained variation in social (including economic) data arises not because people are behaving randomly (people rarely do) but because the data covers a variety of contexts and their behaviour is context-dependent. Then randomness might not be a good model for this variation. An example is the insurance/risk-avoidance markets, where in the 80s/90s the variation was often assumed to be random and something like normally distributed.

Edmonds, B. (2009) The Nature of Noise. In Squazzoni, F. (Ed.) Epistemological Aspects of Computer Simulation in the Social Sciences. LNAI 5466:169-182. (http://cfpm.org/cpmrep156.html).

Randomness…is equilibrium….from a higher, more inclusive and integrated point of view. Like perceiving and including both the static and the forceful factor of the economic effects of the rules of cost accounting, and then crafting a macro-economic policy to remedy those effects.