
Randomized experiments — a dangerous idolatry

from Lars Syll

Nowadays many mainstream economists maintain that ‘imaginative empirical methods’ — especially randomized experiments (RCTs) — can help us to answer questions concerning the external validity of economic models. In their view, such experiments are, more or less, tests of ‘an underlying economic model’ and enable economists to make the right selection from the ever-expanding ‘collection of potentially applicable models.’

It is widely believed among economists that the scientific value of randomization — contrary to other methods — is totally uncontroversial and that randomized experiments are free from bias. When looked at carefully, however, there are in fact few real reasons to share this optimism about the alleged ‘experimental turn’ in economics. Strictly speaking, randomization does not guarantee anything.

Assume that we are involved in an experiment in which we examine how the work performance of Chinese workers (A) is affected by a specific ‘treatment’ (B). How can we extrapolate/generalize to new samples outside the original population (e.g. to the US)? How do we know that any replication attempt ‘succeeds’? How do we know when these replicated experimental results can be said to justify inferences made in samples from the original population? If, for example, P(A|B) is the conditional density function for the original sample, and we are interested in doing an extrapolative prediction of E[P(A|B)], how can we know that the new sample’s density function is identical with the original? Unless we can give some really good argument for this being the case, inferences built on P(A|B) do not really tell us anything about the target system’s counterpart P′(A|B).
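
As a minimal numerical sketch of the point (all numbers invented for illustration), here is a simulation in which the treatment effect estimated in an original population simply fails to carry over to a target population whose conditional distribution differs:

```python
# Hypothetical illustration: the A-B relationship estimated in one
# population need not match the one in a new population, because the
# conditional distribution P'(A|B) there can differ from P(A|B).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

b = rng.integers(0, 2, n)  # treatment indicator B

# Original population: the treatment shifts performance A by +2 on average.
a_orig = 1.0 + 2.0 * b + rng.normal(0, 1, n)

# Target population: the same treatment interacts differently with local
# background conditions, so the conditional mean shift is only +0.5.
a_target = 1.0 + 0.5 * b + rng.normal(0, 1, n)

effect_orig = a_orig[b == 1].mean() - a_orig[b == 0].mean()
effect_target = a_target[b == 1].mean() - a_target[b == 0].mean()
print(f"estimated effect, original population: {effect_orig:.2f}")  # ~2.0
print(f"actual effect, target population:      {effect_target:.2f}")  # ~0.5
```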

External validity and extrapolation are founded on the assumption that inferences based on P(A|B) are exportable to other populations, whose conditional density P′(A|B) may differ. Sure, if one can convincingly show that P and P′ are similar enough, the problems are perhaps surmountable. But just arbitrarily introducing functional specification restrictions, such as invariance or homogeneity, is, at least for an epistemological realist, far from satisfactory.
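
Before exporting anything, one can at least probe whether P and P′ look similar. A hypothetical check (sample data invented for illustration) using a two-sample Kolmogorov-Smirnov test on outcomes observed under the same treatment in the two populations:

```python
# Sketch of a similarity check between P(A|B) and P'(A|B): compare outcome
# samples drawn under treatment B in the original and target populations.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
outcomes_orig = rng.normal(0.0, 1.0, 5_000)    # stand-in for original sample
outcomes_target = rng.normal(0.4, 1.3, 5_000)  # stand-in for target sample

stat, p_value = ks_2samp(outcomes_orig, outcomes_target)
print(f"KS statistic: {stat:.3f}, p-value: {p_value:.3g}")
# A tiny p-value is evidence that the two conditional distributions differ,
# i.e. no license to export inferences built on P(A|B) to the target system.
```

Such a test can reject similarity, but of course it cannot establish the invariance that extrapolation actually requires.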

Many ‘experimentalists’ claim that it is easy to replicate experiments under different conditions and therefore a fortiori easy to test the robustness of experimental results. But is it really that easy? Population selection is almost never simple. Had the problem of external validity only been about inference from sample to population, this would be no critical problem. But the really interesting inferences are those we try to make from specific experiments to the specific real-world structures and situations that we are interested in understanding or explaining causally. And then the population problem is more difficult to tackle.

In randomized trials the researchers try to find out the causal effects that different variables of interest may have by changing circumstances randomly — a procedure somewhat (‘on average’) equivalent to the usual ceteris paribus assumption.

Besides the fact that ‘on average’ is not always ‘good enough,’ it amounts to nothing but hand waving to simpliciter assume, without argumentation, that it is tenable to treat social agents and relations as homogeneous and interchangeable entities.

Randomization is used to basically allow the econometrician to treat the population as consisting of interchangeable and homogeneous groups (‘treatment’ and ‘control’). The regression models one arrives at by using randomized trials tell us the average effect that variations in variable X have on the outcome variable Y, without having to explicitly control for effects of other explanatory variables R, S, T, etc. Everything is assumed to be essentially equal except the values taken by variable X.
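
What this buys, and what it does not, can be seen in a small simulation (a sketch with invented numbers): under random assignment an unobserved variable R is balanced across groups ‘on average,’ so the difference in means recovers the average effect of X; under self-selection on R it does not:

```python
# Randomized vs. self-selected assignment with an unobserved covariate R.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
r = rng.normal(0, 1, n)  # unobserved background variable
true_effect = 1.0        # average effect of X on Y

# Randomized assignment: X is independent of R, so groups are comparable.
x_rand = rng.integers(0, 2, n)
y_rand = true_effect * x_rand + 2.0 * r + rng.normal(0, 1, n)
print(y_rand[x_rand == 1].mean() - y_rand[x_rand == 0].mean())  # ~1.0

# Self-selection: units with high R take the treatment more often,
# so the naive difference in means is badly biased (~3.3 here).
x_sel = (r + rng.normal(0, 1, n) > 0).astype(int)
y_sel = true_effect * x_sel + 2.0 * r + rng.normal(0, 1, n)
print(y_sel[x_sel == 1].mean() - y_sel[x_sel == 0].mean())
```

Note that even in the randomized case the recovered quantity is only the average effect, which is exactly where the heterogeneity problem below enters.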

In the usual regression context one would apply an ordinary least squares (OLS) estimator in trying to get an unbiased and consistent estimate:

Y = α + βX + ε,

where α is a constant intercept, β a constant ‘structural’ causal effect and ε an error term.

The problem here is that although we may get an estimate of the ‘true’ average causal effect, this may ‘mask’ important heterogeneous effects of a causal nature. We may get the right answer that the average causal effect is 0, while those who are ‘treated’ (X = 1) have causal effects equal to −100 and those ‘not treated’ (X = 0) have causal effects equal to 100. Contemplating being treated or not, most people would probably be interested in knowing about this underlying heterogeneity and would not consider the OLS average effect particularly enlightening.
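
A variant of this example as a simulation (the magnitudes as in the text, the two latent types invented for illustration): half the population has an individual causal effect of +100, the other half −100, and the OLS slope dutifully reports the average of roughly 0:

```python
# Heterogeneous causal effects hidden behind an OLS average of ~0.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
group = rng.integers(0, 2, n)              # two latent types, 50/50
tau = np.where(group == 1, 100.0, -100.0)  # individual causal effects
x = rng.integers(0, 2, n)                  # randomized treatment X
y = 10.0 + tau * x + rng.normal(0, 1, n)

beta_hat = np.polyfit(x, y, 1)[0]  # OLS slope of Y on X
print(f"estimated average causal effect: {beta_hat:.2f}")    # ~0
print(f"effect among type 1: {tau[group == 1].mean():+.0f}")  # +100
print(f"effect among type 0: {tau[group == 0].mean():+.0f}")  # -100
```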

Limiting model assumptions in economic science always have to be closely examined. If the mechanisms or causes that we isolate and handle in our models are to be stable, in the sense that they do not change when we ‘export’ them to our ‘target systems,’ we have to be able to show that they hold not only under ceteris paribus conditions; otherwise they are, a fortiori, of limited value for our understanding, explanation, or prediction of real economic systems.

Most ‘randomistas’ underestimate the heterogeneity problem. It does not just turn up as an external validity problem when trying to ‘export’ regression results to different times or different target populations. It is also often an internal problem to the millions of regression estimates that economists produce every year.

‘Ideally controlled experiments’ tell us with certainty what causes what effects — but only given the right ‘closures.’ Making appropriate extrapolations from (ideal, accidental, natural or quasi) experiments to different settings, populations or target systems, is not easy. “It works there” is no evidence for “it will work here”. Causes deduced in an experimental setting still have to show that they come with an export-warrant to the target population/system. The causal background assumptions made have to be justified, and without licenses to export, the value of ‘rigorous’ and ‘precise’ methods — and ‘on-average-knowledge’ — is despairingly small.

RCTs usually do not provide evidence that their results are exportable to other target systems. The almost religious belief with which their propagators portray them cannot hide the fact that RCTs cannot be taken for granted to give generalizable results. That something works somewhere is no warrant for believing that it will work for us here, or even that it works generally.

The present RCT idolatry is dangerous. Believing there is only one really good evidence-based method on the market — and that randomization is the only way to achieve scientific validity — blinds people to searching for and using other methods that in many contexts are better. RCTs are simply not the best method for all questions and in all circumstances. Insisting on using only one tool often means using the wrong tool.

  1. Econoclast
     November 23, 2017 at 6:37 pm

    Thanks for this clear exposition.

    For decades I’ve observed the social sciences, particularly economics, infected with a belief system called “scientism”, which destroys the presumption that science is objective and free of “faith”, and thus contaminates the very thing its proponents worship. Scientism is a quasi-religion, in economics complete with prophets (e.g. Smith or Marx), gods (e.g. the “free market” or “the proletariat”), altars, and priests (e.g. Milton Friedman). I belong to a large group of men who meet regularly to discuss things. All have earned higher degrees and 40% have taught high school or university. All are proud of their ability to think critically, yet most seem to settle for scientism, believing that the likes of Scientific American tell them what they need to know.

    I am skeptical about the prospects for randomized controlled experiments in studying real-world economies. Of course, these economies, unlike neoclassical orthodoxy, are of human beings, by human beings and for human beings. Furthermore, I am skeptical about the application of deterministic (cause-and-effect) science. Real economies are complex systems and thus it seems better to apply the principles of ecological systems in our analyses. Among other things, ecological study rarely concerns cause-and-effect, and usually concerns interdependent relationships and quantum ideology.

    Interesting and useful post.

    • Frank Salter
      November 24, 2017 at 9:38 am

      I am in total agreement with Lars Syll’s exposition and with part of Econoclast’s comment.

      While ‘scientism’ is an appropriate description of significant problems in economic thinking, the term has various nuances which may confound the specific problems of economic theorising. The problem is very simple and is clearly demonstrated by neoclassical formulations. They fail the test of the scientific method. They fail to describe the empirical evidence. Thus they are clearly wrong. At this point all economists should look for different hypotheses. Only when these are critically examined, and only those passing the test of empirical veracity are treated as appropriate theory, will economics move forward in its thinking.

      New theories may present difficult challenges for readers when they involve mathematics unfamiliar to them — but the theories must be mathematical. Mathematics will always be able to describe reality. All human actions are mediated through physical reality. It is this reality which the mathematics will describe.

  2. November 24, 2017 at 12:50 pm

    “Randomization is used to basically allow the econometrician to treat the population as consisting of interchangeable and homogeneous groups (‘treatment’ and ‘control’).”

    “Besides the fact that ‘on average’ is not always ‘good enough,’ it amounts to nothing but hand waving to simpliciter assume, without argumentation, that it is tenable to treat social agents and relations as homogeneous and interchangeable entities.”

    It is not that I disagree with what has been said so far, but it is all words, and to experiment even imaginatively I need a visualisable example to experiment on. The formula P(A|B) sounds to me like Bayes’ Theorem, and the verbal formula “treatment and control” sounds to me, as an information scientist, akin to a message and the channel which both directs it and separates it from other messages.

    With a nod to Frank on nuances here, the question I would like to address is whether the randomisation is in the ‘treatment’ or the ‘control’, the signal or the incidence of noise, the topic under investigation or the context. Frank is right: the control is the physical reality. Thus, given a six-sided die, throwing it may help establish whether or not this particular die is symmetrical, not whether another one is. Given an information channel, the incidence of distortion or typos indicates how well the channel is insulated and magnetically shielded, not what the message is. Thus when Humean economists add new information to old, claiming Bayes’ Theorem enables them to improve the accuracy of their estimates, what they are actually doing is revealing the persistence of old errors in the new information.

  3. November 26, 2017 at 12:19 pm

    RCT is at best one tool for scientific research. It needs to be combined with others to provide any possibility of a conclusion that could be called scientific. The difficulties of RCT in scientific research are well illustrated by one of the most famous scientific studies in the history of biology and medicine: Pasteur’s search for the nature of anthrax and a cure for its effects on animals and people. After lots of searching it turned out that the anthrax Pasteur studied in the laboratory was different from the anthrax in sheep and other farm animals. And the anthrax in the soil on farms was still another type. In other words, anthrax changed (mutated) depending on its environment. What it was thus depended on which environment the scientist studied. Thus, what killed anthrax in the laboratory might not kill it on the farm or in farm animals. Pasteur was astute enough to use both controlled work in the laboratory and several types of field studies. That’s what authentic scientists do. They’re not locked into some supposedly “proper” methodology.
