## Econometrics — science based on whimsical assumptions

from **Lars Syll**

It is often said that the error term in a regression equation represents the effect of the variables that were omitted from the equation. This is unsatisfactory …

There is no easy way out of the difficulty. The conventional interpretation for error terms needs to be reconsidered. At a minimum, something like this would need to be said:

The error term represents the combined effect of the omitted variables, assuming that

(i) the combined effect of the omitted variables is independent of each variable included in the equation,

(ii) the combined effect of the omitted variables is independent across subjects,

(iii) the combined effect of the omitted variables has expectation 0.

This is distinctly harder to swallow.

Yes, indeed, that *is* harder to swallow.
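In symbols, the three conditions quoted above can be sketched for a simple bivariate regression. The notation ($y_i$, $x_i$, $\alpha$, $\beta$, $\varepsilon_i$) is an assumption introduced here for illustration; it does not appear in the quoted passage.

$$
y_i = \alpha + \beta x_i + \varepsilon_i, \qquad i = 1, \dots, n
$$

$$
\text{(i)}\ \varepsilon_i \perp\!\!\!\perp x_i, \qquad
\text{(ii)}\ \varepsilon_i \perp\!\!\!\perp \varepsilon_j \ \text{for } i \neq j, \qquad
\text{(iii)}\ \mathbb{E}[\varepsilon_i] = 0
$$

Note that (i) is an independence requirement, which is stronger than zero correlation between the error term and the included variable.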

Those conditions on the error term actually mean that we are able to construct a model where all relevant variables are included and the functional relationships between them are correctly specified.

But that is actually impossible to fully manage in reality!

The theories we work with when building our econometric regression models are insufficient. No matter what we study, there are always some variables missing, and we don’t know the correct way to functionally specify the relationships between the variables (usually just *assuming* linearity).

*Every* regression model constructed is misspecified. There is always an endless list of possible variables to include, and endless possible ways to specify the relationships between them. So every applied econometrician comes up with his own specification and ‘parameter’ estimates. No wonder that the econometric Holy Grail of consistent and stable parameter-values is still nothing but a dream.
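What a missing variable does to a 'parameter' estimate can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: the data-generating process, the coefficients (2 and 3), the 0.5 link between `x1` and `x2`, and all names are hypothetical and chosen only for illustration.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(n=20000, seed=0):
    """Hypothetical data: y depends on x1 AND x2, and x2 is correlated with x1."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]
    y = [1.0 + 2.0 * a + 3.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    return x1, x2, y

x1, x2, y = simulate()

# Regress y on x1 only, so x2 ends up in the error term.
short_slope = ols_slope(x1, y)

# The true coefficient on x1 is 2.0, but the short regression also picks up
# 3.0 * 0.5 = 1.5 from the omitted x2, so the estimate lands near 3.5.
print("true coefficient on x1: 2.0")
print("estimate with x2 omitted:", round(short_slope, 2))
```

Because the omitted variable is correlated with the included one, the error term is no longer independent of the regressor, and the estimate is biased however large the sample is.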

In order to draw inferences from data as described by econometric texts, it is necessary to make whimsical assumptions. The professional audience consequently and properly withholds belief until an inference is shown to be adequately insensitive to the choice of assumptions. The haphazard way we individually and collectively study the fragility of inferences leaves most of us unconvinced that any inference is believable. If we are to make effective use of our scarce data resource, it is therefore important that we study fragility in a much more systematic way. If it turns out that almost all inferences from economic data are fragile, I suppose we shall have to revert to our old methods …

A rigorous application of econometric methods in economics really presupposes that the phenomena of our real-world economies are ruled by stable causal relations between variables. Parameter-values estimated in specific spatio-temporal contexts are *presupposed* to be exportable to totally different contexts. To warrant this assumption, however, one has to convincingly establish that the targeted acting causes are stable and invariant, so that they maintain their parametric status after the bridging. The endemic lack of predictive success of the econometric project indicates that this hope of finding fixed parameters is a hope for which there really is no other ground than hope itself.
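The exportability problem can be made concrete with a toy example. The sketch below assumes two hypothetical contexts, A and B, in which the same variable has different effects (slopes 2.0 and -1.0); the numbers and function names are invented for illustration, not drawn from any real data.

```python
import random

def fit_line(x, y):
    """Return (intercept, slope) of a simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    return my - b * mx, b

def mse(intercept, slope, x, y):
    """Mean squared prediction error of the fitted line on (x, y)."""
    return sum((c - (intercept + slope * a)) ** 2 for a, c in zip(x, y)) / len(x)

rng = random.Random(2)

def make_context(slope, n=1000):
    """Hypothetical context with its own causal 'parameter'."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [1.0 + slope * xi + rng.gauss(0, 1) for xi in x]
    return x, y

xa, ya = make_context(slope=2.0)    # context A: where we estimate
xb, yb = make_context(slope=-1.0)   # context B: where we try to apply it

a_hat, b_hat = fit_line(xa, ya)
mse_in = mse(a_hat, b_hat, xa, ya)    # fit inside context A
mse_out = mse(a_hat, b_hat, xb, yb)   # same parameters exported to B

print("slope estimated in A:", round(b_hat, 2))
print("MSE in A:", round(mse_in, 2), " MSE in B:", round(mse_out, 2))
```

The estimate is perfectly good within context A; it fails in B only because the parameter was not invariant, which is exactly what the data from A alone cannot reveal.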

Real-world social systems are not governed by stable causal mechanisms or capacities. As Keynes noticed when he first launched his attack against econometrics and inferential statistics already in the 1920s:

The atomic hypothesis which has worked so splendidly in Physics breaks down in Psychics. We are faced at every turn with the problems of Organic Unity, of Discreteness, of Discontinuity – the whole is not equal to the sum of the parts, comparisons of quantity fails us, small changes produce large effects, the assumptions of a uniform and homogeneous continuum are not satisfied. Thus the results of Mathematical Psychics turn out to be derivative, not fundamental, indexes, not measurements, first approximations at the best; and fallible indexes, dubious approximations at that, with much doubt added as to what, if anything, they are indexes or approximations of.

The kinds of laws and relations that econom(etr)ics has established are laws and relations about entities in models that presuppose causal mechanisms being atomistic and additive. When causal mechanisms operate in real-world social target systems, they do so only in ever-changing and unstable combinations where the whole is more than a mechanical sum of parts. If economic regularities obtain, they do so (as a rule) only because we engineered them for that purpose. Outside man-made “nomological machines” they are rare, or even non-existent. Unfortunately, that also makes most of the achievements of econometrics – as most of the contemporary endeavours of economic theoretical modelling – rather useless.

Regression models are widely used by social scientists to make causal inferences; such models are now almost a routine way of demonstrating counterfactuals.

However, the “demonstrations” generally turn out to depend on a series of untested, even unarticulated, technical assumptions … Developing appropriate models is a serious problem in statistics; testing the connection to the phenomena is even more serious … In our days, serious arguments have been made from data. Beautiful, delicate theorems have been proved, although the connection with data analysis often remains to be established. And an enormous amount of fiction has been produced, masquerading as rigorous science.

The theoretical conditions that have to be fulfilled for regression analysis and econometrics to really work are nowhere even closely met in reality. Making outlandish statistical assumptions does not provide a solid ground for doing relevant social science and economics. Although regression analysis and econometrics have become the most used quantitative methods in social sciences and economics today, it’s still a fact that most of the inferences made from them are invalid.

“Although regression analysis and econometrics have become the most used quantitative methods in social sciences and economics today, it’s still a fact that most of the inferences made from them are invalid.”

There is no valid science in economics. Study history.

Although I am not an economist, I am greatly interested in economics because of the way it affects all citizens. What amazes me is that ignoring history fails to explain how we have arrived at the present, and if we ignore history we are prone to repeat the same mistakes again and again. No doubt some might say I am repeating the obvious; why then are economists continually ignoring what is obvious to many citizens? Ted

I totally agree with you, Ted. And when economists completely ignore the obvious you are logically justified in continuously repeating it. I have the same problem trying to share the sometimes not so obvious: like historic truths and motivations concealed by “scorched earth” policies, not just “lies, damned lies and statistics”.

Let’s all agree with everything Lars says. What then is the right approach to seeing whether a particular causal hypothesis fits a particular set of data for a period of time or a range of localities? You could start by specifying the hypothesis as generally as possible. In particular you would not assume linearity unless there was a compelling reason to do so. I do not know what “atomistic” means in this context. Your hypothesis could be neo-classical, Marxist or astrological. So long as it implies a relationship between observable variables we can try to test it. You would include any “confounding” variables that might obscure the relationship you are positing. You would also be aware of the history and background of the data. If it included a political revolution, a volcanic eruption or just a change in data definitions you would attempt to take account of these things with proxy or dummy variables. If the variables are non-stationary you would use cointegration tests to avoid spurious correlation owing to shared trends. You would then test fit the resulting equation to the data. Because no “economic” theory will explain all the movements of variables in a social situation, you will not expect a perfect fit; you will expect residuals. Are the residuals of this process independent of the explanatory variables, are they serially independent and are they normally distributed with constant variance? All these things can and should be tested. You would also analyse the residuals for breaks in the series indicating a structural shift that your model does not account for. If the residuals are not well behaved the model could well be mis-specified and you cannot calculate confidence intervals for any parameter. The equation may or may not forecast well but you have not produced a model that can be said to provide an explanation of the data.
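Two of the residual checks mentioned above can be sketched in plain Python: the Durbin-Watson statistic for serial independence and the Jarque-Bera statistic for normality. The simulated data, the AR(1) coefficient of 0.8 and all names are assumptions made for illustration; in practice one would use a tested library implementation.

```python
import random

def residuals(x, y):
    """Residuals from a simple least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    a0 = my - b * mx
    return [c - (a0 + b * a) for a, c in zip(x, y)]

def durbin_watson(e):
    """Near 2 if residuals are serially uncorrelated; well below 2 under positive autocorrelation."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v * v for v in e)

def jarque_bera(e):
    """Approximately chi-squared with 2 df under normality (5% critical value about 5.99)."""
    n = len(e)
    m = sum(e) / n
    m2 = sum((v - m) ** 2 for v in e) / n
    m3 = sum((v - m) ** 3 for v in e) / n
    m4 = sum((v - m) ** 4 for v in e) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

rng = random.Random(1)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]

# Case A: independent normal errors (the well-behaved case).
y_iid = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

# Case B: serially correlated AR(1) errors with coefficient 0.8.
ar_err, prev = [], 0.0
for _ in range(n):
    prev = 0.8 * prev + rng.gauss(0, 1)
    ar_err.append(prev)
y_ar = [1.0 + 2.0 * xi + e for xi, e in zip(x, ar_err)]

# Case C: independent but strongly skewed (centred exponential) errors.
y_skew = [1.0 + 2.0 * xi + (rng.expovariate(1.0) - 1.0) for xi in x]

dw_iid = durbin_watson(residuals(x, y_iid))
dw_ar = durbin_watson(residuals(x, y_ar))
jb_iid = jarque_bera(residuals(x, y_iid))
jb_skew = jarque_bera(residuals(x, y_skew))

print("Durbin-Watson, iid errors:", round(dw_iid, 2))
print("Durbin-Watson, AR(1) errors:", round(dw_ar, 2))
print("Jarque-Bera, normal errors:", round(jb_iid, 2))
print("Jarque-Bera, skewed errors:", round(jb_skew, 2))
```

One caveat: least-squares residuals are uncorrelated with the included regressors by construction, so independence from the explanatory variables cannot be checked with a plain correlation and needs other tests.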

Suppose your residuals pass the tests? The equation may still be a mess with complicated functional forms and a superfluity of variables, probably leaving few degrees of freedom. So you must try to simplify. Can we eliminate some variables, can we eliminate higher powers of variables or interaction terms? Each simplification is tested by its ability to leave the residuals well behaved. (Normally distributed residuals make it credible, by the central limit theorem, that they are generated by a host of unknown factors orthogonal to the causes you are hypothesizing.) There is no assurance at all that this process leads to a tractable model that can be said to fit. Occasionally though we are lucky. The model is simplified, reflects a causal hypothesis, and has well-behaved residuals that enable us to express degrees of confidence in parameter estimates. If the theory specifies a particular range for a parameter (e.g. that it is positive), that can be tested and the theory rejected or accepted – FOR THE DATA SET IN QUESTION.
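Testing whether a parameter is positive, as described above, might look like the following sketch. It assumes a simple one-regressor model, simulated data with a hypothetical true slope of 0.5, and a normal approximation (1.96) for the 95% interval; the interval is only meaningful if the residual assumptions actually hold.

```python
import math
import random

def slope_with_se(x, y):
    """Least-squares slope and its conventional standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    a0 = my - b * mx
    rss = sum((c - (a0 + b * a)) ** 2 for a, c in zip(x, y))
    s2 = rss / (n - 2)            # residual variance, n - 2 degrees of freedom
    return b, math.sqrt(s2 / sxx)

rng = random.Random(3)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]   # hypothetical true slope: 0.5

b_hat, se = slope_with_se(x, y)
t_stat = b_hat / se                # test statistic for H0: slope = 0
lower_95 = b_hat - 1.96 * se       # large-sample normal approximation
upper_95 = b_hat + 1.96 * se

print("slope:", round(b_hat, 3), " se:", round(se, 3), " t:", round(t_stat, 1))
print("approx. 95% interval:", (round(lower_95, 3), round(upper_95, 3)))
```

A large positive t-statistic here supports the hypothesis of a positive parameter for this data set only, in line with the caveat above.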

Have we proved a general theory? Of course not. But if we believe our hypothesis is a general one, finding that it is compatible with a given data set is a necessary though not sufficient condition for its being generally true or useful. Would we expect the model to forecast well in another time or place? Not necessarily. As Lars says, that is a matter for hope, not expectation. Since by definition we do not know what generated the errors in our original sample, we cannot say what unknown influences will be at work in another sample, so we cannot know that their influence will be of the same order of magnitude. We are not engaged in magic; we cannot extract causal hypotheses from data, only test them on data. And we may have some confidence that the hypothesis will apply to other samples, but we cannot ever know that.

Most serious econometric researchers proceed in that way. Their efforts are often inconclusive and they do not overclaim for their results. They are not “presupposing” anything; they are hypothesising that a comprehensible relationship is present among a welter of adventitious influences and then looking to see whether it is supported in a particular case. Of course less serious people use regression analysis carelessly or tendentiously in support of prejudices. But if medical research had stopped because of the existence of quacks we would still be treating fevers by the application of leeches. I realise, however, these points will be unpersuasive to people who prefer to think of economics as a branch of theology or literary criticism, immune to systematic empirical exploration. And Keynes’ opinions on this topic, formed before the existence of computers made modern econometrics possible, are of no contemporary relevance. Darwin was an even greater thinker but we do not consult his opinions on modern genetics or the structure of DNA.