Overcontrolling in econometrics — a wasteful practice ridden with errors

February 7, 2020 Lars Syll

from Lars Syll

The gender pay gap is a fact that, sad to say, to a non-negligible extent is the result of discrimination. And even though many women are not deliberately discriminated against, but rather self-select into lower-wage jobs, this in no way magically explains away the discrimination gap. As decades of socialization research have shown, women may be ‘structural’ victims of impersonal social mechanisms that in different ways aggrieve them. Wage discrimination is unacceptable. Wage discrimination is a shame.

You see it all the time in studies. “We controlled for…” And then the list starts. The longer the better. Income. Age. Race. Religion. Height. Hair color. Sexual preference. CrossFit attendance. Love of parents. Coke or Pepsi. The more things you can control for, the stronger your study is — or, at least, the stronger your study seems. Controls give the feeling of specificity, of precision. But sometimes, you can control for too much. Sometimes you end up controlling for the thing you’re trying to measure …

An example is research around the gender wage gap, which tries to control for so many things that it ends up controlling for the thing it’s trying to measure. As my colleague Matt Yglesias wrote:

“The commonly cited statistic that American women suffer from a 23 percent wage gap through which they make just 77 cents for every dollar a man earns is much too simplistic. On the other hand, the frequently heard conservative counterargument that we should subject this raw wage gap to a massive list of statistical controls until it nearly vanishes is an enormous oversimplification in the opposite direction. After all, for many purposes gender is itself a standard demographic control to add to studies — and when you control for gender the wage gap disappears entirely!” …

Take hours worked, which is a standard control in some of the more sophisticated wage gap studies. Women tend to work fewer hours than men. If you control for hours worked, then some of the gender wage gap vanishes. As Yglesias wrote, it’s “silly to act like this is just some crazy coincidence. Women work shorter hours because as a society we hold women to a higher standard of housekeeping, and because they tend to be assigned the bulk of childcare responsibilities.”

Controlling for hours worked, in other words, is at least partly controlling for how gender works in our society. It’s controlling for the thing that you’re trying to isolate.

Ezra Klein
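Klein's hours-worked point can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: the numbers (women working 5 fewer hours, a direct gap of 2, a return of 0.5 per hour) are hypothetical, not estimates from any study, and `ols` is a hand-written least-squares helper in plain Python rather than a library function. Because hours are partly determined by gender in this setup, adding them as a control strips out the part of the gap that runs through hours and leaves only the direct component.

```python
import random


def ols(y, *columns):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    m = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for i in range(k):
            if i != c:
                f = m[i][c]
                m[i] = [vi - f * vc for vi, vc in zip(m[i], m[c])]
    return [m[i][k] for i in range(k)]


def simulate_gap(n=20_000, seed=1):
    """Hypothetical data in which being female lowers hours and hours raise wages."""
    rng = random.Random(seed)
    female, hours, wage = [], [], []
    for _ in range(n):
        f = 1.0 if rng.random() < 0.5 else 0.0
        h = 40 - 5 * f + rng.gauss(0, 3)             # assumed hours equation
        w = 30 - 2 * f + 0.5 * h + rng.gauss(0, 2)   # assumed wage equation
        female.append(f)
        hours.append(h)
        wage.append(w)
    total = ols(wage, female)[1]           # gap including the part that runs through hours
    direct = ols(wage, female, hours)[1]   # gap after "controlling for hours worked"
    return total, direct


total, direct = simulate_gap()
print(f"gap, no hours control:   {total:.2f}")   # expected near -2 + 0.5 * (-5) = -4.5
print(f"gap, with hours control: {direct:.2f}")  # expected near -2, the direct part only
```

With these assumed parameters the uncontrolled gap should land near -4.5 and the hours-controlled gap near -2; the difference is precisely the part of the gap that the control absorbs.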

To reduce the risk of establishing only ‘spurious relations’ when working with observational data, statisticians and econometricians routinely add control variables. The hope is that this will make more reliable causal inferences possible. But, as Keynes already showed in the 1930s when criticizing statistical-econometric applications of regression analysis, if you do not manage to get hold of all potential confounding factors, the model risks producing estimates of the variable of interest that are even worse than those from models without any control variables at all. Conclusion: think twice before you simply include ‘control variables’ in your models!
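The risk of partial control can be sketched the same way. The setup below is an assumption built to make the point, not a general result: two hypothetical confounders push the estimate in opposite directions, so omitting both happens to leave it roughly unbiased, while controlling for only one of them removes the offset and exposes the remaining bias. The `ols` helper is again a hand-written plain-Python sketch.

```python
import random


def ols(y, *columns):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    k = len(rows[0])
    m = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for i in range(k):
            if i != c:
                f = m[i][c]
                m[i] = [vi - f * vc for vi, vc in zip(m[i], m[c])]
    return [m[i][k] for i in range(k)]


def simulate_partial_control(n=20_000, seed=2):
    """Two hypothetical confounders whose biases offset when both are omitted."""
    rng = random.Random(seed)
    x, y, z1, z2 = [], [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)                        # confounder Z1
        b = rng.gauss(0, 1)                        # confounder Z2
        xi = a + b + rng.gauss(0, 1)               # both confounders raise X
        yi = 1.0 * xi + a - b + rng.gauss(0, 1)    # true effect of X on Y is 1.0
        x.append(xi)
        y.append(yi)
        z1.append(a)
        z2.append(b)
    no_controls = ols(y, x)[1]
    one_control = ols(y, x, z1)[1]        # Z2 is still omitted
    all_controls = ols(y, x, z1, z2)[1]
    return no_controls, one_control, all_controls


no_controls, one_control, all_controls = simulate_partial_control()
print(f"no controls:  {no_controls:.2f}")   # expected near 1.0 (offsetting biases)
print(f"Z1 only:      {one_control:.2f}")   # expected near 0.5
print(f"Z1 and Z2:    {all_controls:.2f}")  # expected near 1.0
```

Under these assumptions, adding one control moved the estimate away from the true value of 1.0. With other parameter values an extra control could just as well help; without knowing the full set of confounders, one cannot tell which case applies.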

When I present this argument … one or more scholars say, “But shouldn’t I control for everything I can in my regressions? If not, aren’t my coefficients biased due to excluded variables?” This argument is not as persuasive as it may seem initially. First of all, if what you are doing is misspecified already, then adding or excluding other variables has no tendency to make things consistently better or worse … The excluded variable argument only works if you are sure your specification is precisely correct with all variables included. But no one can know that with more than a handful of explanatory variables.
Still more importantly, big, mushy linear regression and probit equations seem to need a great many control variables precisely because they are jamming together all sorts of observations that do not belong together. Countries, wars, racial categories, religious preferences, education levels, and other variables that change people’s coefficients are “controlled” with dummy variables that are completely inadequate to modeling their effects. The result is a long list of independent variables, a jumbled bag of nearly unrelated observations, and often a hopelessly bad specification with meaningless (but statistically significant with several asterisks!) results.

A preferable approach is to separate the observations into meaningful subsets—internally compatible statistical regimes … If this can’t be done, then statistical analysis can’t be done. A researcher claiming that nothing else but the big, messy regression is possible because, after all, some results have to be produced, is like a jury that says, “Well, the evidence was weak, but somebody had to be convicted.”

Christopher H. Achen
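Achen's contrast between a pooled regression with dummies and separate regressions on internally compatible regimes can be sketched as follows. The two regimes and their slopes (+2 and -1) are hypothetical, and `ols` is a hand-written helper. The pooled model lets the group dummy shift only the intercept, so it reports a single slope that describes neither regime.

```python
import random


def ols(y, *columns):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    k = len(rows[0])
    m = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for i in range(k):
            if i != c:
                f = m[i][c]
                m[i] = [vi - f * vc for vi, vc in zip(m[i], m[c])]
    return [m[i][k] for i in range(k)]


def simulate_regimes(n_per_group=10_000, seed=3):
    """Two hypothetical regimes with opposite slopes, stacked into one data set."""
    rng = random.Random(seed)
    x, y, g = [], [], []
    for group, intercept, slope in ((0.0, 1.0, 2.0), (1.0, 5.0, -1.0)):
        for _ in range(n_per_group):
            xi = rng.gauss(0, 1)
            x.append(xi)
            y.append(intercept + slope * xi + rng.gauss(0, 1))
            g.append(group)
    pooled = ols(y, x, g)[1]   # one common slope; the dummy only shifts the intercept
    by_group = []
    for group in (0.0, 1.0):
        xs = [xi for xi, gi in zip(x, g) if gi == group]
        ys = [yi for yi, gi in zip(y, g) if gi == group]
        by_group.append(ols(ys, xs)[1])   # separate regression within each regime
    return pooled, by_group


pooled, by_group = simulate_regimes()
print(f"pooled slope with dummy: {pooled:.2f}")
print(f"slopes by regime:        {by_group[0]:.2f}, {by_group[1]:.2f}")
```

With equal group sizes and equal spread in `x`, the pooled slope should come out near 0.5, roughly the average of the two regime slopes, while the separate regressions recover approximately 2 and -1.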

Kitchen-sink econometric models are often the result of researchers trying to control for confounding. But what they haven’t understood is that the confounder problem requires a causal solution, not statistical “control.” Controlling for everything opens up the risk that we control for “collider” variables and thereby open “back-door paths,” giving us confounding that wasn’t there to begin with.
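Collider bias can be shown in a few lines. In this hypothetical setup `x` and `y` are generated independently, and `c` is a common effect of both (a collider). Regressing `y` on `x` alone finds essentially nothing, as it should; adding `c` as a “control” produces a clearly nonzero coefficient on `x` (about -0.5 under these assumed unit variances) even though `x` has no effect on `y`. The `ols` helper is a hand-written sketch, not a library call.

```python
import random


def ols(y, *columns):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    k = len(rows[0])
    m = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for i in range(k):
            if i != c:
                f = m[i][c]
                m[i] = [vi - f * vc for vi, vc in zip(m[i], m[c])]
    return [m[i][k] for i in range(k)]


def simulate_collider(n=20_000, seed=4):
    """x and y are independent; c is caused by both."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]                   # no effect of x on y
    c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]   # common effect (collider)
    without_c = ols(y, x)[1]
    with_c = ols(y, x, c)[1]    # "controlling" for the collider
    return without_c, with_c


without_c, with_c = simulate_collider()
print(f"effect of x, no control:        {without_c:.2f}")  # expected near 0
print(f"effect of x, collider as control: {with_c:.2f}")  # expected near -0.5
```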

Comments (6)

Meta Capitalism

February 7, 2020 at 2:51 am

Mainstream Economics and even some heterodox economists act like a cat in a box :-)

Helen Sakho

February 8, 2020 at 11:32 pm

It really does not matter in the end, because 8 out of 10 cats like them! This is roughly the same cosy correlation between the two groups of Economists mentioned above.

Scott Baker

February 10, 2020 at 8:44 am

OK, but if women really do work fewer hours than men, take more years off to raise children, etc., then aren’t those factors in reducing their work effectiveness? And isn’t work effectiveness = merit, and isn’t merit what paying people to work is supposed to be about?
Maybe work schedules should be more flexible, and options to return to work more open, and that would produce better outcomes: i.e. more effective workers. But this is a separate question not answered by mere gender wage gap analysis.

ghholtham

February 10, 2020 at 10:54 am

This is a standard problem of non-independence of explanatory variables in a regression equation. Any competent researcher would check for that before running a regression. Essentially women work fewer hours and you want to separate the influence of choice from gender-based constraints on them. There are several possibilities. You can hypothesize what are the determinants of hours worked and test that equation using things like marital status, number of children etc. as explanatory variables. Using a two-equation model may enable you to solve the problem – if you have the data. You probably need more data, though, e.g. by conducting a survey and asking a representative sample what is going on. Results will help you “parse” the hours coefficient in your regression. If your data set is big enough you can also do as Achen suggests and run your original regression on a subset – that of men and women who work the same hours.
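A minimal sketch of the two-equation idea, under loudly hypothetical assumptions: hours depend on gender and number of children, wages depend on gender and hours, and every parameter value is invented. Estimating both equations splits the gender gap into a direct part and a part transmitted through hours (the gender coefficient in the hours equation times the hours coefficient in the wage equation). The decomposition is only as credible as the assumed equations; the simulation knows the true model, a researcher does not. The `ols` helper is hand-written.

```python
import random


def ols(y, *columns):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    k = len(rows[0])
    m = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for i in range(k):
            if i != c:
                f = m[i][c]
                m[i] = [vi - f * vc for vi, vc in zip(m[i], m[c])]
    return [m[i][k] for i in range(k)]


def two_equation_decomposition(n=20_000, seed=5):
    rng = random.Random(seed)
    female, kids, hours, wage = [], [], [], []
    for _ in range(n):
        f = 1.0 if rng.random() < 0.5 else 0.0
        k = float(rng.randint(0, 3))                  # children, independent of gender here
        h = 40 - 3 * f - 2 * k + rng.gauss(0, 3)      # hypothetical hours equation
        w = 30 - 2 * f + 0.5 * h + rng.gauss(0, 2)    # hypothetical wage equation
        female.append(f)
        kids.append(k)
        hours.append(h)
        wage.append(w)
    gender_on_hours = ols(hours, female, kids)[1]     # equation 1: hours
    direct, theta = ols(wage, female, hours)[1:3]     # equation 2: wages
    via_hours = gender_on_hours * theta
    raw_gap = ols(wage, female)[1]                    # single-equation gap for comparison
    return direct, via_hours, raw_gap


direct, via_hours, raw_gap = two_equation_decomposition()
print(f"direct: {direct:.2f}  via hours: {via_hours:.2f}  raw gap: {raw_gap:.2f}")
```

Here the direct part should come out near -2, the part via hours near -3 × 0.5 = -1.5, and the raw gap from a single regression near -3.5.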

Sometimes we just don’t have enough data or the data do not contain the information we seek. If they do, we still have to be scrupulous in extracting it. It is poor technique to run a regression with a large number of explanatory variables without testing for their independence and exogeneity. Such tests exist and are included in standard statistical packages so there is no excuse. Lars comes in at the wrong level. There are plenty of terrible examples of misuse of econometrics to criticise but he fails to take the villains to task. Instead he tries to elevate misuse of a technique to inevitable failure of the technique itself. It is not accurate to present poor research technique as if it were some fundamental methodological problem.
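One common check for non-independence among explanatory variables is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing regressor j on the others. The sketch below computes it by hand for hypothetical data in which hours depend on gender and age is unrelated to both; `ols` and `vif` are hand-written helpers. It flags that gender and hours move together, but it cannot say whether hours is a confounder, a mediator, or a collider, and it is not a test of exogeneity.

```python
import random


def ols(y, *columns):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    k = len(rows[0])
    m = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for i in range(k):
            if i != c:
                f = m[i][c]
                m[i] = [vi - f * vc for vi, vc in zip(m[i], m[c])]
    return [m[i][k] for i in range(k)]


def vif(columns):
    """Variance inflation factor for each column: VIF_j = 1 / (1 - R_j^2)."""
    result = []
    for j, target in enumerate(columns):
        others = [col for i, col in enumerate(columns) if i != j]
        coef = ols(target, *others)
        fitted = [coef[0] + sum(b * col[t] for b, col in zip(coef[1:], others))
                  for t in range(len(target))]
        mean = sum(target) / len(target)
        ss_res = sum((v - fv) ** 2 for v, fv in zip(target, fitted))
        ss_tot = sum((v - mean) ** 2 for v in target)
        result.append(ss_tot / ss_res)   # 1 / (1 - R^2) equals SS_tot / SS_res
    return result


rng = random.Random(6)
n = 20_000
female = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
hours = [40 - 5 * f + rng.gauss(0, 3) for f in female]   # hypothetical: hours depend on gender
age = [rng.gauss(40, 10) for _ in range(n)]              # hypothetical: unrelated to both
vif_female, vif_hours, vif_age = vif([female, hours, age])
print(f"VIF female: {vif_female:.2f}  hours: {vif_hours:.2f}  age: {vif_age:.2f}")
```

With these assumed parameters the VIFs for gender and hours should both be around 1.7 and the VIF for age close to 1.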
- Lars Syll
  
  February 10, 2020 at 6:49 pm
  
  Sorry, Gerry, but more data certainly does NOT solve problems of causality. Data are facts and do not tell us what happens in the counterfactual world that we have to ‘confront’ if we want to be able to answer causal questions. If you want to “come in at the right level” here I suggest you read Judea Pearl’s “The Book of Why” (Allen Lane 2018).

ghholtham

February 10, 2020 at 11:18 pm

It is strange that we do not seem able to connect although we agree on a lot. Hypotheses about causality do not emerge from data and I agree that collecting more data does not generate hypotheses. Econometrics does not generate hypotheses. Yet data are necessary for testing any general hypothesis about causality. The more the hypothesis is hedged around with conditions – as it is sure to be in a complex social system – the more data you need to test it. Please help me to understand your point. We intuit “causality” in more ways than one. When we are tempted to hazard a generalization about causality of the form that ceteris paribus X causes Y, how are we supposed to test it? After all, you need not worry whether a causal hypothesis will apply in a counterfactual world if it demonstrably does not apply in the actual one.
It is true that a hypothesis may pass tests and still turn out to be wrong or to apply only under restricted conditions. But that’s life. It happened to Newton so it can certainly happen to anyone else. You can’t get around that by somehow “confronting causality”, whatever that means.

Real-World Economics Review Blog
