What are the key assumptions of linear regression models?

October 2, 2013 merijnknibbe

from: Lars Syll

In Andrew Gelman and Jennifer Hill’s statistics book Data Analysis Using Regression and Multilevel/Hierarchical Models, the authors list the assumptions of the linear regression model. The assumptions, in decreasing order of importance, are:

1. Validity. Most importantly, the data you are analyzing should map to the research question you are trying to answer. This sounds obvious but is often overlooked or ignored because it can be inconvenient. . . .

2. Additivity and linearity. The most important mathematical assumption of the regression model is that its deterministic component is a linear function of the separate predictors . . .

3. Independence of errors. . . .

4. Equal variance of errors. . . .

5. Normality of errors. . . .

Further assumptions are necessary if a regression coefficient is to be given a causal interpretation . . .

Normality and equal variance are typically minor concerns, unless you’re using the model to make predictions for individual data points.

Andrew Gelman

Yours truly can’t but concur (having touched upon this before here), especially on the “decreasing order of importance” of the assumptions. But then, of course, one really has to wonder why econometrics textbooks almost invariably turn this order of importance upside down and don’t have more thorough discussions of the overriding importance of Gelman and Hill’s first two points …

Comments (8)

lyonwiss

October 3, 2013 at 7:38 am

Of course these linear regression assumptions are rarely, if ever, satisfied in applied econometrics, and they are never mentioned, because model error analysis would inconveniently show statistical insignificance and destroy the worth of many papers: valid conclusions cannot be drawn from the datasets.

So without even mentioning the possibility of model errors, a vast majority of papers simply assert that t-values greater than 3 and R-square greater than about 50 percent indicate statistical significance, etc. As those assumptions are almost never met, R-square has to exceed 90 percent (i.e. model and actual values have to be very highly correlated) to indicate statistical significance or a sufficiently high signal-to-noise ratio. This level of correlation is almost never found in economic datasets (used in published papers).

Bernanke’s expert status on “The Great Depression” rests mainly on his book (with that title) containing many linear regression analyses using sparse datasets for the depression years of several countries. His expertise has been put to the test in the past several years and he has been claiming victory, according to the data which the US government manipulates.

This is why economics is a pretend science. The econometric models used by governments do not justify the excessive confidence placed in their extreme policies.
- lyonwiss
  
  October 3, 2013 at 8:40 am
  
  By the way, I should add that the nature and distribution of model errors are often due to the failure of the first two assumptions (the point of this post) to be satisfied. For example, nonlinear dependence on a variable can show up as changing asymmetry or lopsidedness of the spread of model errors as the actual or model values change. Such nonlinear dependence would be evident if error analysis were undertaken, which appears to happen rarely.
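
A minimal sketch of the kind of error analysis described above, assuming purely synthetic data (the quadratic relation and every number below are made up for illustration). It fits a straight line to a curved relation and averages the residuals over the low, middle and high parts of the predictor's range:

```python
# Synthetic illustration only: the data and coefficients are made up.

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mean(values):
    return sum(values) / len(values)


# The true relation is quadratic, but a straight line is fitted anyway.
xs = [i / 2 for i in range(21)]          # 0.0, 0.5, ..., 10.0
ys = [1.0 + 0.5 * x * x for x in xs]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Average residual in the low, middle and high thirds of the x range.
low = mean(residuals[:7])
mid = mean(residuals[7:14])
high = mean(residuals[14:])
print(f"mean residual: low={low:.2f}, mid={mid:.2f}, high={high:.2f}")
```

The residuals are positive at both ends of the range and negative in the middle, a systematic pattern that the overall fit statistics do not reveal on their own.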
- lyonwiss
  
  October 3, 2013 at 9:09 am
  
  Also, linear regression econometric models do not provide any guide to what happens when variables are changed by large amounts. For example, suppose a move in the official interest rate from 6 percent to 5.8 percent is associated with a 0.2 percent increase in production (say). It does not follow that at a 3 percent official rate, production would increase by 3 percent. At a 3 percent official rate, production may even contract or do anything else unpredictable, because linear models are valid only for limited perturbations of the variables.
  
  This was exactly what happened in the GFC, where credit agency and bank internal risk models were all based on “reduced form” (linear regression) models, which predicted linear increases in credit defaults in proportion to increases in the amount of mortgage loans. Of course the reality is nonlinear, because there were tipping points, beyond which credit defaults increase nonlinearly with the amount of credit. We suffer seriously from pseudo-science.
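
The extrapolation problem can be sketched numerically. The response function `hypothetical_output` below is invented for illustration and is not an economic model: it is close to linear near a 6 percent rate (a cut to 5.8 percent raises output by about 0.2) but bends further away. A line fitted only to rates between 5.6 and 6.4 percent is then extrapolated to 3 percent:

```python
# Synthetic illustration only: hypothetical_output is a made-up function.

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def hypothetical_output(rate):
    """Made-up response: nearly linear around 6 percent, curved elsewhere."""
    return 100.0 - (rate - 6.0) - 0.5 * (rate - 6.0) ** 2


# Observations are available only in a narrow band around 6 percent.
rates = [5.6 + 0.1 * i for i in range(9)]            # 5.6, 5.7, ..., 6.4
outputs = [hypothetical_output(r) for r in rates]

a, b = fit_line(rates, outputs)

linear_at_3 = a + b * 3.0                 # straight-line extrapolation
actual_at_3 = hypothetical_output(3.0)    # what the made-up function gives

print(f"fitted slope near 6 percent: {b:.3f}")
print(f"linear extrapolation at 3 percent: {linear_at_3:.2f}")
print(f"made-up 'true' value at 3 percent: {actual_at_3:.2f}")
```

Within the observed band the line fits well, yet at 3 percent it predicts an increase over the baseline of 100 while the invented function has fallen below it. Nothing in the local fit signals this.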
guest

October 3, 2013 at 9:32 am

A question from somebody not at all involved in economics: is there any use of non-parametric methods in econometric models at all, or do they by default rely on regressions, even in the presence of sparse data and inappropriate linearity constraints?
- lyonwiss
  
  October 3, 2013 at 11:02 am
  
  If you make a lot of assumptions, you can draw a lot of conclusions, which are likely to be wrong, because the more assumptions you make the more likely you are to make false ones. The power of making strong assumptions (e.g. normal distribution) is restrained by increased likelihood of false conclusions. This is the path of most econometrics.
  
  Non-parametric methods make few assumptions (e.g. about distributions) and therefore can draw few conclusions. But the conclusions drawn from non-parametric methods are much more likely to be correct. From a scientific point of view, non-parametric methods should be preferred in presenting economic data, in view of data limitations.
  
  Non-parametric methods are less popular with economists because they do not allow economists to make grandiose claims about their ability to forecast the future. An honest economic science should start off with non-parametric methods.
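
One small instance of a method that assumes little is a rank correlation, which uses only the ordering of the values and so does not require linearity or a particular distribution. The sketch below uses made-up data with a monotone but strongly nonlinear relation, and its helper `ranks` assumes there are no tied values:

```python
# Synthetic illustration only: the data are made up, and ranks() assumes no ties.
import math


def pearson(xs, ys):
    """Ordinary (product-moment) correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank of each value (1 = smallest); valid only when there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


xs = list(range(1, 11))          # 1, 2, ..., 10
ys = [x ** 3 for x in xs]        # monotone, but far from a straight line

linear_corr = pearson(xs, ys)
rank_corr = spearman(xs, ys)
print(f"Pearson: {linear_corr:.3f}, Spearman: {rank_corr:.3f}")
```

The rank correlation reports a perfect monotone association, while the ordinary correlation is pulled below one by the curvature. The rank version says less (nothing about the size of the effect), which is the trade-off described above.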
Judea Pearl

October 4, 2013 at 11:45 am

I must disagree with the list
of assumptions listed in Gelman and Hill’s book.
Linear regression models need not make all the
assumptions listed above.

The most obvious needless assumption is
Assumption 3: “Independence of errors”.
This is needed ONLY if a regression coefficient is to be
given a causal interpretation. Otherwise, error independence
is achieved automatically in linear systems, since
the regression coefficient is fitted so as to satisfy
this independence.

(The regression coefficient is the slope of E[Y|X=x])

The less obvious needlessness lies in
Assumption 2: Additivity and Linearity
Linearity is needed only if we insist on equating
the regression coefficient with the slope of E[Y|X=x].
However, if we merely wish to find the best (in MSE)
linear predictor of Y given the observation X=x, then
regression analysis will give us what we ask even if the
E[Y|X=x] is not a linear function of x.

Conclusion: the assumptions of the linear regression
model vary with what we expect to do with the result.
No assumptions at all are needed for optimal predictions.
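
Two of the claims above can be checked numerically on a small made-up dataset in which E[Y|X=x] is deliberately nonlinear. The sketch checks only sample quantities: that the fitted residuals sum to zero and have zero sample covariance with x (this is what the least-squares fit enforces by construction), and that nudging the fitted intercept or slope can only raise the mean squared error:

```python
# Synthetic illustration only: the data are made up.

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mse(a, b, xs, ys):
    """Mean squared error of the linear predictor a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [float(i) for i in range(10)]   # 0, 1, ..., 9
ys = [x * x for x in xs]             # the conditional mean is not linear in x

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Normal equations: residuals are orthogonal to the constant and to x.
sum_resid = sum(residuals)
sum_x_resid = sum(x * r for x, r in zip(xs, residuals))

# Any other straight line has a larger mean squared error on these data.
best = mse(a, b, xs, ys)
perturbed = [
    mse(a + 1.0, b, xs, ys),
    mse(a - 1.0, b, xs, ys),
    mse(a, b + 0.5, xs, ys),
    mse(a, b - 0.5, xs, ys),
]

print(f"sum of residuals: {sum_resid:.2e}, sum of x*residual: {sum_x_resid:.2e}")
print(f"MSE of fitted line: {best:.3f}, smallest perturbed MSE: {min(perturbed):.3f}")
```

Zero sample correlation between residuals and the predictor is weaker than full statistical independence, and the check says nothing about whether errors are independent across observations.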
- lyonwiss
  
  October 7, 2013 at 12:11 am
  
  Linear regression can be applied to almost any data. But the statistics are meaningless unless certain assumptions are met.
  
  Causality cannot be attributed to linear regression models, which show only correlations. Independence of errors is not guaranteed, and it is important. If errors are highly correlated, we have mathematical instability of estimation due to multi-collinearity.
  
  If the relationship is inherently nonlinear (say a quadratic) then a linear regression is rather meaningless. Try fitting a straight line to y=x^2: what does a “best linear predictor” prove?
  
  Economists have been drawing invalid conclusions, as if no assumptions are needed. A typical example is using t-values greater than three as indication of statistical significance, without checking relevant assumptions.
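
The y=x^2 experiment is easy to run. The sketch below uses made-up evenly spaced x values and fits a straight line over two different ranges, to show how much the "best linear predictor" depends on where the data happen to lie:

```python
# Synthetic illustration only: the x grids are made up.

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(a, b, xs, ys):
    """Share of the variance of y accounted for by the line a + b*x."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sse / sst


# Range symmetric around zero: -3.0, -2.5, ..., 3.0
xs_sym = [-3.0 + 0.5 * i for i in range(13)]
ys_sym = [x * x for x in xs_sym]
a_sym, b_sym = fit_line(xs_sym, ys_sym)
r2_sym = r_squared(a_sym, b_sym, xs_sym, ys_sym)

# Positive range only: 0.0, 0.5, ..., 6.0
xs_pos = [0.5 * i for i in range(13)]
ys_pos = [x * x for x in xs_pos]
a_pos, b_pos = fit_line(xs_pos, ys_pos)
r2_pos = r_squared(a_pos, b_pos, xs_pos, ys_pos)

print(f"symmetric range: slope={b_sym:.3f}, R^2={r2_sym:.3f}")
print(f"positive range:  slope={b_pos:.3f}, R^2={r2_pos:.3f}")
```

On the symmetric range the fitted line is flat and explains nothing, although y is an exact function of x. On the positive range the same relation yields a steep slope and an R-square above 0.9, even though the model is still misspecified. In both cases the line is the best linear predictor in the mean-squared-error sense; what differs is how useful that predictor is.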
Dave Marsay

October 6, 2013 at 7:13 pm

Judea, what do you mean by “No assumptions at all are needed for optimal predictions”?

Suppose that we have lots of economic data for a small island banana republic. Linear regression allows us to extrapolate, optimally (in some mathematical sense). But how do we turn an extrapolation into a prediction? Or by ‘prediction’ do we implicitly assume certain standard assumptions, such as no tsunami, no crop failure, no riots, no dictator, no invasion and … no financial crisis?

In 2006/7 were the banks making ‘optimal predictions’? Did politicians understand this? It seemed to me at the time that key decision-makers were getting confused by the language, and I am not clear that it has been clarified.
