## Residuals

### Purpose

Residuals are useful for detecting outlying *y* values and checking the linear regression assumptions with respect to the error term in the regression model. High-leverage observations have smaller residuals because they often shift the regression line or surface closer to them. You can also use residuals to detect some forms of heteroscedasticity and autocorrelation.

### Definition

The `Residuals`

matrix is an *n*-by-4 table containing four types of residuals, with one row for each observation.

#### Raw Residuals

Observed minus fitted values, that is,

$${r}_{i}={y}_{i}-\widehat{y}{}_{i}.$$

#### Pearson Residuals

Raw residuals divided by the root mean squared error, that is,

$$p{r}_{i}=\frac{{r}_{i}}{\sqrt{MSE}},$$

where *r*_{i} is the raw residual and *MSE* is the mean squared error.

#### Standardized Residuals

Standardized residuals are raw residuals divided by their estimated standard deviation. The standardized residual for observation *i* is

$$s{t}_{i}=\frac{{r}_{i}}{\sqrt{MSE\left(1-{h}_{ii}\right)}},$$

where *MSE* is the mean squared error and *h*_{ii} is the leverage value for observation *i*.

#### Studentized Residuals

Studentized residuals are the raw residuals divided by an independent estimate of the residual standard deviation. The residual for observation *i* is divided by an estimate of the error standard deviation based on all observations except for observation *i*.

$$s{r}_{i}=\frac{{r}_{i}}{\sqrt{MS{E}_{\left(i\right)}\left(1-{h}_{ii}\right)}},$$

where *MSE*_{(i)} is the mean squared error of the regression fit calculated by removing observation *i*, and *h*_{ii} is the leverage value for observation *i*. The studentized residual *sr*_{i} has a *t*-distribution with *n* – *p* – 1 degrees of freedom.

### How To

After obtaining a fitted model, say, `mdl`

, using `fitlm`

or `stepwiselm`

, you can:

Find the

`Residuals`

table under`mdl`

object.Obtain any of these columns as a vector by indexing into the property using dot notation, for example,

mdl.Residuals.Raw

Plot any of the residuals for the values fitted by your model using

For details, see theplotResiduals(mdl)

`plotResiduals`

method of the`LinearModel`

class.

### Assess Model Assumptions Using Residuals

This example shows how to assess the model assumptions by examining the residuals of a fitted linear regression model.

Load the sample data and store the independent and response variables in a table.

load imports-85 tbl = table(X(:,7),X(:,8),X(:,9),X(:,15),'VariableNames',... {'curb_weight','engine_size','bore','price'});

Fit a linear regression model.

mdl = fitlm(tbl)

mdl = Linear regression model: price ~ 1 + curb_weight + engine_size + bore Estimated Coefficients: Estimate SE tStat pValue __________ _________ _______ __________ (Intercept) 64.095 3.703 17.309 2.0481e-41 curb_weight -0.0086681 0.0011025 -7.8623 2.42e-13 engine_size -0.015806 0.013255 -1.1925 0.23452 bore -2.6998 1.3489 -2.0015 0.046711 Number of observations: 201, Error degrees of freedom: 197 Root Mean Squared Error: 3.95 R-squared: 0.674, Adjusted R-Squared: 0.669 F-statistic vs. constant model: 136, p-value = 1.14e-47

Plot the histogram of raw residuals.

plotResiduals(mdl)

The histogram shows that the residuals are slightly right skewed.

Plot the box plot of all four types of residuals.

Res = table2array(mdl.Residuals); boxplot(Res)

You can see the right-skewed structure of the residuals in the box plot as well.

Plot the normal probability plot of the raw residuals.

`plotResiduals(mdl,'probability')`

This normal probability plot also shows the deviation from normality and the skewness on the right tail of the distribution of residuals.

Plot the residuals versus lagged residuals.

`plotResiduals(mdl,'lagged')`

This graph shows a trend, which indicates a possible correlation among the residuals. You can further check this using `dwtest(mdl)`

. Serial correlation among residuals usually means that the model can be improved.

Plot the symmetry plot of residuals.

`plotResiduals(mdl,'symmetry')`

This plot also suggests that the residuals are not distributed equally around their median, as would be expected for normal distribution.

Plot the residuals versus the fitted values.

`plotResiduals(mdl,'fitted')`

The increase in the variance as the fitted values increase suggests possible heteroscedasticity.

## References

[1] Atkinson, A. T. *Plots, Transformations, and Regression. An Introduction to Graphical Methods of Diagnostic Regression Analysis.* New York: Oxford Statistical Science Series, Oxford University Press, 1987.

[2] Neter, J., M. H. Kutner, C. J. Nachtsheim, and W. Wasserman. *Applied Linear Statistical Models*. IRWIN, The McGraw-Hill Companies, Inc., 1996.

[3] Belsley, D. A., E. Kuh, and R. E. Welsch. *Regression Diagnostics, Identifying Influential Data and Sources of Collinearity*. Wiley Series in Probability and Mathematical Statistics, John Wiley and Sons, Inc., 1980.

## See Also

`LinearModel`

| `fitlm`

| `stepwiselm`

| `plotDiagnostics`

| `plotResiduals`

| `dwtest`