
Uncertainty Estimation for Regression

Statistics and Machine Learning Toolbox™ provides features for estimating the uncertainty of the true response for a regression problem. Uncertainty estimation is important in safety-critical applications, where the AI system must be able to generalize to unseen data and the uncertainty associated with the AI model predictions must be handled in a way that does not compromise safety. Uncertainty estimation has applications in domains such as drug development [2], aerospace [3], and model risk management [4].

You can choose between parametric and nonparametric uncertainty estimation methods.

Prediction Intervals

In the figure below, the model returns a point prediction (red line) for the data (dark blue points), and does not provide any information on the spread of the response (gray region). Estimating the uncertainty around a prediction gives you information about the spread of the response and allows you to determine regions of high fidelity, where model predictions follow the true response closely. In regions of low fidelity, you might be unable to trust model estimates of the true response.

Plot of the data in dark blue, the model predictions in red, and the response spread in gray

A prediction interval is an estimate of the spread of the response around a model prediction. It gives the range within which the true response falls, subject to an acceptable error rate. In the figure below, the prediction interval (light blue region) contains the true response with probability 0.9 (that is, with an acceptable error rate of 0.1).

Plot of the data in dark blue, the model predictions in red, and the prediction interval in light blue

Split Conformal Prediction

Split conformal prediction (SCP) is a distribution-free, model-agnostic statistical framework that returns prediction intervals with certain validity guarantees, given an acceptable error rate.

SCP involves computing conformity scores to calibrate a prediction interval. The type of SCP depends on the conformity score function.

  • You can compute conformity scores s by using absolute residuals: si = |yi − f(Xi)|, where Xi is the predictor data for observation i, yi is the response for observation i, and f(Xi) is the model prediction for observation i.

  • Alternatively, you can compute conformity scores s by using quantile regression: si = max{fα/2(Xi) − yi, yi − f1−α/2(Xi)}, where fα/2(Xi) is the lower quantile of the fitted regression model, evaluated at Xi, and f1−α/2(Xi) is the upper quantile of the fitted regression model, evaluated at Xi.

Use split conformal prediction when you have enough data to calibrate and generate prediction intervals. For an example, see Create Prediction Intervals Using Split Conformal Prediction.
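
The following code is a minimal sketch of the absolute-residual variant of SCP, not the workflow from the linked example. It assumes predictor data X, a response vector y, and query points Xnew are already in the workspace, and it uses a regression tree purely as a placeholder model; any regression model that supports predict works the same way.

    % Split the data into a proper training set and a calibration set.
    rng(0)                                        % for reproducibility
    n = size(X,1);
    idx = randperm(n).';                          % random permutation (column vector)
    trainIdx = idx(1:floor(n/2));
    calIdx   = idx(floor(n/2)+1:end);

    % Fit a model on the training set and compute absolute-residual
    % conformity scores on the calibration set.
    mdl = fitrtree(X(trainIdx,:),y(trainIdx));
    scores = abs(y(calIdx) - predict(mdl,X(calIdx,:)));

    % Calibrate the interval half-width as the finite-sample conformal
    % quantile of the scores for an acceptable error rate alpha.
    alpha = 0.1;
    n1 = numel(calIdx);
    sortedScores = sort(scores);
    k = min(n1,ceil((n1+1)*(1-alpha)));
    qhat = sortedScores(k);

    % Prediction interval [f(x) - qhat, f(x) + qhat] at the query points.
    yhat = predict(mdl,Xnew);
    interval = [yhat - qhat, yhat + qhat];

The quantile-regression scores from the second bullet above are calibrated in the same way and lead to intervals of the form [fα/2(x) − qhat, f1−α/2(x) + qhat].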

Validity Guarantees of SCP

Split conformal prediction provides two types of guarantees on the validity of the prediction interval.

  • SCP guarantees that P{ym ∈ Cα(Xm)} ≥ 1 − α, where:

    • α is the user-specified acceptable error rate.

    • Cα is the prediction interval computed using the training data, the calibration data, and a regression model.

    • Xm is the test predictor data.

    • ym is the test response variable.

    Note that the probability is marginal over possible sets of training, calibration, and test data.

  • SCP provides a probably approximately correct (PAC) guarantee on the validity of the prediction interval, conditioned on the training and calibration data. Specifically, P{β(Dn) ≤ α + √(log(1/δ)/(2n1))} ≥ 1 − δ, where β(Dn) = P{ym ∉ Cα(Xm) | Dn} is the marginal miscoverage over the test data, conditioned on the combined training and calibration data Dn. n1 is the number of calibration observations, and δ is the PAC guarantee confidence level. For a definition of the miscoverage rate, see Assess Performance of Prediction Intervals.

    The PAC guarantee states that, under split conformal prediction, the probability of choosing a set of training and calibration data that leads to a miscoverage (β) exceeding the acceptable rate (α) by more than the correction term √(log(1/δ)/(2n1)) is at most δ.

In practice, the PAC guarantee is more useful because it provides a guarantee over the prediction interval conditioned on the training and calibration data. Both validity guarantees hold over any data distribution, any regression model, and finite samples of training and calibration data. For more information, see [1].
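
For example, the following sketch evaluates the PAC bound for assumed values of α, δ, and the number of calibration observations n1; the specific numbers are illustrative only.

    % PAC bound on the conditional miscoverage beta(Dn), assuming
    % alpha = 0.1, delta = 0.05, and n1 = 500 calibration observations.
    alpha = 0.1;
    delta = 0.05;
    n1 = 500;
    bound = alpha + sqrt(log(1/delta)/(2*n1))
    % bound is approximately 0.155, so with probability at least 0.95 over
    % the training and calibration data, beta(Dn) does not exceed about 0.155.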

Assess Performance of Prediction Intervals

You can assess prediction intervals for statistical efficiency by comparing the miscoverage rate and the average interval length. Assume Cα(Xi) is a prediction interval with the acceptable error rate α, evaluated at Xi (that is, observation i in the test set).

  • The miscoverage rate is defined as (1/m) ∑i=1,…,m I{yi ∉ Cα(Xi)}.

  • The average interval length is defined as (1/m) ∑i=1,…,m (Cα,u(Xi) − Cα,l(Xi)).

I{·} is the indicator function, yi is the true response for observation i in the test set, and m is the number of test observations. Cα,u(Xi) is the upper bound of the prediction interval evaluated at test observation i, and Cα,l(Xi) is the lower bound of the prediction interval evaluated at test observation i.
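
The following sketch computes both metrics, assuming yTest is an m-by-1 vector of test responses and interval is an m-by-2 matrix of lower and upper interval bounds (for example, the output of the earlier sketch).

    % Coverage indicator for each test observation.
    covered = yTest >= interval(:,1) & yTest <= interval(:,2);
    miscoverageRate = mean(~covered);                         % fraction of test responses outside the interval
    avgIntervalLength = mean(interval(:,2) - interval(:,1));  % average interval width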

When deciding between prediction intervals returned by different techniques, choose a prediction interval with a lower miscoverage rate and a lower average interval length. In the case of heteroscedastic data, an adaptive prediction interval captures the variation in the response more accurately. The adaptiveness of a prediction interval is its ability to provide small intervals in regions where the model is confident and large intervals in regions where the model is more uncertain.

References

[1] Bian, Michael, and Rina Foygel Barber. “Training-Conditional Coverage for Distribution-Free Predictive Inference.” Electronic Journal of Statistics 17, no. 2 (January 1, 2023). https://doi.org/10.1214/23-EJS2145.

[2] Eklund, Martin, Ulf Norinder, Scott Boyer, and Lars Carlsson. "Application of Conformal Prediction in QSAR." 8th International Conference on Artificial Intelligence Applications and Innovations (AIAI) (September 2012): 166-175.

[3] MLEAP Consortium. "EASA Research — Machine Learning Application Approval (MLEAP) Final Report." EASA.2021.C38, MLEAP (May 2024). https://www.easa.europa.eu/sites/default/files/dfu/mleap-d4-public-report-issue01.pdf.

[4] SR Letter 11-7. "Supervisory Guidance on Model Risk Management." Board of Governors of the Federal Reserve System, Office of the Comptroller of the Currency (April 2011). https://www.federalreserve.gov/supervisionreg/srletters/sr1107a1.pdf.
