resubPredict

Class: RegressionGP

Resubstitution prediction from a trained Gaussian process regression model

Syntax

ypred = resubPredict(gprMdl)
[ypred,ysd] = resubPredict(gprMdl)
[ypred,ysd,yint] = resubPredict(gprMdl)
[ypred,ysd,yint] = resubPredict(gprMdl,Name,Value)

Description

ypred = resubPredict(gprMdl) returns the predicted responses, ypred, at the training data, gprMdl.X, for the trained Gaussian process regression (GPR) model, gprMdl.

[ypred,ysd] = resubPredict(gprMdl) also returns the estimated standard deviations of the predicted responses corresponding to the rows of gprMdl.X.

[ypred,ysd,yint] = resubPredict(gprMdl) also returns the 95% prediction intervals, yint, for the true responses corresponding to each row of the training data, gprMdl.X.

[ypred,ysd,yint] = resubPredict(gprMdl,Name,Value) returns the prediction intervals with additional options, specified by one or more Name,Value pair arguments. For example, you can specify the significance level of the prediction intervals.

Input Arguments

Gaussian process regression model, gprMdl, specified as a RegressionGP object.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Significance level for the prediction intervals, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range from 0 to 1.

Example: 'Alpha',0.01 specifies 99% prediction intervals.

Data Types: single | double

Output Arguments

Predicted response values, ypred, returned as an n-by-1 vector, where n is the number of observations in the training data.

Estimated standard deviations of the predicted response values, ysd, corresponding to the rows of gprMdl.X, returned as an n-by-1 vector. ysd(i), i = 1, 2, ..., n, contains the estimated standard deviation of the new response corresponding to the predictor values at the ith observation in the training data.

Prediction intervals for the true response values corresponding to the rows of gprMdl.X, returned as an n-by-2 matrix, where n is the number of observations in the training data. The first column of yint contains the lower limits and the second column contains the upper limits of the prediction intervals.
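
As a rough illustration of how the three outputs relate, the following sketch rebuilds the intervals from ypred and ysd. It assumes the intervals are symmetric normal-quantile bands around the prediction (ypred ± z*ysd); that construction is an assumption made here, not something this page states. The synthetic data and the names yintManual and maxDiff are made up for the illustration.

```matlab
% Synthetic one-dimensional data (hypothetical, for illustration only)
rng(0);
X = linspace(0,10,50)';
y = sin(X) + 0.1*randn(50,1);

gprMdl = fitrgp(X,y);

alpha = 0.01;                                  % significance level for 99% intervals
[ypred,ysd,yint] = resubPredict(gprMdl,'Alpha',alpha);

% Assumed construction: normal quantile times the predictive standard deviation
z = norminv(1 - alpha/2);
yintManual = [ypred - z*ysd, ypred + z*ysd];

% Largest absolute discrepancy between the returned and rebuilt intervals
maxDiff = max(abs(yint(:) - yintManual(:)));
disp(maxDiff)
```

If the assumption holds, maxDiff should be at the level of floating-point round-off; a larger value would indicate that the intervals are built differently.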

Examples

This example uses the "Housing" data set [1] from the UCI Machine Learning Repository [2], described at http://archive.ics.uci.edu/ml/datasets/Housing. Download the data and save it in your current directory as a data file named housing.data.

The data set has 506 observations. The first 13 columns contain the predictor values and the last column contains the response values. The goal is to predict the median value of owner-occupied homes in suburban Boston as a function of the 13 predictors.

Load the data and define the response vector and predictor matrix.

load('housing.data');
X = housing(:,1:13);
y = housing(:,end);

Train a GPR model using the subset of regressors ('sr') approximation method with the Matern 3/2 ('Matern32') kernel function. Use the fully independent conditional ('fic') method for prediction.

gprMdl = fitrgp(X,y,'KernelFunction','Matern32',...
'FitMethod','sr','PredictMethod','fic');

Predict the responses at the training data using the trained GPR model. Compute the 99% prediction intervals.

[ypred,~,yint] = resubPredict(gprMdl,'Alpha',0.01);

Plot the actual response values along with predictions from the GPR model.

figure;
h1 = area([yint(:,1) yint(:,2)-yint(:,1)],-8,...
'FaceColor',[0.85,0.85,0.85],'EdgeColor',[0.85,0.85,0.85]);
hold on;
h1(1).FaceColor = 'none'; % remove color from bottom area
h1(1).EdgeColor = 'none';
h2 = plot(y,'r'); % Plot original response values
h3 = plot(ypred,'b--'); % Plot predicted response values
legend([h2 h3 h1(2)],'Actual response','Predicted response',...
'Prediction intervals','Location','South');
axis([0 510 -7 65]);
hold off

The gray area shows the 99% prediction intervals.
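
One quick sanity check on intervals like these is to count how many training responses fall inside them. The sketch below does that on a small synthetic data set (made up for this illustration, not the housing data) so that it runs on its own; the names inside and coverage are hypothetical.

```matlab
% Synthetic data standing in for a real training set (illustration only)
rng(1);
X = linspace(0,10,100)';
y = sin(X) + 0.2*randn(100,1);

gprMdl = fitrgp(X,y,'KernelFunction','Matern32');

% 99% prediction intervals at the training points
[~,~,yint] = resubPredict(gprMdl,'Alpha',0.01);

% Logical vector: true where the observed response lies within its interval
inside = (y >= yint(:,1)) & (y <= yint(:,2));
coverage = mean(inside);
fprintf('Fraction of training responses inside the intervals: %.3f\n',coverage);
```

Because the check reuses the training data, the fraction can be optimistic compared with coverage on held-out observations.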

Tips

  • You can choose the prediction method while training the GPR model by using the PredictMethod name-value pair argument in fitrgp. The default prediction method is 'exact' for n ≤ 10000, where n is the number of observations in the training data, and 'bcd' (block coordinate descent) otherwise.

  • Computation of standard deviations, ysd, and prediction intervals, yint, is not supported when PredictMethod is 'bcd'.
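
The two tips above suggest checking the prediction method before asking for ysd or yint. A minimal sketch of such a guard follows, assuming the trained model exposes the method through its PredictMethod property; the synthetic data are made up for the illustration.

```matlab
% Synthetic data (illustration only)
rng(0);
X = linspace(0,10,40)';
y = sin(X) + 0.1*randn(40,1);

% Set the prediction method explicitly at training time
gprMdl = fitrgp(X,y,'PredictMethod','exact');

if strcmpi(gprMdl.PredictMethod,'bcd')
    % Only point predictions are requested for 'bcd'
    ypred = resubPredict(gprMdl);
else
    % Standard deviations and intervals are requested for other methods
    [ypred,ysd,yint] = resubPredict(gprMdl);
end
```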

Alternatives

To compute the predicted responses for new data, use predict.
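
As a brief sketch of that alternative, the following code trains a model on synthetic data (made up for this illustration) and calls predict on predictor values that are not in the training set. The names Xnew, ynew, ynewsd, and ynewint are hypothetical.

```matlab
% Synthetic training data (illustration only)
rng(2);
X = linspace(0,10,60)';
y = sin(X) + 0.1*randn(60,1);

gprMdl = fitrgp(X,y);

% Four new predictor values outside the training range
Xnew = (10.5:0.5:12)';

% Predictions, standard deviations, and 99% prediction intervals at Xnew
[ynew,ynewsd,ynewint] = predict(gprMdl,Xnew,'Alpha',0.01);
```

Unlike resubPredict, predict takes the predictor data as its second input, so the outputs have one row per row of Xnew rather than per training observation.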

References

[1] Harrison, D., and D. L. Rubinfeld. "Hedonic prices and the demand for clean air." J. Environ. Economics & Management. Vol. 5, 1978, pp. 81-102.

[2] Lichman, M. UCI Machine Learning Repository, Irvine, CA: University of California, School of Information and Computer Science, 2013. http://archive.ics.uci.edu/ml.

Introduced in R2015b