loss

Regression error for Gaussian process regression model

Syntax

L = loss(gprMdl,Xnew,Ynew)
L = loss(gprMdl,Xnew,Ynew,Name,Value)

Description

L = loss(gprMdl,Xnew,Ynew) returns the mean squared error for the Gaussian process regression (GPR) model gprMdl, using the predictors in Xnew and observed response in Ynew.

L = loss(gprMdl,Xnew,Ynew,Name,Value) returns the mean squared error for the GPR model, gprMdl, with additional options specified by one or more name-value arguments. For example, you can specify a custom loss function or the observation weights.

Input Arguments


`gprMdl` — Gaussian process regression model
`RegressionGP` object | `CompactRegressionGP` object

Gaussian process regression model, specified as a RegressionGP (full) or CompactRegressionGP (compact) object.

`Xnew` — New observed data
`table` | n-by-d matrix

New data, specified as a table or an n-by-d matrix, where n is the number of observations and d is the number of predictor variables in the training data.

If you trained gprMdl on a table, then Xnew must be a table that contains all the predictor variables used to train gprMdl.

If Xnew is a table, then it can also contain the response values Ynew. In that case, you do not have to specify Ynew separately.

If you trained gprMdl on a matrix, then Xnew must be a numeric matrix with d columns, and can only contain values for the predictor variables.

Data Types: single | double | table

`Ynew` — New response values
n-by-1 vector

New observed response values that correspond to the predictor values in Xnew, specified as an n-by-1 vector, where n is the number of rows in Xnew. Each entry in Ynew is the observed response for the predictor data in the corresponding row of Xnew.

If Xnew is a table containing new response values, you do not have to specify Ynew.

Data Types: single | double
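The table case described above can be sketched as follows. The data is synthetic and the variable names (x1, x2, y, tblTrain, tblNew) are hypothetical, chosen only for illustration. Because tblNew contains the response variable y, Ynew is not passed.

```matlab
% Synthetic data for illustration only (not part of this page's examples).
rng(0);                                        % make the sketch reproducible
tblTrain = table(rand(80,1),rand(80,1),'VariableNames',{'x1','x2'});
tblTrain.y = sin(3*tblTrain.x1) + tblTrain.x2 + 0.1*randn(80,1);

tblNew = table(rand(20,1),rand(20,1),'VariableNames',{'x1','x2'});
tblNew.y = sin(3*tblNew.x1) + tblNew.x2 + 0.1*randn(20,1);

% Train on a table, naming y as the response variable.
gprMdl = fitrgp(tblTrain,'y');

% tblNew holds both the predictors and the response, so Ynew is omitted.
L = loss(gprMdl,tblNew);
```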

Name-Value Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

`lossfun` — Loss function
`'mse'` (default) | function handle

Loss function, specified as 'mse' (mean squared error) or a function handle.

If you pass a function handle, say fun, loss calls it as fun(Y,Ypred,W), where Y, Ypred, and W are numeric vectors of length n, and n is the number of rows in Xnew. Y contains the observed responses, Ypred contains the predicted responses, and W contains the observation weights.

Example: 'lossfun',Fct calls the loss function Fct.

Data Types: char | string | function_handle
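As a minimal sketch of the fun(Y,Ypred,W) calling convention, the following passes a weighted mean absolute error as a custom loss. The data is synthetic, and the names maeFun, Xtrain, and ytrain are hypothetical; dividing by sum(W) is a choice made here so that the result does not depend on how the weights are scaled.

```matlab
% Synthetic data for illustration only.
rng(0);                                        % make the sketch reproducible
Xtrain = rand(100,2);
ytrain = sin(3*Xtrain(:,1)) + Xtrain(:,2) + 0.1*randn(100,1);
Xnew = rand(30,2);
Ynew = sin(3*Xnew(:,1)) + Xnew(:,2) + 0.1*randn(30,1);

gprMdl = fitrgp(Xtrain,ytrain);

% Custom loss with the fun(Y,Ypred,W) signature:
% weighted mean absolute error, normalized by the total weight.
maeFun = @(Y,Ypred,W) sum(W.*abs(Y - Ypred))/sum(W);

L = loss(gprMdl,Xnew,Ynew,'lossfun',maeFun);
```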

`PredictionForMissingValue` — Predicted response value to use for observations with missing predictor values
`"median"` (default) | `"mean"` | `"omitted"` | numeric scalar

Since R2023b

Predicted response value to use for observations with missing predictor values, specified as "median", "mean", "omitted", or a numeric scalar.

Value	Description
`"median"`	`loss` uses the median of the observed response values in the training data as the predicted response value for observations with missing predictor values.
`"mean"`	`loss` uses the mean of the observed response values in the training data as the predicted response value for observations with missing predictor values.
`"omitted"`	`loss` excludes observations with missing predictor values from the loss computation.
Numeric scalar	`loss` uses this value as the predicted response value for observations with missing predictor values.

If an observation is missing an observed response value or an observation weight, then loss does not use the observation in the loss computation.

Example: "PredictionForMissingValue","omitted"

Data Types: single | double | char | string
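The options in the table above can be compared on data with one missing predictor value. This is a sketch on synthetic data with hypothetical variable names, and it assumes R2023b or later, when this argument was introduced.

```matlab
% Synthetic data for illustration only.
rng(0);                                        % make the sketch reproducible
Xtrain = rand(100,2);
ytrain = sin(3*Xtrain(:,1)) + Xtrain(:,2) + 0.1*randn(100,1);
Xnew = rand(20,2);
Ynew = sin(3*Xnew(:,1)) + Xnew(:,2) + 0.1*randn(20,1);

gprMdl = fitrgp(Xtrain,ytrain);

% Introduce a missing predictor value in one test observation.
Xnew(3,1) = NaN;

% Default: the training-response median stands in for the prediction.
Lmedian = loss(gprMdl,Xnew,Ynew);

% Exclude the observation with the missing predictor value.
Lomit = loss(gprMdl,Xnew,Ynew,"PredictionForMissingValue","omitted");

% Use a fixed numeric prediction (0 here) for that observation.
Lzero = loss(gprMdl,Xnew,Ynew,"PredictionForMissingValue",0);
```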

`weights` — Observation weights
vector of 1s (default) | n-by-1 vector

Observation weights, specified as an n-by-1 vector, where n is the number of rows in Xnew. By default, the weight of each observation is 1.

Example: 'weights',W uses the observation weights in vector W.

Data Types: double | single
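The following sketch passes nonuniform weights and compares the result with a manual computation. The manual formula assumes that the default 'mse' loss is the weighted average of the squared errors, with the weights normalized by their sum; that normalization is an assumption here, not something this page states. The data and variable names are hypothetical.

```matlab
% Synthetic data for illustration only.
rng(0);                                        % make the sketch reproducible
Xtrain = rand(100,2);
ytrain = sin(3*Xtrain(:,1)) + Xtrain(:,2) + 0.1*randn(100,1);
Xnew = rand(30,2);
Ynew = sin(3*Xnew(:,1)) + Xnew(:,2) + 0.1*randn(30,1);

gprMdl = fitrgp(Xtrain,ytrain);

% Give the first 10 test observations three times the weight of the rest.
w = ones(30,1);
w(1:10) = 3;
Lw = loss(gprMdl,Xnew,Ynew,'weights',w);

% Manual computation under the weighted-average assumption stated above.
ypred = predict(gprMdl,Xnew);
Lmanual = sum(w.*(Ynew - ypred).^2)/sum(w);

% Display both values side by side for comparison.
disp([Lw Lmanual])
```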

Output Arguments


`L` — Regression error
scalar value

Regression error for the trained Gaussian process regression model, gprMdl, returned as a scalar value.

Examples


Compute Regression Loss for Test Data


Load the sample data.

load('gprdata.mat')

The simulated data has 8 predictor variables, with 500 observations in the training data and 100 observations in the test data.

Fit a GPR model using the squared exponential kernel function with separate length scales for each predictor. Standardize the predictor values in the training data. Use the exact method for fitting and prediction.

gprMdl = fitrgp(Xtrain,ytrain,'FitMethod','exact',...
'PredictMethod','exact','KernelFunction','ardsquaredexponential',...
'Standardize',1);

Compute the regression error for the test data.

L = loss(gprMdl,Xtest,ytest)

L = 0.6928

Predict the responses for test data.

ypredtest = predict(gprMdl,Xtest);

Plot the test response along with the predictions.

figure;
plot(ytest,'r');
hold on;
plot(ypredtest,'b');
legend('Data','Predictions','Location','Best');

Manually compute the regression loss.

L = (ytest - ypredtest)'*(ytest - ypredtest)/length(ytest)

L = 0.6928

Specify Custom Loss Function


Load the sample data and store in a table.

load fisheriris
tbl = table(meas(:,1),meas(:,2),meas(:,3),meas(:,4),species,...
'VariableNames',{'meas1','meas2','meas3','meas4','species'});

Fit a GPR model using the first measurement as the response and the other variables as the predictors.

mdl = fitrgp(tbl,'meas1');

Predict the responses using the trained model.

ypred = predict(mdl,tbl);

Compute the mean absolute error.

n = height(tbl);
y = tbl.meas1;
% Custom loss: unweighted mean absolute error (the weights argument w is ignored).
fun = @(y,ypred,w) sum(abs(y-ypred))/n;
L = loss(mdl,tbl,'lossfun',fun)

L = 0.2345

Alternatives

You can use resubLoss to compute the regression error for the trained GPR model at the observations in the training data.
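As a sketch of this alternative, the following computes the training-data error both ways on synthetic data (variable names are hypothetical). The two values are expected to agree up to numerical tolerance, since both evaluate the model at the training observations, although that agreement is not verified here.

```matlab
% Synthetic data for illustration only.
rng(0);                                        % make the sketch reproducible
Xtrain = rand(100,2);
ytrain = sin(3*Xtrain(:,1)) + Xtrain(:,2) + 0.1*randn(100,1);

% resubLoss needs the full RegressionGP object, which stores the training data.
gprMdl = fitrgp(Xtrain,ytrain);

% Resubstitution loss, computed from the stored training data.
Lresub = resubLoss(gprMdl);

% Same quantity via loss, passing the training data explicitly.
Ltrain = loss(gprMdl,Xtrain,ytrain);
```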

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

This function fully supports tall arrays. For more information, see Tall Arrays.

Version History

Introduced in R2015b


R2023b: Specify predicted response value to use for observations with missing predictor values

Starting in R2023b, when you predict or compute the loss, some regression models allow you to specify the predicted response value for observations with missing predictor values. Specify the PredictionForMissingValue name-value argument to use a numeric scalar, the training set median, or the training set mean as the predicted value. When computing the loss, you can also specify to omit observations with missing predictor values.

This table lists the object functions that support the PredictionForMissingValue name-value argument. By default, the functions use the training set median as the predicted response value for observations with missing predictor values.

Model Type	Model Objects	Object Functions
Gaussian process regression (GPR) model	`RegressionGP`, `CompactRegressionGP`	`loss`, `predict`, `resubLoss`, `resubPredict`
Gaussian process regression (GPR) model	`RegressionPartitionedGP`	`kfoldLoss`, `kfoldPredict`
Gaussian kernel regression model	`RegressionKernel`	`loss`, `predict`
Gaussian kernel regression model	`RegressionPartitionedKernel`	`kfoldLoss`, `kfoldPredict`
Linear regression model	`RegressionLinear`	`loss`, `predict`
Linear regression model	`RegressionPartitionedLinear`	`kfoldLoss`, `kfoldPredict`
Neural network regression model	`RegressionNeuralNetwork`, `CompactRegressionNeuralNetwork`	`loss`, `predict`, `resubLoss`, `resubPredict`
Neural network regression model	`RegressionPartitionedNeuralNetwork`	`kfoldLoss`, `kfoldPredict`
Support vector machine (SVM) regression model	`RegressionSVM`, `CompactRegressionSVM`	`loss`, `predict`, `resubLoss`, `resubPredict`
Support vector machine (SVM) regression model	`RegressionPartitionedSVM`	`kfoldLoss`, `kfoldPredict`

In previous releases, the regression model loss and predict functions listed above used NaN predicted response values for observations with missing predictor values. The software omitted observations with missing predictor values from the resubstitution ("resub") and cross-validation ("kfold") computations for prediction and loss.

R2022a: `loss` can return NaN for predictor data with missing values

The loss function no longer omits an observation with a NaN prediction when computing the weighted average regression loss. Therefore, loss can now return NaN when the predictor data Xnew contains any missing values. In most cases, if the test set observations do not contain missing predictors, the loss function does not return NaN.

This change improves the automatic selection of a regression model when you use fitrauto. Before this change, the software might select a model (expected to best predict the responses for new data) with few non-NaN predictors.

If loss in your code returns NaN, you can update your code to avoid this result. Remove or replace the missing values by using rmmissing or fillmissing, respectively.
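The two remedies can be sketched as follows on synthetic data (variable names are hypothetical). Filling with the per-column training median is only one possible choice of fill value.

```matlab
% Synthetic data for illustration only.
rng(0);                                        % make the sketch reproducible
Xtrain = rand(100,2);
ytrain = sin(3*Xtrain(:,1)) + Xtrain(:,2) + 0.1*randn(100,1);
Xnew = rand(20,2);
Ynew = sin(3*Xnew(:,1)) + Xnew(:,2) + 0.1*randn(20,1);
Xnew(5,2) = NaN;                               % one missing predictor value

gprMdl = fitrgp(Xtrain,ytrain);

% Option 1: remove rows with missing predictors, and the matching responses.
[Xclean,removed] = rmmissing(Xnew);            % removed flags the deleted rows
Yclean = Ynew(~removed);
Lremoved = loss(gprMdl,Xclean,Yclean);

% Option 2: replace missing entries with the per-column training median.
fillValues = median(Xtrain,1,'omitnan');
Xfilled = fillmissing(Xnew,'constant',fillValues);
Lfilled = loss(gprMdl,Xfilled,Ynew);
```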

The following table shows the regression models for which the loss object function might return NaN. For more details, see the Compatibility Considerations for each loss function.

Model Type	Full or Compact Model Object	`loss` Object Function
Gaussian process regression (GPR) model	`RegressionGP`, `CompactRegressionGP`	`loss`
Gaussian kernel regression model	`RegressionKernel`	`loss`
Linear regression model	`RegressionLinear`	`loss`
Neural network regression model	`RegressionNeuralNetwork`, `CompactRegressionNeuralNetwork`	`loss`
Support vector machine (SVM) regression model	`RegressionSVM`, `CompactRegressionSVM`	`loss`
