loss
Regression loss for Gaussian kernel regression model
Description
L = loss(Mdl,Tbl,ResponseVarName) returns the regression loss for the trained kernel regression model Mdl using the predictor data in Tbl and the true responses in Tbl.ResponseVarName.
L = loss(___,Name,Value) returns the weighted regression loss with additional options specified by one or more name-value arguments. For example, you can specify the loss function and the observation weights.
Examples
Train a Gaussian kernel regression model for a tall array, then calculate the resubstitution mean squared error and epsilon-insensitive error.
When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. To run the example using the local MATLAB session when you have Parallel Computing Toolbox, change the global execution environment by using the mapreducer function.
mapreducer(0)
Create a datastore that references the folder location with the data. The data can be contained in a single file, a collection of files, or an entire folder. Treat 'NA' values as missing data so that datastore replaces them with NaN values. Select a subset of the variables to use. Create a tall table on top of the datastore.
varnames = {'ArrTime','DepTime','ActualElapsedTime'};
ds = datastore('airlinesmall.csv','TreatAsMissing','NA',...
    'SelectedVariableNames',varnames);
t = tall(ds);
Specify DepTime and ArrTime as the predictor variables (X) and ActualElapsedTime as the response variable (Y). Select the observations for which ArrTime is later than DepTime.
daytime = t.ArrTime>t.DepTime;
Y = t.ActualElapsedTime(daytime); % Response data
X = t{daytime,{'DepTime' 'ArrTime'}}; % Predictor data
Standardize the predictor variables.
Z = zscore(X); % Standardize the data
Train a default Gaussian kernel regression model with the standardized predictors. Set 'Verbose',0 to suppress diagnostic messages.
[Mdl,FitInfo] = fitrkernel(Z,Y,'Verbose',0)
Mdl = 
  RegressionKernel
              ResponseName: 'Y'
                   Learner: 'svm'
    NumExpansionDimensions: 64
               KernelScale: 1
                    Lambda: 8.5385e-06
             BoxConstraint: 1
                   Epsilon: 5.9303
  Properties, Methods
FitInfo = struct with fields:
                  Solver: 'LBFGS-tall'
            LossFunction: 'epsiloninsensitive'
                  Lambda: 8.5385e-06
           BetaTolerance: 1.0000e-03
       GradientTolerance: 1.0000e-05
          ObjectiveValue: 26.1409
       GradientMagnitude: 0.0023
    RelativeChangeInBeta: 0.0150
                 FitTime: 12.3091
                 History: []
Mdl is a trained RegressionKernel model, and the structure array FitInfo contains optimization details.
Determine how well the trained model generalizes to new predictor values by estimating the resubstitution mean squared error and epsilon-insensitive error.
lossMSE = loss(Mdl,Z,Y) % Resubstitution mean squared error
lossMSE =
  M×N×... tall array
    ?    ?    ?    ...
    ?    ?    ?    ...
    ?    ?    ?    ...
    :    :    :
    :    :    :
Preview deferred.
lossEI = loss(Mdl,Z,Y,'LossFun','epsiloninsensitive') % Resubstitution epsilon-insensitive error
lossEI =
  M×N×... tall array
    ?    ?    ?    ...
    ?    ?    ?    ...
    ?    ?    ?    ...
    :    :    :
    :    :    :
Preview deferred.
Evaluate the tall arrays and bring the results into memory by using gather.
[lossMSE,lossEI] = gather(lossMSE,lossEI)
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.55 sec
Evaluation completed in 0.74 sec
lossMSE = 2.5141e+03
lossEI = 25.5148
Specify a custom regression loss (Huber loss) for a Gaussian kernel regression model.
Load the carbig data set.
load carbig
Specify the predictor variables (X) and the response variable (Y).
X = [Weight,Cylinders,Horsepower,Model_Year]; Y = MPG;
Delete rows of X and Y where either array has NaN values. Removing rows with NaN values before passing data to fitrkernel can speed up training and reduce memory usage.
R = rmmissing([X Y]); X = R(:,1:4); Y = R(:,end);
Reserve 10% of the observations as a holdout sample. Extract the training and test indices from the partition definition.
rng(10) % For reproducibility
N = length(Y);
cvp = cvpartition(N,'Holdout',0.1);
idxTrn = training(cvp); % Training set indices
idxTest = test(cvp); % Test set indices
Train the regression kernel model. Standardize the training data.
Xtrain = X(idxTrn,:);
Ytrain = Y(idxTrn);
Mdl = fitrkernel(Xtrain,Ytrain,'Standardize',true)
Mdl = 
  RegressionKernel
              ResponseName: 'Y'
                   Learner: 'svm'
    NumExpansionDimensions: 128
               KernelScale: 1
                    Lambda: 0.0028
             BoxConstraint: 1
                   Epsilon: 0.8617
  Properties, Methods
Mdl is a RegressionKernel model.
Create an anonymous function that measures Huber loss (δ = 1), that is,

L = Σⱼ wⱼℓⱼ / Σⱼ wⱼ,

where

ℓⱼ = 0.5eⱼ²       if |eⱼ| ≤ 1
ℓⱼ = |eⱼ| − 0.5   if |eⱼ| > 1

eⱼ is the residual for observation j. Custom loss functions must be written in a particular form. For rules on writing a custom loss function, see the 'LossFun' name-value argument.
huberloss = @(Y,Yhat,W)sum(W.*((0.5*(abs(Y-Yhat)<=1).*(Y-Yhat).^2) + ...
    ((abs(Y-Yhat)>1).*abs(Y-Yhat)-0.5)))/sum(W);
Estimate the training set regression loss using the Huber loss function.
eTrain = loss(Mdl,Xtrain,Ytrain,'LossFun',huberloss)
eTrain = 1.7210
Estimate the test set regression loss using the Huber loss function.
Xtest = X(idxTest,:);
Ytest = Y(idxTest);
eTest = loss(Mdl,Xtest,Ytest,'LossFun',huberloss)
eTest = 1.3062
Input Arguments
Kernel regression model, specified as a RegressionKernel model object. You can create a
                RegressionKernel model object using fitrkernel.
Predictor data, specified as an
                            n-by-p numeric matrix, where
                            n is the number of observations and
                            p is the number of predictors. p
                        must be equal to the number of predictors used to train
                            Mdl.
Data Types: single | double
Sample data used to train the model, specified as a table. Each row of
                Tbl corresponds to one observation, and each column corresponds
            to one predictor variable. Optionally, Tbl can contain additional
            columns for the response variable and observation weights. Tbl must
            contain all the predictors used to train Mdl. Multicolumn variables
            and cell arrays other than cell arrays of character vectors are not allowed.
 If Tbl contains the response variable used to train Mdl, then you do not need to specify ResponseVarName or Y.
If you train Mdl using sample data contained in a table, then the input
            data for loss must also be in a table.
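For example, under the assumption that the model is trained on a table (the variable names x1 through x4 below are illustrative, not from this page), a table-based workflow can be sketched as:

```matlab
% Sketch: train on a table, then call loss with a table.
% X and Y are assumed to be in-memory numeric arrays.
Tbl = array2table(X,'VariableNames',{'x1','x2','x3','x4'});
Tbl.Y = Y;
Mdl = fitrkernel(Tbl,'Y');   % model trained on table data
L = loss(Mdl,Tbl,'Y');       % input data must also be a table
```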
Response variable name, specified as the name of a variable in
                            Tbl. The response variable must be a numeric
                        vector. If Tbl contains the response variable used to
                        train Mdl, then you do not need to specify
                            ResponseVarName.
If you specify ResponseVarName, then you must specify
                        it as a character vector or string scalar. For example, if the response
                        variable is stored as Tbl.Y, then specify
                            ResponseVarName as 'Y'.
                        Otherwise, the software treats all columns of Tbl,
                        including Tbl.Y, as predictors.
Data Types: char | string
Name-Value Arguments
Specify optional pairs of arguments as
      Name1=Value1,...,NameN=ValueN, where Name is
      the argument name and Value is the corresponding value.
      Name-value arguments must appear after other arguments, but the order of the
      pairs does not matter.
    
      Before R2021a, use commas to separate each name and value, and enclose 
      Name in quotes.
    
Example: L =
                    loss(Mdl,X,Y,'LossFun','epsiloninsensitive','Weights',weights) returns
                the weighted regression loss using the epsilon-insensitive loss
                function.
Loss function, specified as the comma-separated pair consisting of 'LossFun' and a built-in loss function name or a function handle.
The following table lists the available loss functions. Specify one using its corresponding character vector or string scalar. In the table:
- x is an observation (row vector) from p predictor variables.
- T(x) is a transformation of an observation (row vector) for feature expansion. T(x) maps x in R^p to a high-dimensional space (R^m).
- β is a vector of m coefficients.
- b is the scalar bias.
| Value | Description | 
|---|---|
| 'epsiloninsensitive' | Epsilon-insensitive loss: ℓ = max(0, |y − (T(x)β + b)| − ε) | 
| 'mse' | MSE: ℓ = (y − (T(x)β + b))² | 
'epsiloninsensitive' is appropriate for SVM learners only.
Specify your own function by using function handle notation. Let n be the number of observations in X. Your function must have this signature:
lossvalue = lossfun(Y,Yhat,W)
- The output argument lossvalue is a scalar.
- You choose the function name (lossfun).
- Y is an n-dimensional vector of observed responses. loss passes the input argument Y in for Y.
- Yhat is an n-dimensional vector of predicted responses, which is similar to the output of predict.
- W is an n-by-1 numeric vector of observation weights.
Specify your function using 'LossFun',@lossfun.
Data Types: char | string | function_handle
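As an illustration of these rules, a weighted mean absolute error (a hypothetical custom loss, not a built-in option) can be sketched as:

```matlab
% Sketch: custom weighted mean absolute error conforming to the
% required signature lossvalue = lossfun(Y,Yhat,W).
% The name maeloss is illustrative.
maeloss = @(Y,Yhat,W) sum(W.*abs(Y - Yhat))/sum(W);

% Hypothetical call, assuming a trained model Mdl and data X, Y:
% L = loss(Mdl,X,Y,'LossFun',maeloss)
```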
Since R2023b
Predicted response value to use for observations with missing predictor values,
            specified as "median", "mean",
                "omitted", or a numeric scalar.
| Value | Description | 
|---|---|
| "median" | loss uses the median of the observed response values in the training data as the predicted response value for observations with missing predictor values. | 
| "mean" | loss uses the mean of the observed response values in the training data as the predicted response value for observations with missing predictor values. | 
| "omitted" | loss excludes observations with missing predictor values from the loss computation. | 
| Numeric scalar | loss uses this value as the predicted response value for observations with missing predictor values. | 
If an observation is missing an observed response value or an observation weight, then
                loss does not use the observation in the loss
            computation.
Example: PredictionForMissingValue="omitted"
Data Types: single | double | char | string
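As a sketch, assuming a trained model Mdl and in-memory data X and Y (illustrative names), the following call omits observations with missing predictors from the loss computation:

```matlab
% Sketch (requires R2023b or later): exclude observations with
% missing predictor values. Mdl, X, and Y are assumed to exist.
L = loss(Mdl,X,Y,PredictionForMissingValue="omitted");
```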
Observation weights, specified as the comma-separated pair consisting of 'Weights' and a numeric vector or the name of a variable in Tbl.
- If Weights is a numeric vector, then the size of Weights must be equal to the number of rows in X or Tbl.
- If Weights is the name of a variable in Tbl, you must specify Weights as a character vector or string scalar. For example, if the weights are stored as Tbl.W, then specify Weights as 'W'. Otherwise, the software treats all columns of Tbl, including Tbl.W, as predictors.
If you supply the observation weights, loss computes the weighted regression loss, that is, the Weighted Mean Squared Error or Epsilon-Insensitive Loss Function.
loss normalizes Weights to
                            sum to 1.
Data Types: double | single | char | string
Output Arguments
More About
The weighted mean squared error is calculated as follows:

mse = Σⱼ wⱼ(f(xⱼ) − yⱼ)² / Σⱼ wⱼ,  j = 1,…,n

where:
- n is the number of observations. 
- xj is the jth observation (row of predictor data). 
- yj is the observed response to xj. 
- f(xj) is the response prediction of the Gaussian kernel regression model - Mdlto xj.
- w is the vector of observation weights. 
Each observation weight in w is equal to
                        ones(n,1)/n by
                default. You can specify different values for the observation weights by using the
                    'Weights' name-value pair argument.
                    loss normalizes Weights to sum to
                1.
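Under this definition, the default weighted MSE can be reproduced by hand. The following sketch assumes a trained model Mdl and in-memory numeric data X and Y with no missing values:

```matlab
% Manual weighted MSE with default uniform weights; this should match
% loss(Mdl,X,Y,'LossFun','mse') under the stated assumptions.
yhat = predict(Mdl,X);
n = numel(Y);
w = ones(n,1)/n;                          % default observation weights
mseManual = sum(w.*(yhat - Y).^2)/sum(w);
```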
The epsilon-insensitive loss function ignores errors that are within the distance epsilon (ε) of the function value. The function is formally described as:

Lossε = max(0, |y − f(x)| − ε)

The mean epsilon-insensitive loss is calculated as follows:

Loss = Σⱼ wⱼ max(0, |yⱼ − f(xⱼ)| − ε) / Σⱼ wⱼ,  j = 1,…,n

where:
- n is the number of observations. 
- xj is the jth observation (row of predictor data). 
- yj is the observed response to xj. 
- f(xj) is the response prediction of the Gaussian kernel regression model - Mdlto xj.
- w is the vector of observation weights. 
Each observation weight in w is equal to
                        ones(n,1)/n by
                default. You can specify different values for the observation weights by using the
                    'Weights' name-value pair argument.
                    loss normalizes Weights to sum to
                1.
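The mean epsilon-insensitive loss can likewise be computed by hand. This sketch assumes a trained model Mdl (an SVM learner) and in-memory data X and Y with no missing values:

```matlab
% Manual mean epsilon-insensitive loss with default uniform weights;
% this should match loss(Mdl,X,Y,'LossFun','epsiloninsensitive')
% under the stated assumptions.
yhat = predict(Mdl,X);
n = numel(Y);
w = ones(n,1)/n;
epsVal = Mdl.Epsilon;                     % epsilon from the trained model
eiManual = sum(w.*max(0, abs(Y - yhat) - epsVal))/sum(w);
```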
Extended Capabilities
The loss function supports tall arrays with the following usage notes and limitations:
- loss does not support tall table data.
For more information, see Tall Arrays.
This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2018a
loss fully supports GPU arrays.
Starting in R2023b, when you predict or compute the loss, some regression models allow you to specify the predicted response value for observations with missing predictor values. Specify the PredictionForMissingValue name-value argument to use a numeric scalar, the training set median, or the training set mean as the predicted value. When computing the loss, you can also specify to omit observations with missing predictor values.
This table lists the object functions that support the
            PredictionForMissingValue name-value argument. By default, the
        functions use the training set median as the predicted response value for observations with
        missing predictor values.
| Model Type | Model Objects | Object Functions | 
|---|---|---|
| Gaussian process regression (GPR) model | RegressionGP, CompactRegressionGP | loss, predict, resubLoss, resubPredict | 
| | RegressionPartitionedGP | kfoldLoss, kfoldPredict | 
| Gaussian kernel regression model | RegressionKernel | loss, predict | 
| | RegressionPartitionedKernel | kfoldLoss, kfoldPredict | 
| Linear regression model | RegressionLinear | loss, predict | 
| | RegressionPartitionedLinear | kfoldLoss, kfoldPredict | 
| Neural network regression model | RegressionNeuralNetwork, CompactRegressionNeuralNetwork | loss, predict, resubLoss, resubPredict | 
| | RegressionPartitionedNeuralNetwork | kfoldLoss, kfoldPredict | 
| Support vector machine (SVM) regression model | RegressionSVM, CompactRegressionSVM | loss, predict, resubLoss, resubPredict | 
| | RegressionPartitionedSVM | kfoldLoss, kfoldPredict | 
In previous releases, the regression model loss and predict functions listed above used NaN predicted response values for observations with missing predictor values. The software omitted observations with missing predictor values from the resubstitution ("resub") and cross-validation ("kfold") computations for prediction and loss.
The loss function no longer omits an observation with a
            NaN prediction when computing the weighted average regression loss. Therefore,
                loss can now return NaN when the predictor data
                X or the predictor variables in Tbl
            contain any missing values. In most cases, if the test set observations do not contain
            missing predictors, the loss function does not return
            NaN.
This change improves the automatic selection of a regression model when you use
                fitrauto.
            Before this change, the software might select a model (expected to best predict the
            responses for new data) with few non-NaN predictors.
If loss in your code returns NaN, you can update your code
            to avoid this result. Remove or replace the missing values by using rmmissing or fillmissing, respectively.
The following table shows the regression models for which the
                loss object function might return NaN. For more details,
            see the Compatibility Considerations for each loss
            function.
| Model Type | Full or Compact Model Object | loss Object Function | 
|---|---|---|
| Gaussian process regression (GPR) model | RegressionGP, CompactRegressionGP | loss | 
| Gaussian kernel regression model | RegressionKernel | loss | 
| Linear regression model | RegressionLinear | loss | 
| Neural network regression model | RegressionNeuralNetwork, CompactRegressionNeuralNetwork | loss | 
| Support vector machine (SVM) regression model | RegressionSVM, CompactRegressionSVM | loss | 
See Also