loss

Loss of linear incremental learning model on batch of data

Syntax

L = loss(Mdl,X,Y)

L = loss(Mdl,X,Y,Name,Value)

Description

loss returns the regression or classification loss of a configured incremental learning model for linear regression (incrementalRegressionLinear object) or linear binary classification (incrementalClassificationLinear object).

To measure model performance on a data stream and store the results in the output model, call updateMetrics or updateMetricsAndFit.

L = loss(Mdl,X,Y) returns the loss for the incremental learning model Mdl using the batch of predictor data X and corresponding responses Y.

L = loss(Mdl,X,Y,Name,Value) uses additional options specified by one or more name-value pair arguments. For example, you can specify that the columns of the predictor data matrix correspond to observations, or specify the classification loss function.
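As a minimal sketch of the two syntaxes, the following code assumes simulated data and a regression model configured with zero coefficients. The variable names (`Xdemo`, `Ydemo`, `DemoMdl`) and the data are illustrative and are not part of the original examples.

```matlab
% Simulated regression data (hypothetical, for illustration only)
rng(0)                                   % For reproducibility
Xdemo = randn(100,3);                    % 100 observations in rows, 3 predictors
Ydemo = Xdemo*[1; -2; 0.5] + 0.1*randn(100,1);

% Configure the model so that loss can run without prior fitting
DemoMdl = incrementalRegressionLinear('Beta',zeros(3,1),'Bias',0, ...
    'EstimationPeriod',0);

% Default syntax: observations lie along the rows of the predictor matrix
Lrows = loss(DemoMdl,Xdemo,Ydemo);

% Name-value syntax: the same batch with observations along the columns
Lcols = loss(DemoMdl,Xdemo',Ydemo,'ObservationsIn','columns');
```

Both calls evaluate the same batch with the same model, so the two results are expected to agree.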

Examples

Measure Model Performance During Incremental Learning

The performance of an incremental model on streaming data is measured in three ways:

Cumulative metrics measure the performance since the start of incremental learning.
Window metrics measure the performance on a specified window of observations. The metrics are updated every time the model processes the specified window.
The loss function measures the performance on a specified batch of data only.

Load the human activity data set. Randomly shuffle the data.

load humanactivity
n = numel(actid);
rng(1) % For reproducibility
idx = randsample(n,n);
X = feat(idx,:);
Y = actid(idx);

For details on the data set, enter Description at the command line.

Responses can be one of five classes: Sitting, Standing, Walking, Running, or Dancing. Dichotomize the response by identifying whether the subject is moving (actid > 2).

Y = Y > 2;

Create an incremental linear SVM model for binary classification. Configure the model for loss by specifying the class names, prior class distribution (uniform), and arbitrary coefficient and bias values. Specify a metrics window size of 1000 observations.

p = size(X,2);
Beta = randn(p,1);
Bias = randn(1);
Mdl = incrementalClassificationLinear('Beta',Beta,'Bias',Bias, ...
    'ClassNames',unique(Y),'Prior','uniform','MetricsWindowSize',1000);

Mdl is an incrementalClassificationLinear model. All its properties are read-only. Instead of specifying arbitrary values, you can take either of these actions to configure the model:

Train an SVM model using fitcsvm or fitclinear on a subset of the data (if available), and then convert the model to an incremental learner by using incrementalLearner.
Incrementally fit Mdl to data by using fit.
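The first alternative can be sketched as follows. This is an illustration under stated assumptions, not part of the original example: the training subset size (5000 observations) is arbitrary, and the names `Xs`, `Ys`, `TTMdl`, and `IncMdl` are hypothetical so that they do not interfere with the variables used elsewhere on this page.

```matlab
% Prepare the data as in the example (names Xs and Ys are illustrative)
load humanactivity
rng(1)                                   % For reproducibility
n = numel(actid);
idx = randsample(n,n);
Xs = feat(idx,:);
Ys = actid(idx) > 2;                     % true when the subject is moving

% Train a traditional linear SVM on an arbitrary subset of 5000 observations
TTMdl = fitclinear(Xs(1:5000,:),Ys(1:5000));

% Convert the trained model to an incremental learner
IncMdl = incrementalLearner(TTMdl,'MetricsWindowSize',1000);

% A converted model is already configured, so loss runs on a new chunk directly
Lnext = loss(IncMdl,Xs(5001:5050,:),Ys(5001:5050));
```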

Simulate a data stream, and perform the following actions on each incoming chunk of 50 observations:

Call updateMetrics to measure the cumulative performance and the performance within a window of observations. Overwrite the previous incremental model with a new one to track performance metrics.
Call loss to measure the model performance on the incoming chunk.
Call fit to fit the model to the incoming chunk. Overwrite the previous incremental model with a new one fitted to the incoming observations.
Store all performance metrics to see how they evolve during incremental learning.

% Preallocation
numObsPerChunk = 50;
nchunk = floor(n/numObsPerChunk);
ce = array2table(zeros(nchunk,3),'VariableNames',["Cumulative" "Window" "Loss"]);

% Incremental learning
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend   = min(n,numObsPerChunk*j);
    idx = ibegin:iend;    
    Mdl = updateMetrics(Mdl,X(idx,:),Y(idx));
    ce{j,["Cumulative" "Window"]} = Mdl.Metrics{"ClassificationError",:};
    ce{j,"Loss"} = loss(Mdl,X(idx,:),Y(idx));
    Mdl = fit(Mdl,X(idx,:),Y(idx));
end

Mdl is an incrementalClassificationLinear model object trained on all the data in the stream. During incremental learning and after the model is warmed up, updateMetrics checks the performance of the model on the incoming observations, and then the fit function fits the model to those observations. loss is agnostic of the metrics warm-up period, so it measures the classification error for all iterations.

To see how the performance metrics evolve during training, plot them.

figure
plot(ce.Variables)
xlim([0 nchunk])
ylim([0 0.05])
ylabel('Classification Error')
xline(Mdl.MetricsWarmupPeriod/numObsPerChunk,'r-.')
legend(ce.Properties.VariableNames)
xlabel('Iteration')

Figure: Classification error versus iteration for the Cumulative, Window, and Loss series, with a vertical line marking the end of the metrics warm-up period.

The yellow line represents the classification error on each incoming chunk of data. After the metrics warm-up period, Mdl tracks the cumulative and window metrics. The cumulative and batch losses converge as the fit function fits the incremental model to the incoming data.

Compute Custom Loss on Incoming Chunks of Data

Fit an incremental learning model for regression to streaming data, and compute the mean absolute deviation (MAD) on the incoming data batches.

Load the robot arm data set. Obtain the sample size n and the number of predictor variables p.

load robotarm
n = numel(ytrain);
p = size(Xtrain,2);

For details on the data set, enter Description at the command line.

Create an incremental linear model for regression. Configure the model as follows:

Specify a metrics warm-up period of 1000 observations.
Specify a metrics window size of 500 observations.
Track the mean absolute deviation (MAD) to measure the performance of the model. Create an anonymous function that measures the absolute error of each new observation. Create a structure array containing the name MeanAbsoluteError and its corresponding function.
Configure the model to predict responses by specifying that all regression coefficients and the bias are 0.

maefcn = @(z,zfit,w)(abs(z - zfit));
maemetric = struct("MeanAbsoluteError",maefcn);

Mdl = incrementalRegressionLinear('MetricsWarmupPeriod',1000,'MetricsWindowSize',500, ...
    'Metrics',maemetric,'Beta',zeros(p,1),'Bias',0,'EstimationPeriod',0)

Mdl = 
  incrementalRegressionLinear

               IsWarm: 0
              Metrics: [2×2 table]
    ResponseTransform: 'none'
                 Beta: [32×1 double]
                 Bias: 0
              Learner: 'svm'


  Properties, Methods

Mdl is an incrementalRegressionLinear model object configured for incremental learning.

Perform incremental learning. At each iteration:

Simulate a data stream by processing a chunk of 50 observations.
Call updateMetrics to compute cumulative and window metrics on the incoming chunk of data. Overwrite the previous incremental model with the returned model so that it stores the updated metrics.
Call loss to compute the MAD on the incoming chunk of data. Whereas the cumulative and window metrics require that custom losses return the loss for each observation, loss requires the loss on the entire chunk. Compute the mean of the absolute deviation.
Call fit to fit the incremental model to the incoming chunk of data.
Store the cumulative, window, and chunk metrics to see how they evolve during incremental learning.

% Preallocation
numObsPerChunk = 50;
nchunk = floor(n/numObsPerChunk);
mae = array2table(zeros(nchunk,3),'VariableNames',["Cumulative" "Window" "Chunk"]);

% Incremental fitting
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend   = min(n,numObsPerChunk*j);
    idx = ibegin:iend;    
    Mdl = updateMetrics(Mdl,Xtrain(idx,:),ytrain(idx));
    mae{j,1:2} = Mdl.Metrics{"MeanAbsoluteError",:};
    mae{j,3} = loss(Mdl,Xtrain(idx,:),ytrain(idx),'LossFun',@(x,y,w)mean(maefcn(x,y,w)));
    Mdl = fit(Mdl,Xtrain(idx,:),ytrain(idx));
end

Mdl is an incrementalRegressionLinear model object trained on all the data in the stream. During incremental learning and after the model is warmed up, updateMetrics checks the performance of the model on the incoming observations, and the fit function fits the model to those observations.
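The difference between the per-observation form required by the tracked metric and the scalar form passed to loss can be checked in base MATLAB without a model. The sketch below uses made-up responses and predictions; the names `zdemo`, `zfitdemo`, `wdemo`, and `chunkfcn` are illustrative.

```matlab
% Per-observation absolute error, as used for the custom metric
maefcn = @(z,zfit,w)(abs(z - zfit));

% Scalar wrapper, as passed to loss through 'LossFun'
chunkfcn = @(z,zfit,w)mean(maefcn(z,zfit,w));

% Made-up observed and predicted responses for one small chunk
zdemo    = [1; 2; 3];
zfitdemo = [1.5; 2; 2];
wdemo    = ones(3,1)/3;                  % Weights are unused by both functions

perObs    = maefcn(zdemo,zfitdemo,wdemo);    % 3-by-1 vector: [0.5; 0; 1]
chunkLoss = chunkfcn(zdemo,zfitdemo,wdemo);  % Scalar: 0.5
```

Note that both functions ignore the weight argument, so the chunk value is an unweighted mean of the absolute deviations.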

Plot the performance metrics to see how they evolved during incremental learning.

figure
h = plot(mae.Variables);
xlim([0 nchunk])
ylabel('Mean Absolute Deviation')
xline(Mdl.MetricsWarmupPeriod/numObsPerChunk,'r-.')
xlabel('Iteration')
legend(h,mae.Properties.VariableNames)

Figure: Mean absolute deviation versus iteration for the Cumulative, Window, and Chunk series, with a vertical line marking the end of the metrics warm-up period.

The plot suggests the following:

updateMetrics computes the performance metrics after the metrics warm-up period only.
updateMetrics computes the cumulative metrics during each iteration.
updateMetrics computes the window metrics after processing 500 observations.
Because Mdl was configured to predict observations from the beginning of incremental learning, loss can compute the MAD on each incoming chunk of data.

Input Arguments

`Mdl` — Incremental learning model
`incrementalClassificationLinear` model object | `incrementalRegressionLinear` model object

Incremental learning model, specified as an incrementalClassificationLinear or incrementalRegressionLinear model object. You can create Mdl directly or by converting a supported, traditionally trained machine learning model using the incrementalLearner function. For more details, see the corresponding reference page.

You must configure Mdl to compute its loss on a batch of observations.

If Mdl is a converted, traditionally trained model, you can compute its loss without any modifications.
Otherwise, Mdl must satisfy the following criteria, which you can specify directly or by fitting Mdl to data using fit or updateMetricsAndFit.
- If Mdl is an incrementalRegressionLinear model, its model coefficients Mdl.Beta and bias Mdl.Bias must be nonempty arrays.
- If Mdl is an incrementalClassificationLinear model, its model coefficients Mdl.Beta and bias Mdl.Bias must be nonempty arrays, the class names Mdl.ClassNames must contain two classes, and the prior class distribution Mdl.Prior must contain known values.
- Regardless of object type, if you configure the model so that functions standardize predictor data, the predictor means Mdl.Mu and standard deviations Mdl.Sigma must be nonempty arrays.

`X` — Batch of predictor data
floating-point matrix

Batch of predictor data with which to compute the loss, specified as a floating-point matrix of n observations and Mdl.NumPredictors predictor variables. The value of the ObservationsIn name-value argument determines the orientation of the variables and observations. The default ObservationsIn value is "rows", which indicates that observations in the predictor data are oriented along the rows of X.

The length of the observation labels Y and the number of observations in X must be equal; Y(j) is the label of observation j (row or column) in X.

Note

loss supports only floating-point input predictor data. If your input data includes categorical data, you must prepare an encoded version of the categorical data. Use dummyvar to convert each categorical variable to a numeric matrix of dummy variables. Then, concatenate all dummy variable matrices and any other numeric predictors. For more details, see Dummy Variables.
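The encoding step described in this note might look like the following sketch. The variables `color` and `weight` are made-up data for illustration.

```matlab
% Made-up predictors: one categorical variable and one numeric variable
color  = categorical(["red"; "green"; "blue"; "green"]);
weight = [1.2; 0.7; 3.1; 2.4];

% Convert the categorical variable to a matrix of dummy variables,
% with one column per category and a single 1 in each row
D = dummyvar(color);

% Concatenate the dummy variables with the numeric predictor
Xenc = [D weight];                       % 4-by-4 floating-point matrix
```

The resulting matrix `Xenc` is of type double, so it satisfies the floating-point requirement for X.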

Data Types: single | double

`Y` — Batch of responses (labels)
categorical array | character array | string array | logical vector | floating-point vector | cell array of character vectors

Batch of responses (labels) with which to compute the loss, specified as a categorical, character, or string array, logical or floating-point vector, or cell array of character vectors for classification problems; or a floating-point vector for regression problems.

The length of the observation labels Y and the number of observations in X must be equal; Y(j) is the label of observation j (row or column) in X.

For classification problems:

loss supports binary classification only.
If Y contains a label that is not a member of Mdl.ClassNames, loss issues an error.
The data type of Y and Mdl.ClassNames must be the same.
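A quick check of these label requirements before calling loss could be sketched as follows. Here `classNames` is a stand-in for Mdl.ClassNames and `Ybatch` is a made-up batch of labels; both names are hypothetical.

```matlab
% Stand-in for Mdl.ClassNames of a binary model with logical labels
classNames = [false; true];

% Made-up batch of labels
Ybatch = [true; false; true];

% The data types must match, and every label must be a known class
sameType  = isa(Ybatch,class(classNames));
allKnown  = all(ismember(Ybatch,classNames));
labelsOK  = sameType && allKnown;
```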

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'ObservationsIn','columns','Weights',W specifies that the columns of the predictor matrix correspond to observations, and the vector W contains observation weights to apply.

`LossFun` — Loss function
string vector | function handle | cell vector | structure array

Loss function, specified as the comma-separated pair consisting of 'LossFun' and a built-in loss function name or function handle.

Classification problems: The following table lists the available loss functions when Mdl is an incrementalClassificationLinear model. Specify one using its corresponding character vector or string scalar.

Name	Description
`"binodeviance"`	Binomial deviance
`"classiferror"` (default)	Misclassification rate in decimal
`"exponential"`	Exponential loss
`"hinge"`	Hinge loss
`"logit"`	Logistic loss
`"quadratic"`	Quadratic loss
For more details, see Classification Loss.
Logistic regression learners return posterior probabilities as classification scores, but SVM learners do not (see predict).
To specify a custom loss function, use function handle notation. The function must have this form:
```
lossval = lossfcn(C,S,W)
```
- The output argument lossval is an n-by-1 floating-point vector, where lossval(j) is the classification loss of observation j.
- You specify the function name (lossfcn).
- C is an n-by-2 logical matrix with rows indicating the class to which the corresponding observation belongs. The column order corresponds to the class order in the ClassNames property. Create C by setting C(p,q) = 1, if observation p is in class q, for each observation in the specified data. Set the other element in row p to 0.
- S is an n-by-2 numeric matrix of predicted classification scores. S is similar to the score output of predict, where rows correspond to observations in the data and the column order corresponds to the class order in the ClassNames property. S(p,q) is the classification score of observation p being classified in class q.
- W is an n-by-1 numeric vector of observation weights.
Regression problems: The following table lists the available loss functions when Mdl is an incrementalRegressionLinear model. Specify one using its corresponding character vector or string scalar.

Name	Description	Learner Supporting Metric
`"epsiloninsensitive"`	Epsilon insensitive loss	`'svm'`
`"mse"` (default)	Weighted mean squared error	`'svm'` and `'leastsquares'`

For more details, see Regression Loss.
To specify a custom loss function, use function handle notation. The function must have this form:
```
lossval = lossfcn(Y,YFit,W)
```
- The output argument lossval is a floating-point scalar.
- You specify the function name (lossfcn).
- Y is a length n numeric vector of observed responses.
- YFit is a length n numeric vector of corresponding predicted responses.
- W is an n-by-1 numeric vector of observation weights.

Example: 'LossFun',"mse"

Example: 'LossFun',@lossfcn

Data Types: char | string | function_handle
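To illustrate the `lossfcn(C,S,W)` form for classification problems, the sketch below writes a per-observation hinge loss as a function handle and evaluates it on hand-made inputs in base MATLAB. The handle name `hingefcn` and the matrices `Cdemo`, `Sdemo`, and `Wdemo` are illustrative. The code assumes that the second column corresponds to the positive class, following the class order convention described above.

```matlab
% Per-observation hinge loss: code the label as -1 (first class) or
% +1 (second class), then form the margin with the second-class score
hingefcn = @(C,S,W) max(0, 1 - (2*C(:,2) - 1).*S(:,2));

% Hand-made inputs for three observations
Cdemo = logical([1 0; 0 1; 0 1]);        % Class membership indicators
Sdemo = [0.8 -0.8; -0.3 0.3; 2.0 -2.0];  % Scores, one column per class
Wdemo = ones(3,1)/3;                     % Weights (unused by this function)

lossval = hingefcn(Cdemo,Sdemo,Wdemo);   % 3-by-1 vector: [0.2; 0.7; 3.0]
```

The third observation belongs to the second class but has a strongly negative score for it, so it receives the largest loss.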

`ObservationsIn` — Predictor data observation dimension
`'rows'` (default) | `'columns'`

Predictor data observation dimension, specified as the comma-separated pair consisting of 'ObservationsIn' and 'columns' or 'rows'.

Data Types: char | string

`Weights` — Batch of observation weights
floating-point vector of positive values

Batch of observation weights, specified as the comma-separated pair consisting of 'Weights' and a floating-point vector of positive values. loss weighs the observations in the input data with the corresponding values in Weights. The size of Weights must equal n, which is the number of observations in the input data.

By default, Weights is ones(n,1).

For more details, see Observation Weights.

Data Types: double | single

Output Arguments

`L` — Classification or regression loss
numeric scalar

Classification or regression loss, returned as a numeric scalar. The interpretation of L depends on Weights and LossFun.

More About

Classification Loss

Classification loss functions measure the predictive inaccuracy of classification models. When you compare the same type of loss among many models, a lower loss indicates a better predictive model.

Consider the following scenario.

L is the weighted average classification loss.
n is the sample size.

y_j is the observed class label. The software codes it as –1 or 1, indicating the negative or positive class (or the first or second class in the ClassNames property), respectively.
f(X_j) is the positive-class classification score for observation (row) j of the predictor data X.
m_j = y_j f(X_j) is the classification score for classifying observation j into the class corresponding to y_j. Positive values of m_j indicate correct classification and do not contribute much to the average loss. Negative values of m_j indicate incorrect classification and contribute significantly to the average loss.

The weight for observation j is w_j.

Given this scenario, the following table describes the supported loss functions that you can specify by using the LossFun name-value argument.

Loss Function	Value of `LossFun`	Equation
Binomial deviance	`"binodeviance"`	$L = \sum_{j=1}^{n} w_j \log\{1 + \exp[-2 m_j]\}.$
Exponential loss	`"exponential"`	$L = \sum_{j=1}^{n} w_j \exp(-m_j).$
Misclassification rate in decimal	`"classiferror"`	$L = \sum_{j=1}^{n} w_j I\{\hat{y}_j \neq y_j\},$ where $\hat{y}_j$ is the class label corresponding to the class with the maximal score, and I{·} is the indicator function.
Hinge loss	`"hinge"`	$L = \sum_{j=1}^{n} w_j \max\{0, 1 - m_j\}.$
Logistic loss	`"logit"`	$L = \sum_{j=1}^{n} w_j \log(1 + \exp(-m_j)).$
Quadratic loss	`"quadratic"`	$L = \sum_{j=1}^{n} w_j (1 - m_j)^2.$

The loss function does not omit an observation with a NaN score when computing the weighted average loss. Therefore, loss can return NaN when the predictor data X contains missing values, and the name-value argument LossFun is not specified as "classiferror". In most cases, if the data set does not contain missing predictors, the loss function does not return NaN.
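The formulas in the table can be evaluated by hand in base MATLAB. The following sketch uses made-up labels, scores, and uniform weights that sum to 1; the names `ydemo`, `fdemo`, `wdemo`, and `losses` are illustrative, and the code only restates the equations rather than reproducing the internal implementation of loss.

```matlab
% Made-up data: labels coded as -1 or 1, and positive-class scores
ydemo = [1; -1; 1; -1];
fdemo = [2.0; -0.5; -0.3; 0.1];
wdemo = ones(4,1)/4;                     % Weights that sum to 1

m    = ydemo.*fdemo;                     % Margins m_j = y_j*f(X_j)
yhat = 2*(fdemo > 0) - 1;                % Predicted label from the score sign

losses.binodeviance = sum(wdemo.*log(1 + exp(-2*m)));
losses.exponential  = sum(wdemo.*exp(-m));
losses.classiferror = sum(wdemo.*(yhat ~= ydemo));
losses.hinge        = sum(wdemo.*max(0,1 - m));
losses.logit        = sum(wdemo.*log(1 + exp(-m)));
losses.quadratic    = sum(wdemo.*(1 - m).^2);
```

In this data, the last two observations have negative margins, so the misclassification rate is 0.5, and those two observations account for most of the hinge and quadratic losses.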

This figure compares the loss functions over the score m for one observation. Some functions are normalized to pass through the point (0,1).

Comparison of classification losses for different loss functions

Regression Loss

Regression loss functions measure the predictive inaccuracy of regression models. When you compare the same type of loss among many models, a lower loss indicates a better predictive model.

Consider the following scenario.

L is the weighted average regression loss.
n is the sample size.
y_j is the observed response of observation j.
f(X_j) is the predicted value of observation j of the predictor data X.
The weight for observation j is w_j.

Given this scenario, the following table describes the supported loss functions that you can specify by using the LossFun name-value argument.

Loss Function	Value of `LossFun`	Equation
Epsilon insensitive loss	`"epsiloninsensitive"`	$L = \max[0, |y - f(x)| - \varepsilon].$
Mean squared error	`"mse"`	$L = [y - f(x)]^2.$

The loss function does not omit an observation with a NaN prediction when computing the weighted average loss. Therefore, loss can return NaN when the predictor data X contains missing values. In most cases, if the data set does not contain missing predictors, the loss function does not return NaN.
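As a worked sketch of the two regression losses, the code below applies the per-observation formulas from the table and then takes the weighted average. The data, the value of epsilon (0.1), and the variable names are made up for illustration.

```matlab
% Made-up observed responses, predictions, and uniform weights
ydemo    = [1.0; 2.0; 3.0];
yfitdemo = [1.05; 2.5; 2.0];
wdemo    = ones(3,1)/3;                  % Weights that sum to 1
epsilon  = 0.1;                          % Assumed half-width of the insensitive band

% Per-observation losses, following the table
epsPerObs = max(0, abs(ydemo - yfitdemo) - epsilon);   % [0; 0.4; 0.9]
sqPerObs  = (ydemo - yfitdemo).^2;                     % [0.0025; 0.25; 1]

% Weighted averages over the batch
Leps = sum(wdemo.*epsPerObs);
Lmse = sum(wdemo.*sqPerObs);
```

The first observation falls inside the epsilon band, so it contributes nothing to the epsilon insensitive loss but still contributes to the mean squared error.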

Algorithms

Observation Weights

For classification problems, if the prior class probability distribution is known (in other words, the prior distribution is not empirical), loss normalizes observation weights to sum to the prior class probabilities in the respective classes. This action implies that observation weights are the respective prior class probabilities by default.

For regression problems or if the prior class probability distribution is empirical, the software normalizes the specified observation weights to sum to 1 each time you call loss.
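For the regression case, the effect of normalizing the weights to sum to 1 can be sketched in base MATLAB. The helper `wmse` and the data are hypothetical; the code illustrates the normalization described above and is not the internal implementation.

```matlab
% Made-up responses, predictions, and raw (unnormalized) weights
ydemo    = [1; 2; 4];
yfitdemo = [1.5; 2; 3];
wRaw     = [1; 2; 1];

% Weighted mean squared error after normalizing the weights to sum to 1
wmse = @(w) sum((w/sum(w)).*(ydemo - yfitdemo).^2);

L1 = wmse(wRaw);                         % Uses weights [0.25; 0.5; 0.25]
L2 = wmse(10*wRaw);                      % Rescaled weights, same normalized values
```

Because of the normalization, only the relative sizes of the weights matter: scaling all weights by the same constant leaves the result unchanged.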

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

Use saveLearnerForCoder, loadLearnerForCoder, and codegen (MATLAB Coder) to generate code for the loss function. Save a trained model by using saveLearnerForCoder. Define an entry-point function that loads the saved model by using loadLearnerForCoder and calls the loss function. Then use codegen to generate code for the entry-point function.
To generate single-precision C/C++ code for loss, specify DataType="single" when you call the loadLearnerForCoder function.

This table contains notes about the arguments of loss. Arguments not included in this table are fully supported.

Argument	Notes and Limitations
`Mdl`	For usage notes and limitations of the model object, see `incrementalClassificationLinear` or `incrementalRegressionLinear`.
`X`	Batch-to-batch, the number of observations can be a variable size, but must equal the number of observations in `Y`. The number of predictor variables must equal `Mdl.NumPredictors`. `X` must be `single` or `double`.
`Y`	Batch-to-batch, the number of observations can be a variable size, but must equal the number of observations in `X`. For classification problems, all labels in `Y` must be represented in `Mdl.ClassNames`. `Y` and `Mdl.ClassNames` must have the same data type.
`LossFun`	The specified function cannot be an anonymous function.

If you configure Mdl to shuffle data (Mdl.Shuffle is true, or Mdl.Solver is "sgd" or "asgd"), the fit function randomly shuffles each incoming batch of observations before it fits the model to the batch. The order of the shuffled observations might not match the order generated by MATLAB®. Therefore, if you fit Mdl before computing the loss, the loss computed in MATLAB might not be equal to the loss computed by the generated code.
Use a homogeneous data type (specifically, single or double) for all floating-point input arguments and object properties.

For more information, see Introduction to Code Generation for Statistics and Machine Learning Functions.

Version History

Introduced in R2020b

R2022a: `loss` can return `NaN` for predictor data with missing values

The loss function no longer omits an observation with a NaN prediction (score for classification and response for regression) when computing the weighted average loss. Therefore, loss can now return NaN when the predictor data X contains missing values, and the name-value argument LossFun is not specified as "classiferror" (for classification). In most cases, if the data set does not contain missing predictors, the loss function does not return NaN.

If loss in your code returns NaN, you can update your code to avoid this result. Remove or replace the missing values by using rmmissing or fillmissing, respectively.
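Both options can be sketched on a small made-up batch. The variable names are illustrative, and the fill value of 0 is an arbitrary choice for demonstration; this sketch handles missing values in the predictors only.

```matlab
% Made-up batch in which the second observation has a missing predictor
Xraw = [1 2; NaN 3; 4 5];
Yraw = [10; 20; 30];

% Option 1: remove incomplete rows, and drop the matching responses
[Xclean,removed] = rmmissing(Xraw);      % removed flags the deleted rows
Yclean = Yraw(~removed);

% Option 2: replace missing predictor values with a constant
Xfilled = fillmissing(Xraw,'constant',0);
```

You can then pass either `Xclean` and `Yclean`, or `Xfilled` and `Yraw`, to loss in place of the original batch.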
