loss

Loss for quantile linear regression model

Since R2024b

Syntax

L = loss(Mdl,Tbl,ResponseVarName)

L = loss(Mdl,Tbl,Y)

L = loss(Mdl,X,Y)

L = loss(___,Name=Value)

Description

L = loss(Mdl,Tbl,ResponseVarName) returns the quantile loss for the trained quantile linear regression model Mdl. The function uses the predictor data in the table Tbl and the response values in the ResponseVarName table variable. For more information, see Quantile Loss.

L = loss(Mdl,Tbl,Y) returns the quantile loss for the model Mdl using the predictor data in the table Tbl and the response values in the vector Y.

L = loss(Mdl,X,Y) returns the quantile loss for the model Mdl using the predictor data X and the corresponding response values in Y.

L = loss(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. For example, you can specify the quantiles for which to return loss values.

Examples

Compute Loss for Quantile Linear Regression Model

Compute the quantile loss for a quantile linear regression model.

Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Create a table containing the predictor variables Acceleration, Cylinders, Displacement, and so on, as well as the response variable MPG. View the first eight observations.

load carbig
cars = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Origin,Weight,MPG);
head(cars)

    Acceleration    Cylinders    Displacement    Horsepower    Model_Year    Origin     Weight    MPG
    ____________    _________    ____________    __________    __________    _______    ______    ___

          12            8            307            130            70        USA         3504     18 
        11.5            8            350            165            70        USA         3693     15 
          11            8            318            150            70        USA         3436     18 
          12            8            304            150            70        USA         3433     16 
        10.5            8            302            140            70        USA         3449     17 
          10            8            429            198            70        USA         4341     15 
           9            8            454            220            70        USA         4354     14 
         8.5            8            440            215            70        USA         4312     14

Remove rows of cars where the table has missing values.

cars = rmmissing(cars);

Categorize the cars based on whether they were made in the USA.

cars.Origin = categorical(cellstr(cars.Origin));
cars.Origin = mergecats(cars.Origin,["France","Japan",...
    "Germany","Sweden","Italy","England"],"NotUSA");

Partition the data into training and test sets using cvpartition. Use approximately 80% of the observations as training data, and 20% of the observations as test data.

rng(0,"twister") % For reproducibility of the data partition
c = cvpartition(height(cars),"Holdout",0.20);

trainingIdx = training(c);
carsTrain = cars(trainingIdx,:);

testIdx = test(c);
carsTest = cars(testIdx,:);

Train a quantile linear regression model using the carsTrain training data. Specify MPG as the response variable. Then, compute the quantile loss using the carsTest test data.

Mdl = fitrqlinear(carsTrain,"MPG");
L = loss(Mdl,carsTest)

L = 
2.9448

Retrain the model with a beta tolerance of 1e-6 instead of the default value of 1e-4, and then compute the test set quantile loss.

newMdl = fitrqlinear(carsTrain,"MPG",BetaTolerance=1e-6);
newL = loss(newMdl,carsTest)

newL = 
1.4050

The retrained model has a lower quantile loss.

Compute Goodness of Fit Using Custom Loss

Determine how well a linear quantile regression model fits the data for each quantile by using a quantile regression analog to the R-squared value.

Generate 500 observations from the model $y = x_{1} + 2x_{2} + \epsilon$.

- $X = [x_{1}, x_{2}]$ is a predictor matrix of standard normal elements.
- $\epsilon$ is an error vector of normal elements with mean 0 and standard deviation 0.3.
- $y$ is the response.

rng("default") % For reproducibility
n = 500;
X = randn(n,2);
y = X(:,1) + 2*X(:,2) + 0.3*randn(n,1);

Train a linear quantile regression model using the data in X and y. Specify to use the 0.25, 0.50, and 0.75 quantiles.

Mdl = fitrqlinear(X,y,Quantiles=[0.25 0.50 0.75]);

Generate 100 test set observations from the same model used to generate the training data.

newN = 100;
XTest = randn(newN,2);
yTest = XTest(:,1) + 2*XTest(:,2) + 0.3*randn(newN,1);

Compute the R-squared analog for the quantile regression model by using the test set.

First, create the custom gof function. The function accepts test set responses (ytest), test set predictions for a particular quantile (yfit), test observation weights (weights), the specified quantile (quantile), and the linear quantile regression model used to generate the test set predictions (model). The function uses these values to compute the quantile loss for the linear model (unrestrictedLoss) and the quantile loss for the linear model restricted to the intercept term (restrictedLoss). The function returns the value 1-(unrestrictedLoss/restrictedLoss), which typically lies between 0 and 1. On held-out data the value can be negative if the restricted model has the lower loss. A value closer to 1 suggests a better model fit for the specified quantile.

function L = gof(ytest,yfit,weights,quantile,model)

% Compute quantile loss for unrestricted model
unrestrictedResiduals = ytest - yfit;
unrestrictedLoss = unrestrictedResiduals.* ...
    (quantile-(unrestrictedResiduals<0));
unrestrictedLoss = sum(weights.*unrestrictedLoss)/sum(weights);

% Compute quantile loss for restricted model
qIndex = model.Quantiles==quantile;
restrictedYFit = model.ModelParameters.InitialBias(qIndex);
restrictedResiduals = ytest - restrictedYFit;
restrictedLoss = restrictedResiduals.* ...
    (quantile-(restrictedResiduals<0));
restrictedLoss = sum(weights.*restrictedLoss)/sum(weights);

% Compute R^2 analog
L = 1 - (unrestrictedLoss/restrictedLoss);

end

Create a function handle for the gof function that includes the required model input argument. Then, use the LossFun name-value argument to pass the function handle to loss, along with the linear quantile regression model, the test predictor data, and the test response data.

customLoss = @(ytest,yfit,weights,quantile) ...
    gof(ytest,yfit,weights,quantile,Mdl);
L = loss(Mdl,XTest,yTest,LossFun=customLoss)

L = 1×3

    0.8724    0.8724    0.8731

For each quantile, the custom loss value is close to 1, which suggests that the quantile regression model provides a good fit to the data. For more information on this custom loss, see [1].

Input Arguments

`Mdl` — Trained quantile linear regression model
`RegressionQuantileLinear` model object | `CompactRegressionQuantileLinear` model object

Trained quantile linear regression model, specified as a RegressionQuantileLinear or CompactRegressionQuantileLinear model object.

`Tbl` — Sample data
table

Sample data, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain additional columns for the response variable and the observation weights. Tbl must contain all of the predictors used to train Mdl. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

If Tbl contains the response variable used to train Mdl, then you do not need to specify ResponseVarName or Y.
If you trained Mdl using sample data contained in a table, then the input data for loss must also be in a table.
If you set Standardize to true in fitrqlinear when training Mdl, then the software standardizes the numeric columns of the predictor data using the corresponding means (Mdl.Mu) and standard deviations (Mdl.Sigma).

Data Types: table

`ResponseVarName` — Response variable name
name of variable in `Tbl`

Response variable name, specified as the name of a variable in Tbl. The response variable must be a numeric vector.

You must specify ResponseVarName as a character vector or string scalar. For example, if Tbl stores the response variable as Tbl.Y, then specify ResponseVarName as "Y". Otherwise, the software treats the Y column of Tbl as a predictor.

Data Types: char | string

`Y` — Response data
numeric vector

Response data, specified as a numeric vector. The length of Y must be equal to the number of observations in X or Tbl.

Data Types: single | double
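
The ways of supplying the response for table input can be compared in a short sketch. It assumes the carbig data set is available, and the choice of predictors and the names L1, L2, and L3 are illustrative only.

```matlab
% Build a table of numeric predictors plus the response MPG
load carbig
cars = rmmissing(table(Acceleration,Displacement,Horsepower,Weight,MPG));

% Train on the table, naming MPG as the response
Mdl = fitrqlinear(cars,"MPG");

% Three ways to supply the response to loss
L1 = loss(Mdl,cars,"MPG");     % response given by variable name
L2 = loss(Mdl,cars,cars.MPG);  % response given as a numeric vector
L3 = loss(Mdl,cars);           % response variable located in the table
```

Because all three calls use the same predictors and the same response values, they should return the same loss.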

`X` — Predictor data
numeric matrix

Predictor data, specified as a numeric matrix. By default, loss assumes that each row of X corresponds to one observation, and each column corresponds to one predictor variable.

X and Y must have the same number of observations.
If you set Standardize to true in fitrqlinear when training Mdl, then the software standardizes the numeric columns of the predictor data using the corresponding means (Mdl.Mu) and standard deviations (Mdl.Sigma).

Note

If you orient your predictor matrix so that observations correspond to columns and specify ObservationsIn="columns", then you might experience a significant reduction in computation time.

Data Types: single | double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: loss(Mdl,Tbl,"Response",Quantiles=[0.25 0.5 0.75]) specifies to compute the quantile loss for the 0.25, 0.5, and 0.75 quantiles.

`Quantiles` — Quantiles for which to compute loss
`"all"` (default) | vector of values in `Mdl.Quantiles`

Quantiles for which to compute the loss, specified as "all" or a vector of values in Mdl.Quantiles. If you specify "all", the function computes the loss for every quantile in Mdl.Quantiles. The function returns the loss for each quantile separately.

Example: Quantiles=[0.4 0.6]

Data Types: single | double | char | string
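
As a minimal sketch of restricting the loss to a subset of quantiles, the following code trains a model on synthetic data (an assumption made for illustration) and compares the default output with the output for two selected quantiles. The names LAll and LSub are hypothetical.

```matlab
rng("default") % For reproducibility of the synthetic data
n = 500;
X = randn(n,2);
y = X(:,1) + 2*X(:,2) + 0.3*randn(n,1);

% Train for three quantiles
Mdl = fitrqlinear(X,y,Quantiles=[0.25 0.50 0.75]);

LAll = loss(Mdl,X,y);                        % one loss per model quantile
LSub = loss(Mdl,X,y,Quantiles=[0.25 0.75]);  % losses for two quantiles only
```

LSub should match the first and third elements of LAll, because each quantile's loss is computed separately.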

`LossFun` — Loss function
`"quantile"` (default) | function handle

Loss function, specified as "quantile" or a function handle.

"quantile" — Quantile loss. For more information, see Quantile Loss.
Function handle — To specify a custom loss function, use a function handle. The function must have this form:
```
lossval = lossfun(Y,YFit,W,q)
```
- The output argument lossval is a numeric scalar.
- You specify the function name (lossfun).
- Y is a length-n numeric vector of observed responses, where n is the number of observations in Tbl or X.
- YFit is a length-n numeric vector of corresponding predicted responses.
- W is an n-by-1 numeric vector of observation weights.
- q is a numeric scalar in the range [0,1] corresponding to a quantile.

Example: LossFun=@lossfun

Data Types: char | string | function_handle
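
For illustration, a custom loss with the required signature might be a weighted mean absolute error. This is only a sketch: the handle name maeLoss and the sample values are hypothetical, and the handle ignores the quantile input q.

```matlab
% Weighted mean absolute error; q is accepted but not used
maeLoss = @(Y,YFit,W,q) sum(W.*abs(Y - YFit))/sum(W);

% Check the handle on a few hypothetical values before passing it to loss
val = maeLoss([1; 2; 4],[1; 3; 2],[1; 1; 2],0.5);

% With a trained model Mdl and data X and Y (assumed to exist), call:
% L = loss(Mdl,X,Y,LossFun=maeLoss);
```

Calling the handle directly on small inputs is a quick way to confirm that it returns a numeric scalar, as loss requires.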

`ObservationsIn` — Predictor data observation dimension
`"rows"` (default) | `"columns"`

Predictor data observation dimension, specified as "rows" or "columns".

Note

If you orient your predictor matrix so that observations correspond to columns and specify ObservationsIn="columns", then you might experience a significant reduction in computation time. You cannot specify ObservationsIn="columns" for predictor data in a table.

Example: ObservationsIn="columns"

Data Types: char | string
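
The following sketch, which assumes synthetic data and the hypothetical names LRows and LCols, passes the same predictor matrix in both orientations.

```matlab
rng("default") % For reproducibility of the synthetic data
n = 500;
X = randn(n,2);
y = X(:,1) + 2*X(:,2) + 0.3*randn(n,1);
Mdl = fitrqlinear(X,y);

LRows = loss(Mdl,X,y);                            % observations in rows
LCols = loss(Mdl,X',y,ObservationsIn="columns");  % observations in columns
```

Both calls describe the same data, so the two loss values should agree. Only the orientation of the predictor matrix changes.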

`PredictionForMissingValue` — Predicted response value to use for observations with missing predictor values
`"quantile"` (default) | `"omitted"` | numeric scalar | numeric vector

Since R2025a

Predicted response value to use for observations with missing predictor values, specified as "quantile", "omitted", a numeric scalar, or a numeric vector.

| Value | Description |
| --- | --- |
| `"quantile"` | `loss` uses the specified quantile of the observed response values in the training data as the predicted response value for observations with missing predictor values. |
| `"omitted"` | `loss` excludes observations with missing predictor values from the loss computation. |
| Numeric scalar or vector | If `PredictionForMissingValue` is a scalar, then `loss` uses this value as the predicted response value for observations with missing predictor values. The function uses the same value for all quantiles. If `PredictionForMissingValue` is a vector, its length must be equal to the number of quantiles specified by the `Quantiles` name-value argument. `loss` uses element i in the vector as the quantile i predicted response value for observations with missing predictor values. |

If an observation is missing an observed response value or an observation weight, then loss does not use the observation in the loss computation.

Example: PredictionForMissingValue="omitted"

Data Types: single | double | char | string
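
A minimal sketch of the "omitted" option follows. It assumes R2025a or later and synthetic data with a few missing predictor values introduced by hand. The variable names are hypothetical.

```matlab
rng("default") % For reproducibility of the synthetic data
n = 500;
X = randn(n,2);
y = X(:,1) + 2*X(:,2) + 0.3*randn(n,1);
Mdl = fitrqlinear(X,y);

% Test data in which the first five observations have a missing predictor
XTest = randn(100,2);
yTest = XTest(:,1) + 2*XTest(:,2) + 0.3*randn(100,1);
XTest(1:5,1) = NaN;

% Exclude observations with missing predictor values
LOmit = loss(Mdl,XTest,yTest,PredictionForMissingValue="omitted");

% For comparison, drop the incomplete rows manually
LComplete = loss(Mdl,XTest(6:end,:),yTest(6:end));
```

Under the description of "omitted" above, LOmit and LComplete should agree, because both use only the observations with complete predictor data.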

`Weights` — Observation weights
nonnegative numeric vector | name of variable in `Tbl`

Observation weights, specified as a nonnegative numeric vector or the name of a variable in Tbl. The software weights each observation in X or Tbl with the corresponding value in Weights. The length of Weights must equal the number of observations in X or Tbl.

If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weights vector W is stored as Tbl.W, then specify it as "W".

By default, Weights is ones(n,1), where n is the number of observations in X or Tbl. If you supply weights, then loss computes the weighted loss and normalizes the weights to sum to 1.

Data Types: single | double | char | string
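
As a sketch of the weight normalization described above, the following code computes a weighted loss twice, once with a hypothetical weight vector w and once with the same weights multiplied by 10. The data is synthetic and serves only as an illustration.

```matlab
rng("default") % For reproducibility of the synthetic data
n = 500;
X = randn(n,2);
y = X(:,1) + 2*X(:,2) + 0.3*randn(n,1);
Mdl = fitrqlinear(X,y);

w = rand(n,1);                           % nonnegative observation weights
LW = loss(Mdl,X,y,Weights=w);            % weighted quantile loss
LScaled = loss(Mdl,X,y,Weights=10*w);    % same weights on a different scale
```

Because loss normalizes the weights to sum to 1, only the relative sizes of the weights matter, and the two values should agree.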

Output Arguments

`L` — Loss
numeric vector

Loss, returned as a numeric vector. The type of loss depends on LossFun. Each element in L corresponds to a quantile in Quantiles.

Algorithms

Quantile Loss

The quantile loss $L$ for a specified quantile $q$ (Quantiles) is $L = \frac{\sum_{i = 1}^{n} w_{i} \cdot r_{i} \cdot (q - I\{r_{i} < 0\})}{\sum_{i = 1}^{n} w_{i}}$.

- $n$ is the number of observations in Tbl or X.
- $w_{i}$ is the observation weight for observation $i$ (Weights).
- $r_{i}$ is the residual for observation $i$ (that is, the difference between the true response value and the predicted response value).
- $I\{\cdot\}$ is the indicator function.
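
The quantile loss formula can be evaluated by hand for a small case. The following sketch uses hypothetical responses, predictions, and weights, and does not call loss itself.

```matlab
% Hypothetical observed responses, predicted responses, and weights
y    = [3; 5; 2; 8];
yfit = [2; 6; 2; 5];
w    = [1; 1; 2; 1];
q    = 0.25;

r = y - yfit;                 % residuals: [1; -1; 0; 3]
perObs = r.*(q - (r < 0));    % per-observation loss: [0.25; 0.75; 0; 0.75]
L = sum(w.*perObs)/sum(w);    % weighted average: 1.75/5 = 0.35
```

For q = 0.25, a negative residual (overprediction) is multiplied by q - 1 = -0.75, and a positive residual (underprediction) is multiplied by 0.25. Overpredictions are therefore penalized three times as heavily as underpredictions of the same size.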

References

[1] Koenker, Roger, and José A. F. Machado. “Goodness of Fit and Related Inference Processes for Quantile Regression.” Journal of the American Statistical Association 94, no. 448 (December 1999): 1296–1310. https://doi.org/10.1080/01621459.1999.10473882.

Version History

Introduced in R2024b

R2025a: Use a compact model and specify predictions for observations with missing predictor values

You can compute the loss for a compact quantile regression model (CompactRegressionQuantileLinear).

You can also specify the prediction values for observations with missing predictor values by using the PredictionForMissingValue name-value argument. In previous releases, observations with missing predictor values had NaN predictions. That is, the behavior was equivalent to PredictionForMissingValue=NaN.
