loss

Loss for quantile linear regression model

Since R2024b

    Description

    L = loss(Mdl,Tbl,ResponseVarName) returns the quantile loss for the trained quantile linear regression model Mdl. The function uses the predictor data in the table Tbl and the response values in the ResponseVarName table variable. For more information, see Quantile Loss.

    L = loss(Mdl,Tbl,Y) returns the quantile loss for the model Mdl using the predictor data in the table Tbl and the response values in the vector Y.

    L = loss(Mdl,X,Y) returns the quantile loss for the model Mdl using the predictor data X and the corresponding response values in Y.

    L = loss(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. For example, you can specify the quantiles for which to return loss values.
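
    The table-based syntaxes above can be compared side by side. The following is a minimal sketch on synthetic data, not an official example; the variable names x1, x2, and y and the data-generating model are assumptions made for illustration.

```matlab
% Synthetic data; x1, x2, and y are made-up names for this sketch.
rng(0,"twister")                 % for reproducibility
n = 200;
x1 = randn(n,1);
x2 = randn(n,1);
y = x1 + 2*x2 + 0.3*randn(n,1);
Tbl = table(x1,x2,y);

% Train on the table, using y as the response variable.
Mdl = fitrqlinear(Tbl,"y");

% Response given as the name of a table variable.
L1 = loss(Mdl,Tbl,"y");

% Response given as a vector.
L2 = loss(Mdl,Tbl,Tbl.y);

% Same call with a name-value argument appended.
L3 = loss(Mdl,Tbl,"y",Quantiles=Mdl.Quantiles);
```

    All three calls use the same observations and quantiles, so they are expected to return the same loss.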

    Examples

    Compute the quantile loss for a quantile linear regression model.

    Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Create a table containing the predictor variables Acceleration, Cylinders, Displacement, and so on, as well as the response variable MPG. View the first eight observations.

    load carbig
    cars = table(Acceleration,Cylinders,Displacement, ...
        Horsepower,Model_Year,Origin,Weight,MPG);
    head(cars)
        Acceleration    Cylinders    Displacement    Horsepower    Model_Year    Origin     Weight    MPG
        ____________    _________    ____________    __________    __________    _______    ______    ___
    
              12            8            307            130            70        USA         3504     18 
            11.5            8            350            165            70        USA         3693     15 
              11            8            318            150            70        USA         3436     18 
              12            8            304            150            70        USA         3433     16 
            10.5            8            302            140            70        USA         3449     17 
              10            8            429            198            70        USA         4341     15 
               9            8            454            220            70        USA         4354     14 
             8.5            8            440            215            70        USA         4312     14 
    

    Remove rows of cars where the table has missing values.

    cars = rmmissing(cars);

    Categorize the cars based on whether they were made in the USA.

    cars.Origin = categorical(cellstr(cars.Origin));
    cars.Origin = mergecats(cars.Origin,["France","Japan",...
        "Germany","Sweden","Italy","England"],"NotUSA");

    Partition the data into training and test sets using cvpartition. Use approximately 80% of the observations as training data, and 20% of the observations as test data.

    rng(0,"twister") % For reproducibility of the data partition
    c = cvpartition(height(cars),"Holdout",0.20);
    
    trainingIdx = training(c);
    carsTrain = cars(trainingIdx,:);
    
    testIdx = test(c);
    carsTest = cars(testIdx,:);

    Train a quantile linear regression model using the carsTrain training data. Specify MPG as the response variable. Then, compute the quantile loss using the carsTest test data.

    Mdl = fitrqlinear(carsTrain,"MPG");
    L = loss(Mdl,carsTest)
    L = 
    2.9448
    

    Retrain the model with a beta tolerance of 1e-6 instead of the default value of 1e-4, and then compute the test set quantile loss.

    newMdl = fitrqlinear(carsTrain,"MPG",BetaTolerance=1e-6);
    newL = loss(newMdl,carsTest)
    newL = 
    1.4050
    

    The retrained model has a lower quantile loss.

    Determine how well a linear quantile regression model fits the data for each quantile by using a quantile regression analog to the R-squared value.

    Generate 500 observations from the model y = x1 + 2*x2 + ϵ, where:

    • X = [x1,x2] is a predictor matrix of standard normal elements.

    • ϵ is an error vector of normal elements with mean 0 and standard deviation 0.3.

    • y is the response.

    rng("default") % For reproducibility
    n = 500;
    X = randn(n,2);
    y = X(:,1) + 2*X(:,2) + 0.3*randn(n,1);

    Train a linear quantile regression model using the data in X and y. Fit the model for the 0.25, 0.50, and 0.75 quantiles.

    Mdl = fitrqlinear(X,y,Quantiles=[0.25 0.50 0.75]);

    Generate 100 test set observations from the same model used to generate the training data.

    newN = 100;
    XTest = randn(newN,2);
    yTest = XTest(:,1) + 2*XTest(:,2) + 0.3*randn(newN,1);

    Compute the R-squared analog for the quantile regression model by using the test set.

    First, create the custom gof function. The function accepts test set responses (ytest), test set predictions for a particular quantile (yfit), test observation weights (weights), the specified quantile (quantile), and the linear quantile regression model used to generate the test set predictions (model). The function uses these values to compute the quantile loss for the linear model (unrestrictedLoss) and the quantile loss for the linear model restricted to the intercept term (restrictedLoss). The function returns the value 1-(unrestrictedLoss/restrictedLoss), which is between 0 and 1. A value closer to 1 suggests a better model fit for the specified quantile.

    function L = gof(ytest,yfit,weights,quantile,model)
    
    % Compute quantile loss for unrestricted model
    unrestrictedResiduals = ytest - yfit;
    unrestrictedLoss = unrestrictedResiduals.* ...
        (quantile-(unrestrictedResiduals<0));
    unrestrictedLoss = sum(weights.*unrestrictedLoss)/sum(weights);
    
    % Compute quantile loss for restricted model
    qIndex = model.Quantiles==quantile;
    restrictedYFit = model.ModelParameters.InitialBias(qIndex);
    restrictedResiduals = ytest - restrictedYFit;
    restrictedLoss = restrictedResiduals.* ...
        (quantile-(restrictedResiduals<0));
    restrictedLoss = sum(weights.*restrictedLoss)/sum(weights);
    
    % Compute R^2 analog
    L = 1 - (unrestrictedLoss/restrictedLoss);
    
    end

    Create a function handle for the gof function that includes the required model input argument. Then, use the LossFun name-value argument to pass the function handle to loss, along with the linear quantile regression model, the test predictor data, and the test response data.

    customLoss = @(ytest,yfit,weights,quantile) ...
        gof(ytest,yfit,weights,quantile,Mdl);
    L = loss(Mdl,XTest,yTest,LossFun=customLoss)
    L = 1×3
    
        0.8724    0.8724    0.8731
    
    

    For each quantile, the custom loss value is close to 1, which suggests that the quantile regression model provides a good fit to the data. For more information on this custom loss, see [1].

    Input Arguments

    Mdl: Trained quantile linear regression model, specified as a RegressionQuantileLinear or CompactRegressionQuantileLinear model object.

    Tbl: Sample data, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain additional columns for the response variable and the observation weights. Tbl must contain all of the predictors used to train Mdl. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

    • If Tbl contains the response variable used to train Mdl, then you do not need to specify ResponseVarName or Y.

    • If you trained Mdl using sample data contained in a table, then the input data for loss must also be in a table.

    • If you set Standardize to true in fitrqlinear when training Mdl, then the software standardizes the numeric columns of the predictor data using the corresponding means (Mdl.Mu) and standard deviations (Mdl.Sigma).

    Data Types: table

    ResponseVarName: Response variable name, specified as the name of a variable in Tbl. The response variable must be a numeric vector.

    You must specify ResponseVarName as a character vector or string scalar. For example, if Tbl stores the response variable as Tbl.Y, then specify ResponseVarName as "Y". Otherwise, the software treats the Y column of Tbl as a predictor.

    Data Types: char | string

    Y: Response data, specified as a numeric vector. The length of Y must be equal to the number of observations in X or Tbl.

    Data Types: single | double

    X: Predictor data, specified as a numeric matrix. By default, loss assumes that each row of X corresponds to one observation, and each column corresponds to one predictor variable.

    • X and Y must have the same number of observations.

    • If you set Standardize to true in fitrqlinear when training Mdl, then the software standardizes the numeric columns of the predictor data using the corresponding means (Mdl.Mu) and standard deviations (Mdl.Sigma).

    Note

    If you orient your predictor matrix so that observations correspond to columns and specify ObservationsIn="columns", then you might experience a significant reduction in computation time.

    Data Types: single | double
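
    The standardization step mentioned above can be illustrated in base MATLAB. This is a sketch only, assuming the usual center-and-scale form (subtract the mean, divide by the standard deviation); the vectors mu and sigma are made-up stand-ins for Mdl.Mu and Mdl.Sigma, and the code is not the toolbox implementation.

```matlab
% Stand-ins for Mdl.Mu and Mdl.Sigma (hypothetical values, one per predictor).
mu = [10 200];
sigma = [2 50];

% Predictor matrix with observations in rows.
X = [12 250;
      8 150;
     10 200];

% Center and scale each column; implicit expansion applies mu and sigma
% to every row.
Z = (X - mu)./sigma;
```

    Here each column of Z has the training-set mean removed and is expressed in units of the training-set standard deviation.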

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: loss(Mdl,Tbl,"Response",Quantiles=[0.25 0.5 0.75]) computes the quantile loss for the 0.25, 0.5, and 0.75 quantiles.

    Quantiles: Quantiles for which to compute the loss, specified as a vector of values in Mdl.Quantiles. The function returns the loss for each quantile separately.

    Example: Quantiles=[0.4 0.6]

    Data Types: single | double | char | string
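
    As an illustration of the Quantiles argument, the following sketch trains a model for three quantiles and then requests the loss for a subset of them. The synthetic data mirrors the second example on this page; it is an assumption made for illustration, not part of the argument definition.

```matlab
% Synthetic training data.
rng("default")                   % for reproducibility
X = randn(300,2);
y = X(:,1) + 2*X(:,2) + 0.3*randn(300,1);

% Train for three quantiles.
Mdl = fitrqlinear(X,y,Quantiles=[0.25 0.50 0.75]);

% Without the Quantiles argument, loss returns one value per trained quantile.
LAll = loss(Mdl,X,y);

% Request the loss for two of the trained quantiles only.
LSub = loss(Mdl,X,y,Quantiles=[0.25 0.75]);
```

    The requested values must be quantiles that the model was trained for, so LSub is expected to match the first and third elements of LAll.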

    LossFun: Loss function, specified as "quantile" or a function handle.

    • "quantile" — Quantile loss. For more information, see Quantile Loss.

    • Function handle — To specify a custom loss function, use a function handle. The function must have this form:

      lossval = lossfun(Y,YFit,W,q)

      • The output argument lossval is a numeric scalar.

      • You specify the function name (lossfun).

      • Y is a length-n numeric vector of observed responses, where n is the number of observations in Tbl or X.

      • YFit is a length-n numeric vector of corresponding predicted responses.

      • W is an n-by-1 numeric vector of observation weights.

      • q is a numeric scalar in the range [0,1] corresponding to a quantile.

    Example: LossFun=@lossfun

    Data Types: char | string | function_handle
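
    A custom loss function with the required four-input form can be written as an anonymous function. The sketch below defines a weighted mean absolute error, which is a hypothetical choice made for illustration (it ignores the quantile input q), and checks it on toy inputs in base MATLAB before it is passed to loss.

```matlab
% Hypothetical custom loss: weighted mean absolute error.
% The inputs follow the required form (Y,YFit,W,q); q is unused here.
maeLoss = @(Y,YFit,W,q) sum(W.*abs(Y - YFit))/sum(W);

% Check the handle on toy inputs.
Y = [1; 2; 3];        % observed responses
YFit = [1; 3; 5];     % predicted responses
W = [1; 1; 2];        % observation weights
val = maeLoss(Y,YFit,W,0.5);   % (1*0 + 1*1 + 2*2)/4 = 1.25

% With a trained model Mdl and test data XTest and yTest (assumed to exist):
% L = loss(Mdl,XTest,yTest,LossFun=maeLoss);
```

    Because the function returns a numeric scalar for each quantile, loss can call it once per quantile and collect the results.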

    ObservationsIn: Predictor data observation dimension, specified as "rows" or "columns".

    Note

    If you orient your predictor matrix so that observations correspond to columns and specify ObservationsIn="columns", then you might experience a significant reduction in computation time. You cannot specify ObservationsIn="columns" for predictor data in a table.

    Example: ObservationsIn="columns"

    Data Types: char | string
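
    The two orientations can be compared directly. This is a minimal sketch on synthetic data (an assumption made for illustration); it transposes the predictor matrix and tells loss that observations are in columns.

```matlab
% Synthetic data with observations in rows.
rng("default")                   % for reproducibility
X = randn(300,2);
y = X(:,1) + 2*X(:,2) + 0.3*randn(300,1);
Mdl = fitrqlinear(X,y);

% Default orientation: one observation per row.
LRows = loss(Mdl,X,y);

% Transposed predictors: one observation per column.
XCols = X';
LCols = loss(Mdl,XCols,y,ObservationsIn="columns");
```

    Both calls use the same observations, so the two loss values are expected to agree up to numerical precision.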

    Since R2025a

    PredictionForMissingValue: Predicted response value to use for observations with missing predictor values, specified as "quantile", "omitted", a numeric scalar, or a numeric vector.

    • "quantile": loss uses the specified quantile of the observed response values in the training data as the predicted response value for observations with missing predictor values.

    • "omitted": loss excludes observations with missing predictor values from the loss computation.

    • Numeric scalar or vector:

      • If PredictionForMissingValue is a scalar, then loss uses this value as the predicted response value for observations with missing predictor values. The function uses the same value for all quantiles.

      • If PredictionForMissingValue is a vector, its length must be equal to the number of quantiles specified by the Quantiles name-value argument. loss uses element i in the vector as the quantile i predicted response value for observations with missing predictor values.

    If an observation is missing an observed response value or an observation weight, then loss does not use the observation in the loss computation.

    Example: PredictionForMissingValue="omitted"

    Data Types: single | double | char | string
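
    The following sketch shows two of these options on synthetic data with a few missing predictor values. The data, the choice of rows to set to NaN, and the constant 0 are assumptions made for illustration. The argument requires R2025a or later.

```matlab
% Synthetic training and test data.
rng("default")                   % for reproducibility
X = randn(300,2);
y = X(:,1) + 2*X(:,2) + 0.3*randn(300,1);
Mdl = fitrqlinear(X,y);

XTest = randn(50,2);
yTest = XTest(:,1) + 2*XTest(:,2) + 0.3*randn(50,1);
XTest(1:5,1) = NaN;              % introduce missing predictor values

% Exclude observations with missing predictor values.
LOmit = loss(Mdl,XTest,yTest,PredictionForMissingValue="omitted");

% Reference: drop the same five observations manually.
LRef = loss(Mdl,XTest(6:end,:),yTest(6:end));

% Use a fixed predicted response of 0 for those observations instead.
LConst = loss(Mdl,XTest,yTest,PredictionForMissingValue=0);
```

    With "omitted", the result is expected to match the loss computed after removing the affected rows by hand, whereas a numeric value keeps those rows in the computation.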

    Weights: Observation weights, specified as a nonnegative numeric vector or the name of a variable in Tbl. The software weights each observation in X or Tbl with the corresponding value in Weights. The length of Weights must equal the number of observations in X or Tbl.

    If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weights vector W is stored as Tbl.W, then specify it as "W".

    By default, Weights is ones(n,1), where n is the number of observations in X or Tbl. If you supply weights, then loss computes the weighted loss and normalizes the weights to sum to 1.
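
    The weighted loss can be sketched in base MATLAB. This is a minimal illustration, assuming the same quantile (pinball) loss expression that the gof function in the Examples section uses; the data and variable names are made up, and the code is not the toolbox implementation.

```matlab
% Toy data for one quantile (hypothetical values).
y    = [1; 2; 3; 4];          % observed responses
yfit = [1.5; 1.5; 3.5; 3];    % predicted responses for quantile q
w    = [1; 1; 2; 2];          % unnormalized observation weights
q    = 0.25;                  % quantile

% Normalize the weights so that they sum to 1.
w = w/sum(w);

% Per-observation quantile loss: q*r for r >= 0 and (q-1)*r for r < 0.
r = y - yfit;
pinball = r.*(q - (r < 0));

% Weighted quantile loss.
L = sum(w.*pinball);          % equals 1.75/6 for these inputs
```

    Underpredictions (positive residuals) are scaled by q and overpredictions by 1-q, so for q = 0.25 an overprediction costs three times as much as an underprediction of the same size.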

    Data Types: single | double | char | string

    Output Arguments

    L: Loss, returned as a numeric vector. The type of loss depends on LossFun. Each element in L corresponds to a quantile in Quantiles.

    Algorithms

    References

    [1] Koenker, Roger, and José A. F. Machado. “Goodness of Fit and Related Inference Processes for Quantile Regression.” Journal of the American Statistical Association 94, no. 448 (December 1999): 1296–1310. https://doi.org/10.1080/01621459.1999.10473882.

    Version History

    Introduced in R2024b