Compare Results for Regression and Tobit EAD Models

This example shows how to use fitEADModel to create a Regression model and a Tobit model for exposure at default (EAD) and then compare the results.

Load EAD Data

Load the EAD data, then partition it into training and test sets, holding out 40% of the observations for testing.

load EADData.mat
head(EADData)
    UtilizationRate    Age     Marriage        Limit         Drawn          EAD    
    _______________    ___    ___________    __________    __________    __________

        0.24359        25     not married         44776         10907         44740
        0.96946        44     not married    2.1405e+05    2.0751e+05         40678
              0        40     married        1.6581e+05             0    1.6567e+05
        0.53242        38     not married    1.7375e+05         92506        1593.5
         0.2583        30     not married         26258        6782.5        54.175
        0.17039        54     married        1.7357e+05         29575        576.69
        0.18586        27     not married         19590          3641        998.49
        0.85372        42     not married    2.0712e+05    1.7682e+05    1.6454e+05
rng('default');
NumObs = height(EADData);
c = cvpartition(NumObs,'HoldOut',0.4);
TrainingInd = training(c);
TestInd = test(c);

Select Model Type

Select a Regression and a Tobit model type.

ModelTypeR = "Regression";
ModelTypeT = "Tobit";

Select Conversion Measure

Select the conversion measure for the EAD response values.

ConversionMeasure = "LCF";
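As a rough illustration of what the conversion measure does, the sketch below computes the limit conversion factor (LCF) by hand for the first three rows of the EADData preview shown earlier. It assumes that LCF is the ratio of EAD to the credit limit, and it shows the CCF and UF measures as they are commonly defined, for comparison only. The variable names are hypothetical and the code does not call fitEADModel.

```matlab
% Values copied from the first three rows of the EADData preview.
Limit = [44776; 2.1405e5; 1.6581e5];
Drawn = [10907; 2.0751e5; 0];
EAD   = [44740; 40678; 1.6567e5];

% Limit conversion factor (assumed definition): share of the limit exposed at default.
LCF = EAD ./ Limit;

% Other conversion measures as commonly defined (assumption, shown for comparison).
CCF = (EAD - Drawn) ./ (Limit - Drawn);   % credit conversion factor
UF  = (EAD - Drawn) ./ Limit;             % utilization factor

% Converting a conversion-measure value back to the EAD scale.
EADfromLCF = LCF .* Limit;

disp(table(LCF, CCF, UF, EADfromLCF))
```

In this sketch, CCF becomes large in magnitude when Drawn is close to Limit (second row), which is one reason a bounded measure such as LCF can be easier to model.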

Create Regression EAD Model

Use fitEADModel to create a Regression model using the EADData.

eadModelRegression = fitEADModel(EADData,ModelTypeR,'PredictorVars',{'UtilizationRate','Age','Marriage'}, ...
    'ConversionMeasure',ConversionMeasure,'DrawnVar','Drawn','LimitVar','Limit','ResponseVar','EAD');
disp(eadModelRegression);
  Regression with properties:

    ConversionTransform: "logit"
      BoundaryTolerance: 1.0000e-07
                ModelID: "Regression"
            Description: ""
        UnderlyingModel: [1x1 classreg.regr.CompactLinearModel]
          PredictorVars: ["UtilizationRate"    "Age"    "Marriage"]
            ResponseVar: "EAD"
               LimitVar: "Limit"
               DrawnVar: "Drawn"
      ConversionMeasure: "lcf"

Display the underlying model. The response variable of the underlying Regression model, EAD_lcf_logit, is the logit transformation of the LCF conversion of the EAD response data. Use the 'BoundaryTolerance', 'LimitVar', and 'DrawnVar' name-value arguments to modify the transformation.

disp(eadModelRegression.UnderlyingModel);
Compact linear regression model:
    EAD_lcf_logit ~ 1 + UtilizationRate + Age + Marriage

Estimated Coefficients:
                            Estimate        SE         tStat       pValue  
                            _________    _________    _______    __________

    (Intercept)               -2.4745      0.29892    -8.2781    1.6448e-16
    UtilizationRate            6.0045      0.19901     30.172    7.703e-182
    Age                     -0.020095    0.0073019     -2.752     0.0059471
    Marriage_not married     -0.03509      0.13935    -0.2518        0.8012


Number of observations: 4378, Error degrees of freedom: 4374
Root Mean Squared Error: 4.48
R-squared: 0.173,  Adjusted R-Squared: 0.173
F-statistic vs. constant model: 305, p-value = 5.7e-180
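The response name EAD_lcf_logit suggests that the LCF values are passed through a logit transformation before fitting. The following is a minimal sketch of such a transformation, assuming that values at or beyond the boundaries are clipped to [tol, 1-tol] using the BoundaryTolerance value shown in the model display. The clipping rule and the sample values are assumptions made for illustration, not the toolbox's internal code.

```matlab
tol = 1e-7;                        % BoundaryTolerance from the model display
lcf = [0; 0.25; 0.5; 0.999; 1];    % hypothetical LCF values, including both boundaries

% Clip to the open interval so that the logit is finite (assumed behavior).
lcfClipped = min(max(lcf, tol), 1 - tol);

% Logit transformation: maps (0,1) to the real line.
logitLCF = log(lcfClipped ./ (1 - lcfClipped));

% Inverse transformation: maps fitted values back to the LCF scale.
lcfBack = 1 ./ (1 + exp(-logitLCF));

disp([lcf, lcfClipped, logitLCF, lcfBack])
```

With a tolerance of 1e-7, boundary values map to roughly plus or minus 16 on the logit scale, so a data set with many observations near 0 or 1 can produce a large root mean squared error on the transformed scale.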

Create Tobit EAD Model

Use fitEADModel to create a Tobit model using the EADData.

eadModelTobit = fitEADModel(EADData,ModelTypeT,'PredictorVars',{'UtilizationRate','Age','Marriage'}, ...
    'ConversionMeasure',ConversionMeasure,'DrawnVar','Drawn','LimitVar','Limit','ResponseVar','EAD', ...
    'CensoringSide',"right",'LeftLimit',0.4,'RightLimit',0.5);
disp(eadModelTobit);
  Tobit with properties:

        CensoringSide: "right"
            LeftLimit: 0.4000
           RightLimit: 0.5000
              ModelID: "Tobit"
          Description: ""
      UnderlyingModel: [1x1 risk.internal.credit.TobitModel]
        PredictorVars: ["UtilizationRate"    "Age"    "Marriage"]
          ResponseVar: "EAD"
             LimitVar: "Limit"
             DrawnVar: "Drawn"
    ConversionMeasure: "lcf"

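Display the underlying model. The response variable of the underlying Tobit model, EAD_lcf, is the LCF conversion of the EAD response data, right-censored here at the RightLimit value of 0.5. Use the 'LimitVar' and 'DrawnVar' name-value arguments to modify the conversion, and the 'CensoringSide', 'LeftLimit', and 'RightLimit' name-value arguments to modify the censoring. Use 'SolverOptions' to adjust the optimization used to fit the model. Because CensoringSide is "right", the LeftLimit value does not affect this model.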

disp(eadModelTobit.UnderlyingModel);
Tobit regression model, right-censored:
     EAD_lcf = min(Y*,0.5)
     Y* ~ 1 + UtilizationRate + Age + Marriage

Estimated coefficients:
                             Estimate        SE         tStat       pValue  
                            __________    _________    ________    _________

    (Intercept)                0.18088     0.021541      8.3972            0
    UtilizationRate            0.42381     0.014164      29.921            0
    Age                     -0.0014564    0.0005244     -2.7772    0.0055057
    Marriage_not married    -0.0040192     0.012014    -0.33454      0.73799
    (Sigma)                    0.27917    0.0043096      64.779            0

Number of observations: 4378
Number of left-censored observations: 0
Number of uncensored observations: 2802
Number of right-censored observations: 1576
Log-likelihood: -1756.98
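To make the right-censored formulation EAD_lcf = min(Y*,0.5) more concrete, the sketch below evaluates the expected value of the censored response for one hypothetical borrower, using the coefficients reported above. It assumes that Y* is normally distributed around the linear predictor with standard deviation Sigma, and that the quantity of interest is the unconditional expectation of min(Y*,R). The borrower values are invented for illustration.

```matlab
% Coefficients copied from the Tobit model display.
b0 = 0.18088; bU = 0.42381; bAge = -0.0014564; bNotMarried = -0.0040192;
sigma = 0.27917;
R = 0.5;                                  % RightLimit

% Hypothetical borrower (not taken from the data set).
U = 0.5; Age = 40; notMarried = 1;

% Linear predictor: mean of the latent variable Y*.
mu = b0 + bU*U + bAge*Age + bNotMarried*notMarried;

% Standard normal cdf and pdf using base MATLAB functions only.
Phi = @(z) 0.5*erfc(-z/sqrt(2));
phi = @(z) exp(-z.^2/2)/sqrt(2*pi);

% E[min(Y*,R)] = mu*Phi(a) - sigma*phi(a) + R*(1 - Phi(a)), with a = (R - mu)/sigma.
a = (R - mu)/sigma;
expectedLCF = mu*Phi(a) - sigma*phi(a) + R*(1 - Phi(a));

fprintf('Latent mean: %.4f, expected censored LCF: %.4f\n', mu, expectedLCF);
```

Under these assumptions the expected value is always below both the latent mean and the right limit, which is consistent with the Tobit predictions shown later in this example staying below 0.5.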

Predict EAD for Regression Model

EAD prediction operates on the underlying compact statistical model and then transforms the predicted values back to the EAD scale. You can call the predict function with different options for the 'ModelLevel' name-value argument.

predictedEADRegression = predict(eadModelRegression,EADData(TestInd,:),'ModelLevel','ead');
predictedConversionRegression = predict(eadModelRegression,EADData(TestInd,:),'ModelLevel','ConversionMeasure');

Predict EAD for Tobit Model

EAD prediction operates on the underlying compact statistical model and then transforms the predicted values back to the EAD scale. You can call the predict function with different options for the 'ModelLevel' name-value argument.

predictedEADTobit = predict(eadModelTobit,EADData(TestInd,:),'ModelLevel','ead');
predictedConversionTobit = predict(eadModelTobit,EADData(TestInd,:),'ModelLevel','ConversionMeasure');

Validate EAD Regression Model

For model validation of the Regression model, use modelDiscrimination, modelDiscriminationPlot, modelCalibration, and modelCalibrationPlot.

Use modelDiscrimination and then modelDiscriminationPlot to plot the ROC curve.

ModelLevel = "ConversionMeasure";
[DiscMeasureRegression, DiscDataRegression] = modelDiscrimination(eadModelRegression,EADData(TestInd,:),'ShowDetails',true,'ModelLevel',ModelLevel)
DiscMeasureRegression=1×3 table
                   AUROC      Segment      SegmentCount
                  _______    __________    ____________

    Regression    0.70898    "all_data"        1751    

DiscDataRegression=1534×3 table
        X             Y           T   
    __________    _________    _______

             0            0    0.95722
             0    0.0027778    0.95722
             0    0.0041667     0.9566
             0    0.0055556    0.95639
             0    0.0083333    0.95576
    0.00096993    0.0097222    0.95555
    0.00096993     0.016667     0.9549
     0.0019399     0.016667    0.95474
     0.0019399     0.018056    0.95468
     0.0038797     0.018056    0.95403
     0.0048497     0.019444    0.95381
     0.0058196     0.019444    0.95314
     0.0067895     0.020833    0.95291
     0.0067895     0.022222    0.95233
     0.0087294     0.026389    0.95224
     0.0087294     0.031944      0.952
      ⋮

modelDiscriminationPlot(eadModelRegression,EADData(TestInd, :),'ModelLevel',ModelLevel,'SegmentBy','Marriage');

Figure: EAD_lcf ROC Segmented by Marriage (x-axis: False Positive Rate, y-axis: True Positive Rate). Regression, married: AUROC = 0.70813. Regression, not married: AUROC = 0.70921.
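Because the LCF response is continuous, an ROC analysis needs a binary target. The sketch below shows one way the AUROC could be computed by hand, assuming the observed values are discretized at their mean (an assumption about the default behavior) and that the AUROC is the fraction of (high, low) pairs that the predictions rank correctly. The data values are hypothetical.

```matlab
% Hypothetical observed and predicted LCF values.
observed  = [0.99; 0.01; 0.04; 0.76; 0.95; 0.05];
predicted = [0.60; 0.17; 0.75; 0.90; 0.67; 0.04];

% Discretize the continuous response: "high" if at or above the mean (assumed rule).
isHigh = observed >= mean(observed);

% Predictions for the two groups.
pos = predicted(isHigh);
neg = predicted(~isHigh);

% AUROC as the proportion of (pos,neg) pairs ranked correctly, ties counted as 0.5.
[P, N] = ndgrid(pos, neg);
auroc = mean((P(:) > N(:)) + 0.5*(P(:) == N(:)));

fprintf('AUROC = %.4f\n', auroc);
```

For these six observations, seven of the nine (high, low) pairs are ranked correctly, so the sketch gives an AUROC of 7/9.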

Use modelCalibration and then modelCalibrationPlot to show a scatter plot of the predictions.

YData = "Observed";

[CalMeasureRegression,CalDataRegression] = modelCalibration(eadModelRegression,EADData(TestInd,:),'ModelLevel',ModelLevel)
CalMeasureRegression=1×4 table
                  RSquared     RMSE      Correlation    SampleMeanError
                  ________    _______    ___________    _______________

    Regression    0.16148     0.41023      0.40184         -0.025994   

CalDataRegression=1751×3 table
     Observed     Predicted_Regression    Residuals_Regression
    __________    ____________________    ____________________

       0.99919           0.17519                   0.824      
     0.0020632           0.17343                -0.17137      
       0.03741            0.7527                -0.71529      
       0.75518           0.89867                -0.14349      
    0.00076139          0.042389               -0.041628      
        0.9998           0.95153                0.048274      
     0.0056134            0.1338                -0.12819      
      0.048451          0.043424               0.0050276      
       0.01448          0.059339               -0.044858      
       0.95329           0.67009                  0.2832      
       0.97847             0.939                 0.03947      
       0.71895           0.80122               -0.082271      
       0.79096            0.3791                 0.41186      
      0.042816           0.52542                 -0.4826      
       0.97169            0.2119                 0.75979      
       0.99182           0.62543                 0.36639      
      ⋮

modelCalibrationPlot(eadModelRegression, EADData(TestInd,:), 'ModelLevel', ModelLevel, 'YData', YData);

Figure: Scatter Regression, R-Squared: 0.16148 (x-axis: EAD_lcf Predicted, y-axis: EAD_lcf Observed). The plot shows the data and a fitted line.
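The calibration metrics can be approximated by hand from the observed and predicted columns. The sketch below uses the first six rows of CalDataRegression shown above and assumes that residuals are observed minus predicted, that RSquared is the squared correlation, and that SampleMeanError is the mean residual. The reported values are consistent with these assumptions (for example, 0.40184 squared is approximately 0.1615), but the code is an illustration, not the toolbox implementation.

```matlab
% First six rows of CalDataRegression, copied from the output above.
observed  = [0.99919; 0.0020632; 0.03741; 0.75518; 0.00076139; 0.9998];
predicted = [0.17519; 0.17343;   0.7527;  0.89867; 0.042389;   0.95153];

% Residuals: observed minus predicted (matches the Residuals_Regression column).
residuals = observed - predicted;

% Root mean squared error.
rmse = sqrt(mean(residuals.^2));

% Correlation between observed and predicted values, and its square.
C = corrcoef(observed, predicted);
rho = C(1,2);
rsq = rho^2;

% Sample mean error: positive values indicate underprediction on average.
sme = mean(residuals);

fprintf('RMSE = %.4f, Correlation = %.4f, RSquared = %.4f, SampleMeanError = %.4f\n', ...
    rmse, rho, rsq, sme);
```

These six rows are only a small subset, so the numbers printed by the sketch are not expected to match the full-sample metrics in CalMeasureRegression.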

Validate EAD Tobit Model

For model validation of the Tobit model, use modelDiscrimination, modelDiscriminationPlot, modelCalibration, and modelCalibrationPlot.

Use modelDiscrimination and then modelDiscriminationPlot to plot the ROC curve.

ModelLevel = "ConversionMeasure";
[DiscMeasureTobit,DiscDataTobit] = modelDiscrimination(eadModelTobit,EADData(TestInd,:),'ShowDetails',true,'ModelLevel',ModelLevel)
DiscMeasureTobit=1×3 table
              AUROC      Segment      SegmentCount
             _______    __________    ____________

    Tobit    0.70909    "all_data"        1751    

DiscDataTobit=1534×3 table
        X             Y           T   
    __________    _________    _______

             0            0    0.42178
             0    0.0027778    0.42178
             0    0.0041667     0.4212
             0    0.0055556    0.42076
    0.00096993    0.0069444    0.42062
    0.00096993    0.0097222    0.42018
    0.00096993     0.011111    0.42004
    0.00096993     0.018056     0.4196
     0.0019399     0.018056     0.4195
     0.0029098     0.019444    0.41945
     0.0048497     0.019444    0.41901
     0.0058196     0.020833    0.41887
     0.0058196     0.022222    0.41854
     0.0067895     0.022222    0.41842
     0.0067895     0.023611    0.41827
     0.0067895     0.029167    0.41827
      ⋮

modelDiscriminationPlot(eadModelTobit,EADData(TestInd, :),'ModelLevel',ModelLevel,'SegmentBy','Marriage');

Figure: EAD_lcf ROC Segmented by Marriage (x-axis: False Positive Rate, y-axis: True Positive Rate). Tobit, married: AUROC = 0.70814. Tobit, not married: AUROC = 0.70928.

Use modelCalibration and then modelCalibrationPlot to show a scatter plot of the predictions.

YData = "Observed";

[CalMeasureTobit,CalDataTobit] = modelCalibration(eadModelTobit,EADData(TestInd,:),'ModelLevel',ModelLevel)
CalMeasureTobit=1×4 table
             RSquared     RMSE      Correlation    SampleMeanError
             ________    _______    ___________    _______________

    Tobit    0.15929     0.39572      0.39911          0.13366    

CalDataTobit=1751×3 table
     Observed     Predicted_Tobit    Residuals_Tobit
    __________    _______________    _______________

       0.99919        0.21657             0.78261   
     0.0020632        0.21571            -0.21365   
       0.03741        0.35115            -0.31374   
       0.75518        0.39272             0.36245   
    0.00076139        0.12184            -0.12107   
        0.9998        0.41744             0.58237   
     0.0056134        0.19913            -0.19351   
      0.048451        0.12215           -0.073701   
       0.01448        0.14323            -0.12875   
       0.95329        0.33415             0.61914   
       0.97847        0.41069             0.56778   
       0.71895         0.3627             0.35624   
       0.79096        0.27467             0.51629   
      0.042816        0.30579            -0.26297   
       0.97169        0.23025             0.74144   
       0.99182        0.32461             0.66721   
      ⋮

modelCalibrationPlot(eadModelTobit,EADData(TestInd,:),'ModelLevel',ModelLevel,'YData',YData);

Figure: Scatter Tobit, R-Squared: 0.15929 (x-axis: EAD_lcf Predicted, y-axis: EAD_lcf Observed). The plot shows the data and a fitted line.

Plot Histograms of Observed with Respect to Predicted EAD

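Plot overlaid histograms of the observed and predicted values for the Regression model. Because CalDataRegression was computed with ModelLevel set to "ConversionMeasure", the values are on the LCF scale.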

figure;
histogram(CalDataRegression.Observed);
hold on;
histogram(CalDataRegression.(('Predicted_' + ModelTypeR)));
legend('Observed','Predicted');

Figure: Overlaid histograms of the Observed and Predicted values for the Regression model.

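Plot overlaid histograms of the observed and predicted values for the Tobit model. Because CalDataTobit was computed with ModelLevel set to "ConversionMeasure", the values are on the LCF scale.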

figure;
histogram(CalDataTobit.Observed);
hold on;
histogram(CalDataTobit.(('Predicted_' + ModelTypeT)));
legend('Observed','Predicted');

Figure: Overlaid histograms of the Observed and Predicted values for the Tobit model.

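For both the Tobit and Regression models, the Age and UtilizationRate predictors are statistically significant (p-values below 0.01), while the Marriage predictor is not (p-values of 0.80 for Regression and 0.74 for Tobit). On the test data, the two models discriminate almost identically (AUROC of 0.70898 for Regression and 0.70909 for Tobit) and have similar R-squared values (0.16148 and 0.15929). They differ more in calibration: the Tobit model has a lower RMSE (0.39572 versus 0.41023) but a larger sample mean error (0.13366 versus -0.025994).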
