modelAccuracy

Compute RMSE of predicted and observed PDs on grouped data

Since R2020b

modelAccuracy is renamed to modelCalibration. modelAccuracy is not recommended. Use modelCalibration instead.

Syntax

AccMeasure = modelAccuracy(pdModel,data,GroupBy)

[AccMeasure,AccData] = modelAccuracy(___,Name,Value)

Description

AccMeasure = modelAccuracy(pdModel,data,GroupBy) computes the root mean squared error (RMSE) between the observed and predicted probabilities of default (PD). GroupBy is required and can be any column in the data input (not necessarily a model variable). The modelAccuracy function computes the observed PD as the default rate of each group and the predicted PD as the average PD for each group. modelAccuracy supports comparison against a reference model.

[AccMeasure,AccData] = modelAccuracy(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in the previous syntax.
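
As a rough usage sketch of both syntaxes, the following assumes that Risk Management Toolbox is installed and that its RetailCreditPanelData.mat sample data is available (tables data and dataMacro with variables ID, YOB, ScoreGroup, Default, Year, GDP, and Market). The sample data, the variable roles, and the alternating-ID split are illustrative assumptions and do not come from this page. In R2023a and later, modelCalibration would take the place of modelAccuracy.

```matlab
% Assumption: Risk Management Toolbox is installed and provides the
% sample file RetailCreditPanelData.mat (tables data and dataMacro).
load RetailCreditPanelData.mat
data = join(data,dataMacro);   % add macro variables GDP and Market by Year

% Illustrative split by borrower ID so that each loan stays in one set
ids = unique(data.ID);
isTrain = ismember(data.ID,ids(1:2:end));

% Fit a logistic lifetime PD model (variable roles are assumptions)
pdModel = fitLifetimePDModel(data(isTrain,:),"Logistic", ...
    'IDVar','ID','AgeVar','YOB','LoanVars','ScoreGroup', ...
    'MacroVars',{'GDP','Market'},'ResponseVar','Default');

% First syntax: RMSE of observed vs. predicted PDs, grouped by YOB
AccMeasure = modelAccuracy(pdModel,data(~isTrain,:),'YOB');

% Second syntax: also return the grouped data and label the data set
[AccMeasure,AccData] = modelAccuracy(pdModel,data(~isTrain,:),'YOB', ...
    'DataID',"Testing");
disp(AccMeasure)
```

Because no reference model is supplied here, AccMeasure should contain a single row for pdModel.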

Input Arguments

`pdModel` — Probability of default model
`Logistic` object | `Probit` object | `Cox` object | `customLifetimePDModel` object

Probability of default model, specified as a previously created Logistic, Probit, or Cox object using fitLifetimePDModel. Alternatively, you can create a custom probability of default model using customLifetimePDModel.

Data Types: object

`data` — Data
table

Data, specified as a NumRows-by-NumCols table with predictor and response values. The response values are needed to compute the observed default rates. The variable names and data types must be consistent with the underlying model.

Data Types: table

`GroupBy` — Name of column in `data` input used to group the data
string | character vector

Name of column in the data input used to group the data, specified as a string or character vector. GroupBy does not have to be a model variable name. For each group designated by GroupBy, the modelAccuracy function computes the observed default rate and the average predicted PD, and then uses these values to measure the RMSE.

Data Types: string | char

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: [AccMeasure,AccData] = modelAccuracy(pdModel,data(Ind,:),["YOB","ScoreGroup"],'DataID',"DataSetChoice")

`DataID` — Data set identifier
`""` (default) | character vector | string

Data set identifier, specified as the comma-separated pair consisting of 'DataID' and a character vector or string. DataID is included in the modelAccuracy output for reporting purposes.

Data Types: char | string

`ReferencePD` — Conditional PD values predicted for `data` by reference model
`[]` (default) | numeric vector

Conditional PD values predicted for data by the reference model, specified as the comma-separated pair consisting of 'ReferencePD' and a NumRows-by-1 numeric vector. The function reports the modelAccuracy output information for both the pdModel object and the reference model.

Data Types: double

`ReferenceID` — Identifier for reference model
`'Reference'` (default) | character vector | string

Identifier for the reference model, specified as the comma-separated pair consisting of 'ReferenceID' and a character vector or string. ReferenceID is used in the modelAccuracy output for reporting purposes.

Data Types: char | string
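
To make the reference-model options concrete, here is a hedged sketch that compares a logistic model against a probit reference. It assumes Risk Management Toolbox, its RetailCreditPanelData.mat sample data, and the toolbox predict function for obtaining conditional PDs. The variable roles, the alternating-ID split, and the "Probit" label are illustrative choices that are not taken from this page.

```matlab
% Assumption: Risk Management Toolbox and its sample data are available.
load RetailCreditPanelData.mat
data = join(data,dataMacro);               % adds GDP and Market by Year
ids = unique(data.ID);
isTrain = ismember(data.ID,ids(1:2:end));  % illustrative split by ID
testData = data(~isTrain,:);

% Shared fitting options (variable roles are assumptions)
opts = {'IDVar','ID','AgeVar','YOB','LoanVars','ScoreGroup', ...
    'MacroVars',{'GDP','Market'},'ResponseVar','Default'};
pdModel    = fitLifetimePDModel(data(isTrain,:),"Logistic",opts{:});
pdModelRef = fitLifetimePDModel(data(isTrain,:),"Probit",opts{:});

% Conditional PDs from the reference model: one value per row of testData
refPD = predict(pdModelRef,testData);

% AccMeasure has two rows: one for pdModel, one for the reference model
[AccMeasure,AccData] = modelAccuracy(pdModel,testData,'YOB', ...
    'DataID',"Testing",'ReferencePD',refPD,'ReferenceID',"Probit");
```

The reference PDs must be computed on the same rows that are passed as data, so that ReferencePD has one value per row.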

Output Arguments

`AccMeasure` — RMSE values
table

RMSE values, returned as a single-column 'RMSE' table. The table has one row if only the pdModel accuracy is measured and two rows if reference model information is given. The row names of AccMeasure report the model IDs, grouping variables, and data ID.

Note

The reported RMSE values depend on the grouping variable for the required GroupBy argument.

`AccData` — Observed and predicted PD values for each group
table

Observed and predicted PD values for each group, returned as a table. The reported observed PD values correspond to the observed default rate for each group. The reported predicted PD values are the average PD values predicted by the pdModel object for each group, and similarly for the reference model. The modelAccuracy function stacks the PD data, placing the observed values for all groups first, then the predicted PDs for the pdModel, and then the predicted PDs for the reference model, if given.

The column 'ModelID' identifies which rows correspond to the observed PD, pdModel, or reference model. The table also has one column for each grouping variable showing the unique combinations of grouping values. The 'PD' column of AccData contains the PD data. The last column of AccData is a 'GroupCount' column with the group counts data.

More About

Model Accuracy

Model accuracy measures the accuracy of the predicted probability of default (PD) values.

To measure model accuracy, also called model calibration, you must compare the predicted PD values to the observed default rates. For example, if a group of customers is predicted to have an average PD of 5%, then is the observed default rate for that group close to 5%?

The modelAccuracy function requires a grouping variable to compute average predicted PD values within each group and the average observed default rate also within each group. modelAccuracy uses the root mean squared error (RMSE) to measure the deviations between the observed and predicted values across groups. For example, the grouping variable could be the calendar year, so that rows corresponding to the same calendar year are grouped together. Then, for each year the software computes the observed default rate and the average predicted PD. The modelAccuracy function then applies the RMSE formula to obtain a single measure of the prediction error across all years in the sample.

Suppose there are N observations in the data set, and there are M groups G₁,...,G_M. The default rate for group G_i is

$DR_{i} = \frac{D_{i}}{N_{i}}$

where:

D_i is the number of defaults observed in group G_i.

N_i is the number of observations in group G_i.

The average predicted probability of default PD_i for group G_i is

$PD_{i} = \frac{1}{N_{i}} \sum_{j \in G_{i}} PD(j)$

where PD(j) is the probability of default for observation j. In other words, this is the average of the predicted PDs within group G_i.

Therefore, the RMSE is computed as

$RMSE = \sqrt{\sum_{i=1}^{M} \left(\frac{N_{i}}{N}\right) \left(DR_{i} - PD_{i}\right)^{2}}$

The RMSE, as defined, depends on the selected grouping variable. For example, grouping by calendar year and grouping by years-on-books might result in different RMSE values.
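
The formulas above can be traced by hand on a small example. The sketch below uses only base MATLAB and made-up data (the Year, Default, and PD vectors are hypothetical). It illustrates the definition and is not necessarily how modelAccuracy is implemented internally.

```matlab
% Toy data (hypothetical): one row per observation
Year    = [2019;2019;2019;2019;2020;2020;2020;2020;2020;2020];
Default = [   1;   0;   0;   0;   0;   1;   1;   0;   0;   0];  % observed 0/1 outcomes
PD      = [0.30;0.20;0.10;0.20;0.40;0.50;0.30;0.20;0.30;0.10];  % predicted PDs

N = numel(Default);                  % total number of observations
G = findgroups(Year);                % group index i = 1,...,M
Ni  = splitapply(@numel,Default,G);  % N_i: observations per group
DRi = splitapply(@mean,Default,G);   % DR_i = D_i/N_i: observed default rate
PDi = splitapply(@mean,PD,G);        % PD_i: average predicted PD

% Weighted RMSE across groups
RMSE = sqrt(sum((Ni./N).*(DRi - PDi).^2));
```

Replacing Year with a different grouping vector generally changes Ni, DRi, and PDi, and therefore the RMSE, which is the dependence on the grouping variable described above.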

Use modelAccuracyPlot to visualize observed default rates and predicted PD values on grouped data.

Version History

Introduced in R2020b

R2023a: `modelAccuracy` function is renamed to `modelCalibration` function

The modelAccuracy function is renamed to modelCalibration. Using modelAccuracy is not recommended; use modelCalibration instead.

R2022b: Support for `customLifetimePDModel` model

The pdModel input supports an option for a customLifetimePDModel model object that you can create using customLifetimePDModel.

R2022a: Additional column for `AccData` for `GroupCount`

The AccData output includes an additional GroupCount column for PD models.

R2022a: `GroupCount` column automatically included in `AccData` outputs

Starting in R2022a, the AccData output of modelAccuracy contains an additional column for GroupCount with the group counts data.

If you extract the last column from the AccData output using AccData{:,end}, the result differs from previous releases of modelAccuracy, because GroupCount is now the last column.
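
One way to avoid this positional dependence is to index AccData columns by name. The sketch below builds a hypothetical stand-in table: the column names ModelID, PD, and GroupCount follow the description on this page, while the YOB grouping column and all values are made up for illustration.

```matlab
% Hypothetical stand-in for an AccData output (values are made up)
AccData = table(["Observed";"Observed";"Model";"Model"],[1;2;1;2], ...
    [0.25;0.30;0.20;0.28],[4;6;4;6], ...
    'VariableNames',{'ModelID','YOB','PD','GroupCount'});

lastCol = AccData{:,end};   % positional: returns GroupCount in this layout
pdVals  = AccData.PD;       % by name: returns PD values regardless of order
```

Code written against AccData.PD keeps returning the PD values even if further columns are appended in a later release.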
