modelAccuracy
Compute RMSE of predicted and observed PDs on grouped data
Syntax
AccMeasure = modelAccuracy(pdModel,data,GroupBy)
[AccMeasure,AccData] = modelAccuracy(___,Name,Value)
Description
AccMeasure = modelAccuracy(pdModel,data,GroupBy) computes the root mean squared error (RMSE) of the observed probabilities of default (PDs) compared to the predicted PDs. GroupBy is required and can be any column in the data input (not necessarily a model variable). The modelAccuracy function computes the observed PD as the default rate of each group and the predicted PD as the average PD for each group. modelAccuracy supports comparison against a reference model.
[AccMeasure,AccData] = modelAccuracy(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in the previous syntax.
Examples
Compute Model Accuracy for Logistic Lifetime PD Model
This example shows how to use fitLifetimePDModel to fit data with a Logistic model and then use modelAccuracy to compute the root mean squared error (RMSE) of the observed probabilities of default (PDs) with respect to the predicted PDs.
Load Data
Load the credit portfolio data.
load RetailCreditPanelData.mat
disp(head(data))
    ID    ScoreGroup    YOB    Default    Year
    __    __________    ___    _______    ____
    1      Low Risk      1        0       1997
    1      Low Risk      2        0       1998
    1      Low Risk      3        0       1999
    1      Low Risk      4        0       2000
    1      Low Risk      5        0       2001
    1      Low Risk      6        0       2002
    1      Low Risk      7        0       2003
    1      Low Risk      8        0       2004
disp(head(dataMacro))
    Year     GDP     Market
    ____    _____    ______
    1997     2.72      7.61
    1998     3.57     26.24
    1999     2.86      18.1
    2000     2.43      3.19
    2001     1.26    -10.51
    2002    -0.59    -22.95
    2003     0.63      2.78
    2004     1.85      9.48
Join the two data components into a single data set.
data = join(data,dataMacro);
disp(head(data))
    ID    ScoreGroup    YOB    Default    Year     GDP     Market
    __    __________    ___    _______    ____    _____    ______
    1      Low Risk      1        0       1997     2.72      7.61
    1      Low Risk      2        0       1998     3.57     26.24
    1      Low Risk      3        0       1999     2.86      18.1
    1      Low Risk      4        0       2000     2.43      3.19
    1      Low Risk      5        0       2001     1.26    -10.51
    1      Low Risk      6        0       2002    -0.59    -22.95
    1      Low Risk      7        0       2003     0.63      2.78
    1      Low Risk      8        0       2004     1.85      9.48
Partition Data
Separate the data into training and test partitions.
nIDs = max(data.ID);
uniqueIDs = unique(data.ID);

rng('default'); % For reproducibility
c = cvpartition(nIDs,'HoldOut',0.4);

TrainIDInd = training(c);
TestIDInd = test(c);

TrainDataInd = ismember(data.ID,uniqueIDs(TrainIDInd));
TestDataInd = ismember(data.ID,uniqueIDs(TestIDInd));
Create Logistic Lifetime PD Model
Use fitLifetimePDModel to create a Logistic model using the training data.
pdModel = fitLifetimePDModel(data(TrainDataInd,:),"Logistic",...
    'AgeVar','YOB',...
    'IDVar','ID',...
    'LoanVars','ScoreGroup',...
    'MacroVars',{'GDP','Market'},...
    'ResponseVar','Default');
disp(pdModel)
  Logistic with properties:

        ModelID: "Logistic"
    Description: ""
          Model: [1x1 classreg.regr.CompactGeneralizedLinearModel]
          IDVar: "ID"
         AgeVar: "YOB"
       LoanVars: "ScoreGroup"
      MacroVars: ["GDP"    "Market"]
    ResponseVar: "Default"
Display the underlying model.
disp(pdModel.Model)
Compact generalized linear regression model:
    logit(Default) ~ 1 + ScoreGroup + YOB + GDP + Market
    Distribution = Binomial

Estimated Coefficients:
                               Estimate         SE         tStat       pValue
                              __________    _________    _______    ___________
    (Intercept)                  -2.7422      0.10136    -27.054     3.408e-161
    ScoreGroup_Medium Risk      -0.68968     0.037286    -18.497     2.1894e-76
    ScoreGroup_Low Risk          -1.2587     0.045451    -27.693    8.4736e-169
    YOB                         -0.30894     0.013587    -22.738    1.8738e-114
    GDP                         -0.11111     0.039673    -2.8006      0.0051008
    Market                    -0.0083659    0.0028358    -2.9502      0.0031761

388097 observations, 388091 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 1.85e+03, p-value = 0
Compute Model Accuracy
Model accuracy measures how accurate the predicted probabilities of default are. For example, if the model predicts a 10% PD for a group, does the group end up showing an approximate 10% default rate, or is the eventual rate much higher or lower? While model discrimination measures the risk ranking only, model accuracy measures the accuracy of the predicted risk levels.
modelAccuracy computes the root mean squared error (RMSE) of the observed PDs with respect to the predicted PDs. A grouping variable is required and it can be any column in the data input (not necessarily a model variable). The modelAccuracy function computes the observed PD as the default rate of each group and the predicted PD as the average PD for each group.
DataSetChoice = "Training";
if DataSetChoice=="Training"
    Ind = TrainDataInd;
else
    Ind = TestDataInd;
end
GroupingVar = "YOB";
[AccMeasure,AccData] = modelAccuracy(pdModel,data(Ind,:),GroupingVar,'DataID',DataSetChoice)
AccMeasure=table
RMSE
_________
Logistic, grouped by YOB, Training 0.0004142
AccData=16×4 table
ModelID YOB PD GroupCount
__________ ___ _________ __________
"Observed" 1 0.017421 58092
"Observed" 2 0.012305 56723
"Observed" 3 0.011382 55524
"Observed" 4 0.010741 54650
"Observed" 5 0.00809 53770
"Observed" 6 0.0066747 53186
"Observed" 7 0.0032198 36959
"Observed" 8 0.0018757 19193
"Logistic" 1 0.017185 58092
"Logistic" 2 0.012791 56723
"Logistic" 3 0.01131 55524
"Logistic" 4 0.010615 54650
"Logistic" 5 0.0083982 53770
"Logistic" 6 0.0058744 53186
"Logistic" 7 0.0035872 36959
"Logistic" 8 0.0023689 19193
%disp(AccMeasure)
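As a rough cross-check, the grouped quantities can be reproduced by hand. The following is a sketch only. It assumes the workspace from the previous steps (pdModel, data, Ind), assumes that predict returns the conditional PD for each row of the data, and assumes that each group's squared deviation is weighted by its share of the observations. The names dataSub, PredPD, G, w, and rmseByHand are illustrative and are not part of the modelAccuracy interface.

```matlab
% Sketch only: assumes pdModel, data, and Ind exist from the previous steps.
dataSub = data(Ind,:);
dataSub.PredPD = predict(pdModel,dataSub);   % assumed: conditional PD per row

% Observed default rate and average predicted PD for each YOB group.
% groupsummary also returns a GroupCount column with the group sizes.
G = groupsummary(dataSub,"YOB","mean",["Default","PredPD"]);

% Weighted RMSE across groups, with weights GroupCount/N (assumption).
w = G.GroupCount ./ sum(G.GroupCount);
rmseByHand = sqrt(sum(w .* (G.mean_Default - G.mean_PredPD).^2));
disp(rmseByHand)
```

If the assumptions hold, rmseByHand should be close to the value in AccMeasure, and the columns of G should line up with the observed and predicted rows of AccData.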
Visualize the model accuracy using modelAccuracyPlot.
modelAccuracyPlot(pdModel,data(Ind,:),GroupingVar,'DataID',DataSetChoice);
You can use more than one variable for grouping. For this example, group by the variables YOB and ScoreGroup.
AccMeasure = modelAccuracy(pdModel,data(Ind,:),["YOB","ScoreGroup"],'DataID',DataSetChoice);
disp(AccMeasure)
                                                         RMSE
                                                      __________
    Logistic, grouped by YOB, ScoreGroup, Training    0.00066239
Now visualize the two grouping variables using modelAccuracyPlot.
modelAccuracyPlot(pdModel,data(Ind,:),["YOB","ScoreGroup"],'DataID',DataSetChoice);
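The ReferencePD and ReferenceID name-value arguments allow a comparison against a reference model. The following is a minimal sketch, assuming the workspace from this example. It uses a Probit model fitted on the same training data purely as an illustrative reference, and pdModelRef and RefPD are hypothetical names chosen for this sketch.

```matlab
% Sketch only: fit a second model to serve as the reference.
pdModelRef = fitLifetimePDModel(data(TrainDataInd,:),"Probit",...
    'AgeVar','YOB',...
    'IDVar','ID',...
    'LoanVars','ScoreGroup',...
    'MacroVars',{'GDP','Market'},...
    'ResponseVar','Default');

% Conditional PDs predicted by the reference model for the same rows.
RefPD = predict(pdModelRef,data(Ind,:));

% With a reference model, AccMeasure is expected to have one row per model.
[AccMeasure,AccData] = modelAccuracy(pdModel,data(Ind,:),"YOB",...
    'DataID',DataSetChoice,'ReferencePD',RefPD,'ReferenceID',"Probit");
disp(AccMeasure)
```

The reference PD values must correspond row by row to the data passed to modelAccuracy, which is why RefPD is computed on data(Ind,:).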
Input Arguments
pdModel
— Probability of default model
Logistic object | Probit object | Cox object | customLifetimePDModel object
Probability of default model, specified as a Logistic, Probit, or Cox object previously created using fitLifetimePDModel. Alternatively, you can create a custom probability of default model using customLifetimePDModel.
Data Types: object
data
— Data
table
Data, specified as a NumRows-by-NumCols table with projected predictor values to make lifetime predictions. The predictor names and data types must be consistent with the underlying model.
Data Types: table
GroupBy
— Name of column in data
input used to group the data
string | character vector
Name of column in the data input used to group the data, specified as a string or character vector. GroupBy does not have to be a model variable name. For each group designated by GroupBy, the modelAccuracy function computes the observed default rate and the average predicted PD, and then uses these values to measure the RMSE.
Data Types: string | char
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: [AccMeasure,AccData] = modelAccuracy(pdModel,data(Ind,:),["YOB","ScoreGroup"],'DataID',DataSetChoice)
DataID
— Data set identifier
"" (default) | character vector | string
Data set identifier, specified as the comma-separated pair consisting of 'DataID' and a character vector or string. DataID is included in the modelAccuracy output for reporting purposes.
Data Types: char | string
ReferencePD
— Conditional PD values predicted for data
by reference model
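[] (default) | numeric vector
Conditional PD values predicted for data by the reference model, specified as the comma-separated pair consisting of 'ReferencePD' and a numeric vector. When you supply ReferencePD, the modelAccuracy output reports the accuracy of both pdModel and the reference model.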
ReferenceID
— Identifier for reference model
'Reference' (default) | character vector | string
Identifier for the reference model, specified as the comma-separated pair consisting of 'ReferenceID' and a character vector or string. ReferenceID is used in the modelAccuracy output for reporting purposes.
Data Types: char | string
Output Arguments
AccMeasure
— RMSE values
table
RMSE values, returned as a table with a single 'RMSE' column. The table has one row if only the pdModel accuracy is measured and two rows if reference model information is given. The row names of AccMeasure report the model IDs, grouping variables, and data ID.
Note
The reported RMSE values depend on the grouping variable specified in the required GroupBy argument.
AccData
— Observed and predicted PD values for each group
table
Observed and predicted PD values for each group, returned as a table. The reported observed PD values correspond to the observed default rate for each group. The reported predicted PD values are the average PD values predicted by the pdModel object for each group, and similarly for the reference model. The modelAccuracy function stacks the PD data, placing the observed values for all groups first, then the predicted PDs for the pdModel, and then the predicted PDs for the reference model, if given.
The column 'ModelID' identifies which rows correspond to the observed PD, pdModel, or reference model. The table also has one column for each grouping variable showing the unique combinations of grouping values. The 'PD' column of AccData contains the PD data. The last column of AccData is a 'GroupCount' column with the group counts data.
More About
Model Accuracy
Model accuracy measures the accuracy of the predicted probability of default (PD) values.
To measure model accuracy, also called model calibration, you must compare the predicted PD values to the observed default rates. For example, if a group of customers is predicted to have an average PD of 5%, then is the observed default rate for that group close to 5%?
The modelAccuracy function requires a grouping variable so that it can compute, within each group, the average predicted PD and the observed default rate. modelAccuracy uses the root mean squared error (RMSE) to measure the deviations between the observed and predicted values across groups. For example, the grouping variable could be the calendar year, so that rows corresponding to the same calendar year are grouped together. Then, for each year the software computes the observed default rate and the average predicted PD. The modelAccuracy function then applies the RMSE formula to obtain a single measure of the prediction error across all years in the sample.
Suppose there are $N$ observations in the data set, and there are $M$ groups $G_1,\dots,G_M$. The default rate for group $G_i$ is
$$DR_i = \frac{D_i}{N_i}$$
where:
$D_i$ is the number of defaults observed in group $G_i$.
$N_i$ is the number of observations in group $G_i$.
The average predicted probability of default $PD_i$ for group $G_i$ is
$$PD_i = \frac{1}{N_i}\sum_{j \in G_i} PD(j)$$
where $PD(j)$ is the probability of default for observation $j$. In other words, this is the average of the predicted PDs within group $G_i$.
Therefore, the RMSE is computed as
$$RMSE = \sqrt{\sum_{i=1}^{M} \frac{N_i}{N}\left(DR_i - PD_i\right)^2}$$
The RMSE, as defined, depends on the selected grouping variable. For example, grouping by calendar year and grouping by years-on-books might result in different RMSE values.
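The RMSE can be checked against the example output on this page. The sketch below rebuilds a table with the same layout as AccData from the displayed (rounded) values for the YOB grouping, and assumes each group's squared deviation is weighted by its share $N_i/N$ of the observations. The variable names obs, pred, and rmse are illustrative.

```matlab
% Group values copied from the AccData output in the example (grouped by YOB).
YOB        = (1:8)';
GroupCount = [58092; 56723; 55524; 54650; 53770; 53186; 36959; 19193];
obsPD  = [0.017421; 0.012305; 0.011382; 0.010741; 0.00809; 0.0066747; 0.0032198; 0.0018757];
predPD = [0.017185; 0.012791; 0.01131; 0.010615; 0.0083982; 0.0058744; 0.0035872; 0.0023689];

% Rebuild a stacked table with the same layout as AccData.
AccData = table([repmat("Observed",8,1); repmat("Logistic",8,1)], ...
    [YOB; YOB], [obsPD; predPD], [GroupCount; GroupCount], ...
    'VariableNames',{'ModelID','YOB','PD','GroupCount'});

% Split the stacked rows by model ID.
obs  = AccData(AccData.ModelID == "Observed",:);
pred = AccData(AccData.ModelID == "Logistic",:);

% Weight each group's squared deviation by its share of the observations.
N = sum(obs.GroupCount);
rmse = sqrt(sum((obs.GroupCount ./ N) .* (obs.PD - pred.PD).^2));
fprintf('RMSE from displayed values: %.4g\n',rmse);
```

Up to rounding of the displayed PD values, the result agrees with the RMSE of 0.0004142 reported in the example. The group counts also sum to 388097, the number of observations reported for the fitted model.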
Use modelAccuracyPlot to visualize observed default rates and predicted PD values on grouped data.
Version History
Introduced in R2020b
R2022b: Support for customLifetimePDModel model
The pdModel input supports an option for a customLifetimePDModel model object that you can create using customLifetimePDModel.
R2022a: GroupCount column automatically included in AccData outputs
Starting in R2022a, the AccData output of modelAccuracy contains an additional GroupCount column with the group counts data.
If you extract the last column from the AccData output using AccData{:,end}, the result differs from previous releases of modelAccuracy.
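One way to keep code independent of the column order is to select columns by name rather than by position. A minimal sketch, using a small hand-built table that mimics the AccData layout (the two rows are taken from the example output, and pdByName and lastColumn are illustrative names):

```matlab
% Small stand-in for the AccData layout (R2022a and later).
AccData = table(["Observed";"Logistic"],[1;1],[0.017421;0.017185],[58092;58092], ...
    'VariableNames',{'ModelID','YOB','PD','GroupCount'});

pdByName   = AccData.PD;        % name-based: always selects the PD column
lastColumn = AccData{:,end};    % position-based: GroupCount in this layout

disp(isequal(lastColumn,AccData.GroupCount))
```

Name-based indexing such as AccData.PD does not depend on whether the GroupCount column is present.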
See Also
modelDiscrimination | modelDiscriminationPlot | modelAccuracyPlot | predictLifetime | predict | fitLifetimePDModel | Logistic | Probit | Cox | customLifetimePDModel
Topics
- Basic Lifetime PD Model Validation
- Compare Logistic Model for Lifetime PD to Champion Model
- Compare Lifetime PD Models Using Cross-Validation
- Expected Credit Loss Computation
- Compare Model Discrimination and Accuracy to Validate of Probability of Default
- Compare Probability of Default Using Through-the-Cycle and Point-in-Time Models
- Overview of Lifetime Probability of Default Models