Interpretability and Explainability for Credit Scoring
This example shows different techniques for interpreting and explaining the logic behind credit scoring predictions.
While credit scorecard models, in general, are straightforward to interpret, this example uses a black-box model, without revealing the logic, to show the workflow for explaining predictions. In this example, you work with the creditscorecard object from Financial Toolbox™ and pass the scoring function to interpretability tools in Statistics and Machine Learning Toolbox™. These tools include:
- Partial dependence plots (PDP)
- Individual conditional expectation (ICE) plots
- Local interpretable model-agnostic explanations (LIME)
- Shapley values
These tools support both regression and classification models, which makes interpretation more efficient. For more information on these techniques, see Interpret Machine Learning Models. In this example, the score method of the creditscorecard object is used as the black-box model. For an example of this workflow, see Interpret and Stress-Test Deep Learning Networks for Probability of Default.
Background
Credit scoring is the process by which lenders assign scores to borrowers and use those scores to decide whether to accept a loan application. Lenders use credit scoring models to produce these scores. Traditionally, simple, interpretable models such as credit scorecards and logistic regression have been widely used in this area. Over time, machine learning (ML) and artificial intelligence (AI) techniques were introduced to implement credit scoring models. Such techniques, while improving predictive power, are also more opaque: they provide little or no explanation behind their decisions, which makes their credit scoring predictions difficult for humans to interpret. As a result, lenders are implementing different interpretability and explainability methods to get a better understanding of the logic behind credit scoring predictions. In addition, regulators are requiring that practitioners use interpretability and fairness methods to ensure that no equal opportunity laws are broken while making credit decisions. For more information on using fairness metrics, see Explore Fairness Metrics for Credit Scoring Model.
Create Credit Scorecard Model
Load the credit card data and create a credit scorecard model using the creditscorecard object.
load CreditCardData
sc = creditscorecard(data,IDVar="CustID");
Apply automatic binning. This example uses the split algorithm, with a maximum of five bins per predictor and with the constraint that each bin has at least 50 observations. For more information, see autobinning and Bin Data to Create Credit Scorecards Using Binning Explorer.
sc = autobinning(sc,Algorithm="Split",AlgorithmOptions={"MaxNumBins",5,"MinCount",50});
Verify that the binning for each numeric variable results in five bins or levels. For example, here is the bin information for the customer age predictor.
bi = bininfo(sc,"CustAge");
disp(bi)
         Bin          Good    Bad     Odds        WOE       InfoValue 
    _____________     ____    ___    ______    _________    __________

    {'[-Inf,35)'}      93      76    1.2237     -0.50255      0.038003
    {'[35,47)'  }     321     184    1.7446     -0.14791     0.0094258
    {'[47,53)'  }     194      64    3.0312      0.40456       0.03252
    {'[53,61)'  }     128      64         2    -0.011271    2.0365e-05
    {'[61,Inf]' }      67       9    7.4444        1.303      0.079183
    {'Totals'   }     803     397    2.0227          NaN       0.15915
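The weight of evidence (WOE) of a bin is the logarithm of the ratio of the bin odds to the total sample odds. As a quick check using values already in the table above (woeFirstBin is an illustrative name; bi.Odds(end) is the Totals row):

woeFirstBin = log(bi.Odds(1)/bi.Odds(end));        % log(1.2237/2.0227)
fprintf("WOE of bin [-Inf,35): %.5f\n",woeFirstBin) % matches -0.50255 in the table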
For categorical variables, there may be fewer than five bins in total, because the split algorithm in autobinning can merge categories into a single group. For example, the residential status predictor initially has three levels: Tenant, Home Owner, and Other. The split algorithm returns only two groups.
[bi,cg] = bininfo(sc,"ResStatus");
disp(bi)
       Bin       Good    Bad     Odds        WOE       InfoValue
    __________   ____    ___    ______    _________    _________

    {'Group1'}    672    344    1.9535    -0.034802    0.0010314
    {'Group2'}    131     53    2.4717      0.20049    0.0059418
    {'Totals'}    803    397    2.0227          NaN    0.0069732
The category grouping information shows that Tenant and Home Owner are merged into Group1. This grouping means that Tenant and Home Owner get the same number of points in the final scorecard.
disp(cg)
       Category       BinNumber
    ______________    _________

    {'Tenant'    }        1    
    {'Home Owner'}        1    
    {'Other'     }        2    
Fit the model coefficients using fitmodel. For illustration purposes, keep only five model predictors, including some categorical ones.
PredictorsInModel = ["CustAge" "CustIncome" "EmpStatus" "ResStatus" "UtilRate"];
sc = fitmodel(sc,PredictorVars=PredictorsInModel,VariableSelection="fullmodel",Display="off");
Scale the points so that 500 points correspond to odds of 2, and the odds double every 50 points.
sc = formatpoints(sc,PointsOddsAndPDO=[500 2 50]);
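This scaling follows the standard points-odds-PDO convention, Score(odds) = offset + factor*log(odds) with factor = PDO/log(2). The following minimal sketch verifies the convention with illustrative variable names (PDO, factor, offset, and scoreAtOdds are not part of the toolbox API):

PDO = 50;                        % points needed to double the odds
factor = PDO/log(2);             % scaling factor
offset = 500 - factor*log(2);    % anchored so that odds of 2 map to 500 points
scoreAtOdds = @(odds) offset + factor*log(odds);
fprintf("Odds of 2: %.0f points; odds of 4: %.0f points\n", ...
    scoreAtOdds(2),scoreAtOdds(4))  % expect 500 and 550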
Display the scorecard using displaypoints. The credit scorecard model is a lookup table. For example, for customer age there are five bins or levels, with different points for each level. A visualization of the score as a function of age has a piecewise constant pattern with five levels, as shown in Partial Dependence Plot. For residential status, Tenant and Home Owner are in Group1, so they get the same number of points.
[ScorecardPointsTable,MinPts,MaxPts] = displaypoints(sc);
disp(ScorecardPointsTable)
     Predictors              Bin            Points
    ______________    _________________     ______

    {'CustAge'   }    {'[-Inf,35)'    }      71.84
    {'CustAge'   }    {'[35,47)'      }     91.814
    {'CustAge'   }    {'[47,53)'      }     122.93
    {'CustAge'   }    {'[53,61)'      }     99.511
    {'CustAge'   }    {'[61,Inf]'     }     173.54
    {'CustAge'   }    {'<missing>'    }        NaN
    {'ResStatus' }    {'Group1'       }     97.318
    {'ResStatus' }    {'Group2'       }     116.43
    {'ResStatus' }    {'<missing>'    }        NaN
    {'EmpStatus' }    {'Unknown'      }     85.326
    {'EmpStatus' }    {'Employed'     }     118.11
    {'EmpStatus' }    {'<missing>'    }        NaN
    {'CustIncome'}    {'[-Inf,31000)' }     68.158
    {'CustIncome'}    {'[31000,38000)'}     102.11
    {'CustIncome'}    {'[38000,42000)'}     93.302
    {'CustIncome'}    {'[42000,47000)'}     109.18
    {'CustIncome'}    {'[47000,Inf]'  }     121.21
    {'CustIncome'}    {'<missing>'    }        NaN
    {'UtilRate'  }    {'[-Inf,0.12)'  }     106.84
    {'UtilRate'  }    {'[0.12,0.3)'   }     94.647
    {'UtilRate'  }    {'[0.3,0.39)'   }     140.95
    {'UtilRate'  }    {'[0.39,0.68)'  }     69.635
    {'UtilRate'  }    {'[0.68,Inf]'   }     94.634
    {'UtilRate'  }    {'<missing>'    }        NaN
One "traditional" approach to measure the importance of each predictor in the credit scorecard model is to compute the percent of the total score range that comes from each predictor.
PtsRange = MaxPts - MinPts;
NumPred = length(PredictorsInModel);
PercentWeight = zeros(NumPred,1);
for ii = 1 : NumPred
    Ind = strcmpi(PredictorsInModel{ii},ScorecardPointsTable.Predictors);
    MaxPtsPred = max(ScorecardPointsTable.Points(Ind));
    MinPtsPred = min(ScorecardPointsTable.Points(Ind));
    PercentWeight(ii) = 100*(MaxPtsPred-MinPtsPred)/PtsRange;
end
PredictorWeights = table(PredictorsInModel',PercentWeight,VariableNames=["Predictor" "Weight"]);
disp(PredictorWeights)
     Predictor      Weight
    ____________    ______

    "CustAge"       36.587
    "CustIncome"    19.085
    "EmpStatus"     11.795
    "ResStatus"     6.8768
    "UtilRate"      25.656
Customer age is the main variable in the model, accounting for about 37% of the total score range. A customer can get anywhere from 71.8 to 173.5 points based on their age, a difference of over 100 points between the minimum and maximum values. At the other end, residential status plays a minor role in the score, with points ranging only from 97.3 to 116.4, a difference of less than 20 points.
An alternative to this "traditional" approach is to use the following explainability techniques from Statistics and Machine Learning Toolbox: Partial Dependence Plot, Individual Conditional Expectation Plot, Local Interpretable Model-Agnostic Explanation Plot, and Shapley Values.
Partial Dependence Plot
The partial dependence plot (PDP) shows the effect of one or two variables on the predicted score.
Use the plotPartialDependence function to pass the score method of the creditscorecard object as a black-box model.
One Predictor
Select a predictor by setting the predictor variable in the code below. For example, if you select customer age, note the piecewise constant shape of the plot, with jumps occurring at the bin edges and with five levels in total. This shape is consistent with the five bins for customer age in the credit scorecard model.
predictor = PredictorsInModel(1);
plotPartialDependence(@(tbl)score(sc,tbl),predictor,data)
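For intuition, you can approximate the one-predictor partial dependence manually by fixing the predictor at each grid value and averaging the scores over the whole data set. This is a minimal sketch of the averaging idea, not the internal implementation of plotPartialDependence (xGrid and pdManual are illustrative names):

% Fix CustAge at each grid value and average the scores over all observations.
xGrid = linspace(min(data.CustAge),max(data.CustAge),100);
pdManual = zeros(size(xGrid));
tmp = data;
for k = 1:numel(xGrid)
    tmp.CustAge(:) = xGrid(k);       % data set with CustAge fixed at one value
    pdManual(k) = mean(score(sc,tmp));
end
figure
plot(xGrid,pdManual)
xlabel('CustAge')
ylabel('Average score')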
Two Predictors
Generating a partial dependence plot with two predictors can take significantly longer than the one-predictor case. Typically, the more unique values a predictor has in the data set, the longer it takes to plot the partial dependence. The following code reports the number of unique values of each predictor in the data.
NumUniqueValuesTable = varfun(@(x)length(unique(x)),data(:,PredictorsInModel));
NumUniqueValuesTable.Properties.VariableNames = erase(NumUniqueValuesTable.Properties.VariableNames,'Fun_');
disp(NumUniqueValuesTable)
    CustAge    CustIncome    EmpStatus    ResStatus    UtilRate
    _______    __________    _________    _________    ________

      54           45            2            3          110   
The categorical predictors have fewer unique levels, so partial dependence plots for categorical predictors run faster. Numeric variables such as customer age are also relatively discrete, as is utilization rate, because the rate's values are rounded to two decimals. However, a continuous predictor, such as the average monthly balance (AMBalance) in the data table, can have many unique values.
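As a quick check, you can count the unique values of AMBalance directly (AMBalance is a column of the data table that is not among the model predictors):

fprintf("AMBalance has %d unique values\n",length(unique(data.AMBalance)))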
Select a predictor and an additional predictor, and then use plotPartialDependence to generate the two-predictor PDP.
predictor = PredictorsInModel(1);
additionalPredictor = PredictorsInModel(4);
plotPartialDependence(@(tbl)score(sc,tbl),[predictor,additionalPredictor],data)
Individual Conditional Expectation Plot
Similar to the partial dependence plot, the individual conditional expectation (ICE) plot shows the effect of one of the variables on the predicted score. While the partial dependence plot shows the average score as a function of the selected predictor, the ICE plot disaggregates and shows the score for each observation (each gray line) as a function of the selected predictor. The red line in the ICE plot matches the partial dependence plot. For more information, see the More About section of the plotPartialDependence reference page.
predictor = PredictorsInModel(1);
plotPartialDependence(@(tbl)score(sc,tbl),predictor, ...
    data,Conditional="absolute")
Select a Query Point
The PDP and ICE plots provide a global view of the credit scorecard scores, where the score is visualized for all values of the selected predictor. In contrast, LIME and Shapley are local explainability techniques that explain the behavior of the model in a neighborhood of a query point of choice. For more information, see Interpret Machine Learning Models.
To see how a query point helps to explain credit scores, use index 92 in the training data as your query point. You can select other query points by changing the value of QueryPointIndex.
QueryPointIndex = 92; % ID number of the observation to explain
Use score to display the score and the points, by predictor, for the query point.
[ScoresTraining,PointsTraining] = score(sc,data);
fprintf("Selected index %d, with score %g\n",QueryPointIndex,ScoresTraining(QueryPointIndex))
Selected index 92, with score 417.289
disp(PointsTraining(QueryPointIndex,:))
    CustAge    ResStatus    EmpStatus    CustIncome    UtilRate
    _______    _________    _________    __________    ________

     71.84      97.318       85.326       68.158        94.647 
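Note that the score of the query point is the sum of its points across predictors; a quick check:

fprintf("Sum of points for index %d: %g\n",QueryPointIndex, ...
    sum(PointsTraining{QueryPointIndex,:}))  % matches the score of 417.289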
The plots that follow show the location of the query point (dotted vertical line) relative to the distribution of values for the score and for each predictor. For example, for index 92, the score is low relative to the distribution. For the customer age predictor, the query point is in the bottom group. The result is similar for the customer income, employment status, and residential status predictors. The points for the utilization rate predictor are closer to the middle of the distribution, but still below average.
figure
t = tiledlayout(3,2);
nexttile
plotQueryInHistogram("Score",QueryPointIndex,ScoresTraining,PointsTraining)
nexttile
plotQueryInHistogram("CustAge",QueryPointIndex,ScoresTraining,PointsTraining)
nexttile
plotQueryInHistogram("CustIncome",QueryPointIndex,ScoresTraining,PointsTraining)
nexttile
plotQueryInHistogram("EmpStatus",QueryPointIndex,ScoresTraining,PointsTraining)
nexttile
plotQueryInHistogram("ResStatus",QueryPointIndex,ScoresTraining,PointsTraining)
nexttile
plotQueryInHistogram("UtilRate",QueryPointIndex,ScoresTraining,PointsTraining)
title(t,"Query Point Relative to Distribution")
Local Interpretable Model-Agnostic Explanation Plot
The local interpretable model-agnostic explanation (LIME) plot shows the coefficients of a local linear model near the instance of a score that you want to explain. LIME explains the scores around a particular observation, or query point, with a simple local model, such as a linear regression model or a decision tree.
Use lime to create a lime object, specifying the data set of interest (the training data set), the model type (use "regression" to indicate a numeric prediction), and which variables are categorical. When you create a lime object, the toolbox generates a random synthetic data set. Use the synthetic data to fit simple local models that explain the local behavior.
rng('default'); % for reproducibility
limeExplainer = lime(@(tbl)score(sc,tbl),data(:,PredictorsInModel),Type="regression", ...
    CategoricalPredictors=["ResStatus" "EmpStatus"]);
Select a maximum number of predictors to explain (NumPredToExplain) and use a SimpleModelType of "tree" to explain the local behavior of the score. The results are sensitive to the kernel width parameter (KernelWidthChoice), which controls how much neighboring points are weighted when fitting the simple model.
NumPredToExplain = 5; % number of variables/predictors to explain
KernelWidthChoice = 0.5;
limeExplainer = fit(limeExplainer,data(QueryPointIndex,PredictorsInModel), ...
    NumPredToExplain,SimpleModelType="tree",KernelWidth=KernelWidthChoice);
figure
f = plot(limeExplainer);
When the simple model is a tree, the reported predictor importance shows that customer age is the main predictor, followed by employment status and customer income.
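To examine the fitted local model directly, you can inspect the SimpleModel property of the lime object. This is a sketch; it assumes that for SimpleModelType="tree" the property holds a regression tree object that supports view:

localTree = limeExplainer.SimpleModel;  % fitted local regression tree
disp(localTree)                         % summary of the simple model
view(localTree)                         % text display of the tree splits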
Shapley Values
Shapley values explain the deviation of the predicted score from the average predicted score. The sum of the Shapley values for all predictors corresponds to the total deviation of the score for the query point from the average score.
The Shapley values are estimated by simulation. For larger data sets, this simulation is time consuming. For illustration purposes, this example passes only the first 500 rows of the training data to the shapley constructor.
rng('default'); % for reproducibility
shapleyExplainer = shapley(@(tbl)score(sc,tbl),data(1:500,PredictorsInModel), ...
    QueryPoint=data(QueryPointIndex,PredictorsInModel), ...
    CategoricalPredictors=["EmpStatus" "ResStatus"]);
figure
plot(shapleyExplainer)
For the query point with index 92, the predicted score is 417, whereas the average score for the training data set passed to the shapley function is 516. You therefore expect the Shapley values to be negative, or at least to have important negative components, to explain why the predicted score is below average. In contrast, for scores above average, the Shapley values add up to a positive amount. In this example, the estimated Shapley values show that the main deviation from the average is explained by the customer income and employment status predictors, followed by the customer age and utilization rate predictors. The residential status predictor is not important. This result might reflect a combination of simulation noise and the fact that residential status has a smaller impact on scores for this model.
Final Remarks
Explainability techniques are widely used to understand the behavior of predictive models. This example applies explainability techniques, such as PDP, ICE, LIME, and Shapley values, to a creditscorecard model used as a black-box model. Although credit scorecard models are simple and interpretable, you can apply the explainability tools in this example to other scoring models that are treated as black-box models, or to supported models in Statistics and Machine Learning Toolbox. Alternatively, instead of explaining the scores, you can pass the probdefault function as the black-box model to explain the probability of default predictions.
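For example, to explain the probability of default rather than the score, you could swap probdefault into the black-box function handle. A minimal sketch of the substitution, reusing the data and predictor from earlier:

plotPartialDependence(@(tbl)probdefault(sc,tbl),"CustAge",data)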
Local Functions
function plotQueryInHistogram(VariableChoice,QueryPointIndex,Scores,PointsTable)
    % Plot the histogram of the selected variable and mark the query point.
    if VariableChoice=="Score"
        HistData = Scores;
    else
        HistData = PointsTable.(VariableChoice);
    end
    histogram(HistData)
    hold on
    xline(HistData(QueryPointIndex),':','LineWidth',2.5) % query point location
    hold off
    xlabel(VariableChoice)
    ylabel('Frequency')
end
Related Topics
- Measure Transition Risk for Loan Portfolios with Respect to Climate Scenarios
- Assess Physical and Transition Risk for Mortgages