screenpredictors
Screen credit scorecard predictors for predictive value
Description
metric_table = screenpredictors(data) returns the output variable, metric_table, a MATLAB® table containing the calculated values for several measures of
predictive power for each predictor variable in the data.
Use the screenpredictors function as a preprocessing step
in the Credit Scorecard Modeling Workflow to
reduce the number of predictor variables before you create the credit scorecard
using the creditscorecard function from
Financial Toolbox™. In addition, you can use Threshold
Predictors from Risk Management Toolbox™ to interactively set credit scorecard predictor thresholds using the
output from screenpredictors
before you create the credit scorecard using the creditscorecard function.
metric_table = screenpredictors(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the
input arguments in the previous syntax.
Examples
Reduce the number of predictor variables by screening predictors before you create a credit scorecard.
Load the data from the CreditCardData.mat file (a dataset from Refaat 2011).
load CreditCardData.mat
Define 'IDVar' and 'ResponseVar'.
idvar = 'CustID';
responsevar = 'status';
Use screenpredictors to calculate the predictor screening metrics. The function returns a table containing the metric values. Each table row corresponds to a predictor from the input table data.
metric_table = screenpredictors(data,'IDVar', idvar,'ResponseVar', responsevar)
metric_table=9×7 table
                   InfoValue    AccuracyRatio    AUROC      Entropy    Gini       Chi2PValue    PercentMissing
                   _________    _____________    _______    _______    _______    __________    ______________
    CustAge        0.18863      0.17095          0.58547    0.88729    0.42626    0.00074524    0
    TmWBank        0.15719      0.13612          0.56806    0.89167    0.42864    0.0054591     0
    CustIncome     0.15572      0.17758          0.58879    0.891      0.42731    0.0018428     0
    TmAtAddress    0.094574     0.010421         0.50521    0.90089    0.43377    0.182         0
    UtilRate       0.075086     0.035914         0.51796    0.90405    0.43575    0.45546       0
    AMBalance      0.07159      0.087142         0.54357    0.90446    0.43592    0.48528       0
    EmpStatus      0.048038     0.10886          0.55443    0.90814    0.4381     0.00037823    0
    OtherCC        0.014301     0.044459         0.52223    0.91347    0.44132    0.047616      0
    ResStatus      0.0097738    0.05039          0.5252     0.91422    0.44182    0.27875       0
Sort the table by the AccuracyRatio metric in descending order.
metric_table = sortrows(metric_table,'AccuracyRatio','descend')
metric_table=9×7 table
                   InfoValue    AccuracyRatio    AUROC      Entropy    Gini       Chi2PValue    PercentMissing
                   _________    _____________    _______    _______    _______    __________    ______________
    CustIncome     0.15572      0.17758          0.58879    0.891      0.42731    0.0018428     0
    CustAge        0.18863      0.17095          0.58547    0.88729    0.42626    0.00074524    0
    TmWBank        0.15719      0.13612          0.56806    0.89167    0.42864    0.0054591     0
    EmpStatus      0.048038     0.10886          0.55443    0.90814    0.4381     0.00037823    0
    AMBalance      0.07159      0.087142         0.54357    0.90446    0.43592    0.48528       0
    ResStatus      0.0097738    0.05039          0.5252     0.91422    0.44182    0.27875       0
    OtherCC        0.014301     0.044459         0.52223    0.91347    0.44132    0.047616      0
    UtilRate       0.075086     0.035914         0.51796    0.90405    0.43575    0.45546       0
    TmAtAddress    0.094574     0.010421         0.50521    0.90089    0.43377    0.182         0
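The printed values are consistent with the standard identity linking the accuracy ratio to the area under the ROC curve, which is why ranking by either column gives the same order. This is a check against the output shown, not a statement of how screenpredictors computes either metric:

```latex
\mathrm{AccuracyRatio} = 2\,\mathrm{AUROC} - 1,
\qquad \text{e.g. CustIncome: } 2(0.58879) - 1 = 0.17758
```

Under this identity, the cutoff AccuracyRatio > 0.09 is equivalent to AUROC > (1 + 0.09)/2 = 0.545.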
Based on the AccuracyRatio metric, select the top predictors to use when you create the creditscorecard object.
varlist = metric_table.Row(metric_table.AccuracyRatio > 0.09)
varlist = 4×1 cell
{'CustIncome'}
{'CustAge' }
{'TmWBank' }
{'EmpStatus' }
Use creditscorecard to create a creditscorecard object based on only the "screened" predictors.
sc = creditscorecard(data,'IDVar', idvar,'ResponseVar', responsevar, 'PredictorVars', varlist)
sc =
creditscorecard with properties:
GoodLabel: 0
ResponseVar: 'status'
WeightsVar: ''
VarNames: {'CustID' 'CustAge' 'TmAtAddress' 'ResStatus' 'EmpStatus' 'CustIncome' 'TmWBank' 'OtherCC' 'AMBalance' 'UtilRate' 'status'}
NumericPredictors: {'CustAge' 'CustIncome' 'TmWBank'}
CategoricalPredictors: {'EmpStatus'}
BinMissingData: 0
IDVar: 'CustID'
PredictorVars: {'CustAge' 'EmpStatus' 'CustIncome' 'TmWBank'}
Data: [1200×11 table]
Input Arguments
data
Data for the creditscorecard object, specified as a
MATLAB table, tall table, or tall timetable, where each column of
data can be any one of the following data types:
Numeric
Logical
Cell array of character vectors
Character array
Categorical
String
Data Types: table
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN, where Name is
the argument name and Value is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose
Name in quotes.
Example: metric_table =
screenpredictors(data,'IDVar','CustAge','ResponseVar','status','PredictorVars',{'CustID','CustIncome'})
IDVar
Name of identifier variable, specified as the comma-separated pair
consisting of 'IDVar' and a case-sensitive
character vector. The 'IDVar' data can be ordinal
numbers or Social Security numbers. Specifying
'IDVar' makes it easy to exclude the identifier variable
from the predictor variables.
Data Types: char
ResponseVar
Response variable name, specified as the comma-separated pair
consisting of 'ResponseVar' and a case-sensitive
character vector. The response variable data must be binary, indicating
"Good" or "Bad".
If not specified, 'ResponseVar' is set to the last
column of the input data by default.
Data Types: char
PredictorVars
Names of predictor variables, specified as the comma-separated
pair consisting of 'PredictorVars' and a
case-sensitive cell array of character vectors or string array. By
default, as when you create a creditscorecard
object, all variables are predictors except for
IDVar and ResponseVar.
Any name you specify using 'PredictorVars' must
differ from the IDVar and
ResponseVar names.
Data Types: cell | string
WeightsVar
Name of weights variable, specified as the comma-separated pair
consisting of 'WeightsVar' and a case-sensitive
character vector that names the column of the
data table that contains the row weights.
If you do not specify 'WeightsVar', then the
function uses unit weights as the observation weights.
Data Types: char
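A minimal sketch, in base MATLAB, of how observation weights can enter per-bin Good/Bad counts, assuming the weights are summed in place of raw counts. The bin indices, response values, and weights are hypothetical, and the toolbox's internal computation may differ. As in the example above (GoodLabel: 0), a status of 0 marks a Good.

```matlab
% Hypothetical data: bin index and response per observation (0 = Good, 1 = Bad)
bins   = [1; 1; 2; 2; 2; 3];
status = [0; 1; 0; 0; 1; 1];

% Default case: unit weights, so weighted counts equal plain counts
w = ones(size(bins));
goodsUnit = accumarray(bins, w .* (status == 0));   % weighted Goods per bin
badsUnit  = accumarray(bins, w .* (status == 1));   % weighted Bads per bin

% Non-unit weights (hypothetical values standing in for a 'WeightsVar' column)
w2 = [2; 1; 1; 1; 1; 0.5];
goodsW = accumarray(bins, w2 .* (status == 0));
badsW  = accumarray(bins, w2 .* (status == 1));

disp([goodsUnit badsUnit goodsW badsW])
```

Bin 3 in this sketch contains only Bads, which is the kind of "pure" bin that the 'FrequencyShift' option addresses.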
NumBins
Number of equal-frequency bins for numeric predictors, specified
as the comma-separated pair consisting of
'NumBins' and a numeric scalar.
Data Types: double
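The equal-frequency idea behind 'NumBins' can be sketched in base MATLAB by ranking the observations and splitting the ranks into numBins groups of (nearly) equal size. The data and variable names are hypothetical. In this sketch, tied values can land in different bins, which a production binning algorithm would typically avoid, so treat it as an illustration rather than the toolbox's algorithm.

```matlab
% Hypothetical numeric predictor with 12 observations
x = [23; 45; 31; 52; 38; 29; 61; 47; 35; 27; 58; 41];
numBins = 4;                      % plays the role of 'NumBins'

n = numel(x);
[~, order] = sort(x);             % indices of x from smallest to largest value
bins = zeros(n, 1);
bins(order) = ceil((1:n)' * numBins / n);   % rank r goes to bin ceil(r*numBins/n)

counts = accumarray(bins, 1);     % number of observations in each bin
disp(counts')
```

With 12 observations and 4 bins, each bin receives 3 observations, and every value in a lower bin is smaller than every value in the next bin.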
FrequencyShift
Small shift applied to frequency tables that contain zero entries,
specified as the comma-separated pair consisting of
'FrequencyShift' and a numeric scalar with a
value between 0 and 1.
If the frequency table of a predictor contains any "pure" bins
(bins containing only Goods or only Bads) after the data is binned using
autobinning, then
the function adds the 'FrequencyShift' value to
all bins in the table. To avoid any perturbation, set
'FrequencyShift' to
0.
Data Types: double
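Why the shift matters can be sketched with the information value: a pure bin puts a zero inside the logarithm's ratio, which makes the metric infinite. The frequency table and the infoValue helper below are hypothetical illustrations in base MATLAB, not the toolbox's implementation; the shift rule follows the description above (add the shift to every cell when any cell is zero).

```matlab
% Hypothetical bins-by-[Goods Bads] frequency table for one predictor
freq = [120 30;
         80  0;     % "pure" bin: all Goods, no Bads
         50 20];
shift = 0.5;        % plays the role of 'FrequencyShift' (between 0 and 1)

% Hypothetical helper: information value of a bins-by-[Goods Bads] table
infoValue = @(f) sum((f(:,1)/sum(f(:,1)) - f(:,2)/sum(f(:,2))) ...
    .* log((f(:,1)/sum(f(:,1))) ./ (f(:,2)/sum(f(:,2)))));

ivRaw = infoValue(freq);          % Inf: log(pG/pB) is Inf in the pure bin

% Shift every cell, but only when the table contains a zero entry
if any(freq(:) == 0)
    freq = freq + shift;
end
ivShifted = infoValue(freq);      % finite after the shift
```

Setting shift to 0 leaves the table unperturbed, and the information value of this table stays infinite.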
Output Arguments
metric_table
Calculated values for the predictor screening metrics, returned as a table. Each table row corresponds to a predictor from the input table data. The table columns contain calculated values for the following metrics:
'InfoValue': Information value. This metric measures the strength of a predictor in the fitting model by determining the deviation between the distributions of "Goods" and "Bads".
'AccuracyRatio': Accuracy ratio.
'AUROC': Area under the ROC curve.
'Entropy': Entropy. This metric measures the level of unpredictability in the bins. You can use the entropy metric to validate a risk model.
'Gini': Gini. This metric measures the statistical dispersion or inequality within a sample of data.
'Chi2PValue': Chi-square p-value. This metric is computed from the chi-square statistic and measures the statistical difference and independence between groups.
'PercentMissing': Percentage of missing values in the predictor, expressed in decimal form (for example, 0.05 means 5%).
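Two of these metrics can be sketched from their standard textbook definitions, starting from a bins-by-[Goods Bads] frequency table: the information value as the sum over bins of (pG - pB)·ln(pG/pB), and the chi-square p-value from Pearson's test of independence between bin and Good/Bad status. The frequency table is hypothetical, and this is not necessarily how screenpredictors computes the metrics (binning, weights, and zero handling may differ). The upper-tail chi-square probability uses gammainc with the 'upper' option, so no additional toolbox is needed.

```matlab
% Hypothetical bins-by-[Goods Bads] frequency table for one predictor
freq = [30 10;
        10 30];

% Information value: sum over bins of (pG - pB) .* log(pG ./ pB)
pG = freq(:,1) / sum(freq(:,1));    % distribution of Goods across bins
pB = freq(:,2) / sum(freq(:,2));    % distribution of Bads across bins
iv = sum((pG - pB) .* log(pG ./ pB));

% Pearson chi-square test of independence between bin and Good/Bad status
expected = sum(freq, 2) * sum(freq, 1) / sum(freq(:));   % row total * column total / N
chi2 = sum((freq(:) - expected(:)).^2 ./ expected(:));
df = (size(freq, 1) - 1) * (size(freq, 2) - 1);
pValue = gammainc(chi2/2, df/2, 'upper');   % upper tail of the chi-square distribution

fprintf('InfoValue = %.4f, Chi2PValue = %.3g\n', iv, pValue);
```

For this symmetric table the information value equals log(3), and the chi-square statistic is 20 with one degree of freedom.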
Extended Capabilities
This function supports input data that is specified as a
tall column vector, a tall table, or a tall timetable. The output for
numeric predictors might differ slightly when you use a tall array, whereas
categorical predictors return the same results for tables and tall arrays. For
more information, see tall and Tall Arrays.
Version History
Introduced in R2019a