Feature selection for classification using neighborhood component analysis (NCA)
A FeatureSelectionNCAClassification object contains the data, fitting
information, feature weights, and other parameters of a neighborhood component analysis
(NCA) model.
fscnca learns the feature weights using a
diagonal adaptation of NCA and returns an instance of a
FeatureSelectionNCAClassification object. The function achieves
feature selection by regularizing the feature weights.
Create a FeatureSelectionNCAClassification object using fscnca.
FitMethod — Name of fitting method
Name of the fitting method used to fit this model, stored as one of the following:
'exact'— Perform fitting using all of the data.
'none'— No fitting. Use this option to evaluate the generalization error of the NCA model using the initial feature weights supplied in the call to
fscnca.
'average'— Divide the data into partitions (subsets), fit each partition using the
'exact' method, and return the average of the feature weights. You can specify the number of partitions using the
NumPartitions name-value pair argument.
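The fitting methods listed above can be tried on a small problem. The following is a minimal sketch, not an official example: the data is synthetic, the variable names are hypothetical, and it assumes that fscnca accepts 'FitMethod', 'NumPartitions', and 'InitialFeatureWeights' as name-value pair arguments.

```matlab
% Synthetic data (hypothetical): 200 observations, 6 predictors.
% Only the first two predictors determine the class label.
rng(0);                           % for reproducibility
X = randn(200,6);
Y = double(X(:,1) + X(:,2) > 0);  % numeric class labels 0 and 1

% Fit each of 4 partitions with the exact method.
mdlAvg = fscnca(X,Y,'FitMethod','average','NumPartitions',4);

% Skip fitting and keep the supplied initial feature weights.
w0 = ones(6,1);
mdlNone = fscnca(X,Y,'FitMethod','none','InitialFeatureWeights',w0);

% The FitMethod property records which method was used.
disp(mdlAvg.FitMethod)
disp(mdlNone.FitMethod)
```

A model fitted with 'none' is mainly useful as a baseline, for example to compare its loss with the loss of a fitted model.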
InitialLearningRate — Initial learning rate
positive real scalar
Initial learning rate for the 'sgd' and
'minibatch-lbfgs' solvers, stored as a positive real
scalar. The learning rate decays over iterations starting at the value specified for
InitialLearningRate. Use the NumTuningIterations and
TuningSubsetSize name-value pair arguments to
control the automatic tuning of the initial learning rate in the call to
fscnca.
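As an illustration of this property, the sketch below sets the initial learning rate explicitly instead of relying on automatic tuning. It uses synthetic data and assumes that 'sgd' is an accepted value of the 'Solver' name-value pair argument and that the value passed as 'InitialLearningRate' is the one stored on the object; treat both as assumptions to check against your release.

```matlab
% Synthetic data (hypothetical): only the first two predictors matter.
rng(0);                           % for reproducibility
X = randn(200,6);
Y = double(X(:,1) + X(:,2) > 0);

% Use a stochastic solver with a fixed initial learning rate.
mdl = fscnca(X,Y,'Solver','sgd','InitialLearningRate',0.05);

% Inspect the learning rate stored on the fitted object.
disp(mdl.InitialLearningRate)
```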
FeatureWeights — Feature weights
numeric vector | numeric matrix
Feature weights, specified as a p-by-1 numeric vector
or a p-by-m numeric matrix, where
p is the number of predictor variables after dummy
variables are created for categorical variables (for more details, see
ExpandedPredictorNames). If FitMethod is 'average', then
FeatureWeights is a p-by-m matrix, where
m is the number of partitions specified via the
'NumPartitions' name-value pair argument in the call
to fscnca.
The absolute value of
FeatureWeights(k) is a measure of
the importance of predictor k. A
FeatureWeights(k) value that is close to 0 indicates
that predictor k does not influence the response in Y.
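Because weights near zero mark uninformative predictors, a simple way to turn the weights into a feature subset is to keep the predictors whose weight exceeds a small fraction of the largest weight. The following sketch uses synthetic data; the relative threshold tol is a hypothetical choice, not a value prescribed by fscnca.

```matlab
% Synthetic data (hypothetical): only the first two predictors matter.
rng(0);                           % for reproducibility
X = randn(200,6);
Y = double(X(:,1) + X(:,2) > 0);

mdl = fscnca(X,Y);                % default fitting method
w = mdl.FeatureWeights;           % p-by-1 vector in this case

% Keep predictors whose weight is a non-negligible fraction of the largest one.
tol = 0.02;                       % hypothetical relative threshold
selected = find(abs(w) > tol*max(abs(w)));
disp(selected')
```

A relative threshold is used here so that the rule does not depend on the overall scale of the weights, which changes with the regularization strength.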
CategoricalPredictors — Categorical predictor indices
vector of positive integers | []
Categorical predictor indices, specified as a vector of positive integers.
CategoricalPredictors contains index values indicating that the
corresponding predictors are categorical. The index values are between 1 and
p, where p is the number of predictors used to
train the model. If none of the predictors are categorical, then this property is empty
([]).
ResponseName — Response variable name
Response variable name, specified as a character vector.
PredictorNames — Predictor variable names
cell array of unique character vectors
Predictor variable names in order of their appearance in the predictor data, specified as a
cell array of character vectors. The length of
PredictorNames is
equal to the number of variables in the training data
X used as
predictor variables.
ExpandedPredictorNames — Expanded predictor names
cell array of unique character vectors
Expanded predictor names, specified as a cell array of unique character vectors.
If the model uses encoding for categorical variables, then
ExpandedPredictorNames includes the names that describe the
expanded variables. Otherwise,
ExpandedPredictorNames is the same as
PredictorNames.
| loss | Evaluate accuracy of learned feature weights on test data |
| predict | Predict responses using neighborhood component analysis (NCA) classifier |
| refit | Refit neighborhood component analysis (NCA) model for classification |
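The three object functions are typically used together: fit on training data, evaluate on held-out data, then refit with different settings. The following is a minimal sketch on synthetic data with a hypothetical train/test split; it assumes that refit accepts 'Lambda' as a name-value pair argument.

```matlab
% Synthetic data (hypothetical), split into training and test sets.
rng(0);                           % for reproducibility
X = randn(300,6);
Y = double(X(:,1) + X(:,2) > 0);
Xtrain = X(1:200,:);    Ytrain = Y(1:200);
Xtest  = X(201:end,:);  Ytest  = Y(201:end);

mdl = fscnca(Xtrain,Ytrain);

labels = predict(mdl,Xtest);      % predicted class labels for the test rows
err    = loss(mdl,Xtest,Ytest);   % loss on the test data (smaller is better)

% Refit the same model with a different regularization parameter.
mdl2 = refit(mdl,'Lambda',0.01);
```

Comparing loss values across several refits is one way to choose the regularization parameter.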
Load the sample data.
load ionosphere
The data set has 34 continuous predictors. The response variable is the radar returns, labeled as b (bad) or g (good).
Fit a neighborhood component analysis (NCA) model for classification to detect the relevant features.
mdl = fscnca(X,Y);
The returned NCA model,
mdl, is a
FeatureSelectionNCAClassification object. This object stores information about the training data, model, and optimization. You can access the object properties, such as the feature weights, using dot notation.
Plot the feature weights.
figure()
plot(mdl.FeatureWeights,'ro')
xlabel('Feature Index')
ylabel('Feature Weight')
grid on
The weights of the irrelevant features are zero. The
'Verbose',1 option in the call to
fscnca displays the optimization information on the command line. You can also visualize the optimization process by plotting the objective function versus the iteration number.
figure
plot(mdl.FitInfo.Iteration,mdl.FitInfo.Objective,'ro-')
grid on
xlabel('Iteration Number')
ylabel('Objective')
The ModelParameters property is a
struct that contains more information about the model. You can access the fields of this property using dot notation. For example, check whether the data was standardized.
mdl.ModelParameters.Standardize
ans = logical
   0
0 means that the data was not standardized before fitting the NCA model. You can standardize the predictors when they are on very different scales using the
'Standardize',1 name-value pair argument in the call to
fscnca.
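To illustrate standardization, the sketch below builds two synthetic predictors on very different scales and requests standardization when fitting. The data and variable names are hypothetical, and the field name ModelParameters.Standardize is an assumption based on the name-value pair argument.

```matlab
% Two synthetic predictors on very different scales (hypothetical data).
rng(0);                           % for reproducibility
X = [randn(200,1), 1000*randn(200,1)];
Y = double(X(:,1) > 0);           % only the first predictor is informative

% Standardize the predictors before fitting the NCA model.
mdlStd = fscnca(X,Y,'Standardize',true);

% Confirm that the setting was recorded with the model parameters.
disp(mdlStd.ModelParameters.Standardize)
```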
Copy Semantics: Value. To learn how value classes affect copy operations, see Copying Objects in the MATLAB documentation.
Introduced in R2016b