Matching feature ranking algorithm outputs in Classification Learner

Question

Christopher McCausland on 5 Nov 2023

https://it.mathworks.com/matlabcentral/answers/2043287-matching-feature-ranking-algoritum-outputs-in-classification-leaner

Commented: Christopher McCausland on 8 Nov 2023

Hi,

In R2023b, the feature selection tab of the Classification Learner app lets you generate feature ranks with five different algorithms: MRMR, Chi2, ReliefF, ANOVA and Kruskal-Wallis.

MRMR and Chi2 can be replicated with:

[idx,scores] = fscmrmr(randSamp(:,3:end),'Stage');
[idx,scores] = fscchi2(randSamp(:,3:end),'Stage');

Here randSamp is a table with some variables ignored at the start, and 'Stage' is the label of interest.

However, I cannot figure out how to replicate the same with ANOVA and Kruskal-Wallis (KW). I have tried something like this:

[idx,scores] = anova1(table2array(randSamp(:,4:end))',categorical(randSamp.Stage(:)));
[idx,scores] = kruskalwallis(table2array(randSamp(:,4:end))',categorical(randSamp.Stage(:)));

While this does compute *something*, I have no idea what it is doing or how to get it to match what the Classification Learner app is doing. Can anyone shed some light on this?

Christopher

Answer 1

Drew on 6 Nov 2023

https://it.mathworks.com/matlabcentral/answers/2043287-matching-feature-ranking-algoritum-outputs-in-classification-leaner#answer_1347107

The short answer is that, for some feature ranking techniques, there is some normalization of the features before the ranking. This is by design, since some feature ranking techniques are particularly sensitive to normalization. To see how Classification Learner is ranking the features, use the "Generate Function" button in Classification Learner to generate code to replicate the feature selection.

For example, take these steps to see some example generated code:

(1) t=readtable("fisheriris.csv");

(2) Start Classification Learner, load the Fisher iris data, and accept the defaults at session start.

(3) Rank features with Kruskal-Wallis and choose to keep the top three features.

(4) Train the default tree model.

(5) In the Export area of the toolstrip, choose "Generate Function".

Below is a section of code from the function generated by Classification Learner. Notice the calls to "standardizeMissing" and "normalize" in the first two lines of (non-comment) code. These functions are also used in the later cross-validation part of the code. So, for each training fold (or for all of the training data for the final model), the "standardizeMissing" function and the default "zscore" method of the "normalize" function are being used before ranking the features. Note: The normalization used before feature ranking is independent of any normalization (or no normalization) used before model training.

% Feature Ranking and Selection
% Replace Inf/-Inf values with NaN to prepare data for normalization
predictors = standardizeMissing(predictors, {Inf, -Inf});
% Normalize data for feature ranking
predictorMatrix = normalize(predictors, "DataVariable", ~isCategoricalPredictor);
newPredictorMatrix = zeros(size(predictorMatrix));
for i = 1:size(predictorMatrix, 2)
    if isCategoricalPredictor(i)
        newPredictorMatrix(:,i) = grp2idx(predictorMatrix{:,i});
    else
        newPredictorMatrix(:,i) = predictorMatrix{:,i};
    end
end
predictorMatrix = newPredictorMatrix;
responseVector = grp2idx(response);
% Rank features using Kruskal Wallis algorithm
for i = 1:size(predictorMatrix, 2)
    pValues(i) = kruskalwallis(...
        predictorMatrix(:,i), ...
        responseVector, ...
        'off');
end
[~,featureIndex] = sort(-log(pValues), 'descend');
numFeaturesToKeep = 3;
includedPredictorNames = predictors.Properties.VariableNames(featureIndex(1:numFeaturesToKeep));
predictors = predictors(:,includedPredictorNames);
isCategoricalPredictor = isCategoricalPredictor(featureIndex(1:numFeaturesToKeep));
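The same per-feature pattern can be sketched outside the app for both ANOVA and Kruskal-Wallis. This is only a sketch: it assumes the app's ANOVA ranking follows the same loop as the generated Kruskal-Wallis code above (worth confirming with "Generate Function" in your own session), it uses the built-in fisheriris data rather than the table from the question, and the variable names (X, y, pAnova, pKW and so on) are illustrative.

```matlab
% Sketch only: mirrors the generated Kruskal-Wallis loop for both tests.
% Assumption: the app's ANOVA ranking uses the same per-feature pattern.
load fisheriris                      % meas (150x4 numeric), species (class labels)
X = normalize(meas);                 % z-score each column (default method)
y = grp2idx(species);                % numeric class indices

numFeatures = size(X, 2);
pAnova = zeros(1, numFeatures);
pKW    = zeros(1, numFeatures);
for i = 1:numFeatures
    % Test one feature at a time against the class labels, figures off
    pAnova(i) = anova1(X(:,i), y, 'off');
    pKW(i)    = kruskalwallis(X(:,i), y, 'off');
end

% Convert p-values to scores so that a larger score means a more
% important feature, then sort to get the ranking indices.
anovaScores = -log(pAnova);
kwScores    = -log(pKW);
[~, anovaIdx] = sort(anovaScores, 'descend');
[~, kwIdx]    = sort(kwScores, 'descend');
```

Note that anova1 and kruskalwallis return [p,tbl,stats] rather than [idx,scores], and when given a matrix they treat each column as a separate group. That is why each feature is passed in as a single column vector with the class labels as the grouping variable.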

If this answer helps you, please remember to accept the answer.

1 Comment

Christopher McCausland on 8 Nov 2023

Hi Drew,

I thought I had already accepted this answer, so apologies. It was a good idea to generate the code and then review it; thank you for adding the additional description too. It made it a lot easier to follow the design thinking.

Christopher
