Subsets of uncorrelated features
Given an N-by-N correlation matrix of N features, how can I find ALL subsets of pairwise uncorrelated features, assuming two features are uncorrelated if their correlation score is less than a threshold Alpha? There is no restriction on the number of features in a subset, but all features in a subset must be pairwise uncorrelated.
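For small N the question can be answered literally by brute force: test every subset of two or more features and keep those in which every pair falls below the threshold. The sketch below is only an illustration. It assumes the threshold is applied to the absolute correlation, uses a made-up 4-by-4 matrix, and visits all 2^N subsets, so it is practical only for a modest number of features.

```matlab
% Brute-force enumeration of every pairwise-uncorrelated feature subset.
% Assumptions (illustrative): R is a symmetric N-by-N correlation matrix and
% two features count as uncorrelated when abs(R(i,j)) < alpha.
R = [1.0 0.1 0.8 0.2;
     0.1 1.0 0.3 0.1;
     0.8 0.3 1.0 0.6;
     0.2 0.1 0.6 1.0];
alpha = 0.4;
N = size(R, 1);

ok = abs(R) < alpha;          % ok(i,j) is true when features i and j are uncorrelated
ok(logical(eye(N))) = true;   % a feature never disqualifies itself

subsets = {};
for mask = 1:(2^N - 1)
    members = find(bitget(mask, 1:N));   % features selected by this bit mask
    if numel(members) < 2
        continue                          % skip single-feature sets
    end
    if all(all(ok(members, members)))     % every pair inside the subset is uncorrelated
        subsets{end+1} = members;         %#ok<AGROW>
    end
end
```

Each cell holds feature indices; for this matrix the result is {[1 2], [2 3], [1 4], [2 4], [1 2 4]}.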
Accepted Answer
Let R be the pairwise correlation matrix (here a random symmetric matrix, for illustration):
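N = 10;
R = rand(N);                 % random values standing in for correlations (illustration only)
R(logical(eye(N))) = 1;      % unit diagonal
for i = 1:size(R, 1) - 1
    for j = i+1:size(R, 1)
        R(j, i) = R(i, j);   % mirror the upper triangle so R is symmetric
    end
end
disp(R)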
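cutoff = 0.4; % pairs whose |correlation| is below this are treated as uncorrelated
idx = abs(R) < cutoff; % abs() so that strong negative correlations are not counted as uncorrelated
idx = triu(idx); % R(i, j) == R(j, i) in a pairwise correlation matrix, so keep each pair once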
features = "feature" + (1:N); % feature names
% there may be a simpler way to do this
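indepFeatures = [];
for i = 1:N
    % pair feature i with every later feature j that is uncorrelated with it
    indepFeatures = [indepFeatures, arrayfun(@(x)[x, features(i)], features(idx(i, :)), 'uni', false)];
end
indepFeatures = vertcat(indepFeatures{:}); % K-by-2 string array, one uncorrelated pair per row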
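% find all maximal cliques of the graph whose edges are the uncorrelated pairs
nodes = zeros(size(indepFeatures, 1), 2);
[~, nodes(:, 1)] = ismember(indepFeatures(:, 1), features);
[~, nodes(:, 2)] = ismember(indepFeatures(:, 2), features);
G = graph(nodes(:, 1), nodes(:, 2));
% maximalCliques is not a built-in MATLAB function; it is assumed to be the
% Bron-Kerbosch implementation from the File Exchange
M = maximalCliques(adjacency(G)); % one column per maximal clique, nonzero rows mark its members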
indepSets = cell(size(M, 2), 1);
for i = 1:numel(indepSets)
    indepSets{i} = features(M(:, i) ~= 0); % names of the features in clique i
end
indepSets(cellfun(@numel, indepSets) < 2) = []; % drop single-feature sets; this can be further unified with indepFeatures
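As the comment in the answer suggests, the pair list and the graph object are not strictly necessary: the thresholded matrix can serve directly as the adjacency matrix. The sketch below assumes the absolute-correlation threshold and, to avoid depending on maximalCliques (which is not part of base MATLAB), swaps in a plain iterative Bron-Kerbosch search without pivoting. The matrix and variable names are illustrative.

```matlab
% Maximal sets of pairwise-uncorrelated features via an iterative
% Bron-Kerbosch search (no pivoting). Illustrative sketch only.
R = [1.0 0.1 0.8 0.2;
     0.1 1.0 0.3 0.1;
     0.8 0.3 1.0 0.6;
     0.2 0.1 0.6 1.0];
cutoff = 0.4;
N = size(R, 1);

A = abs(R) < cutoff;          % edge between two features when they are uncorrelated
A(logical(eye(N))) = false;   % no self-loops

% Each stack entry is {Rs, Ps, Xs} as logical row vectors:
% Rs = current clique, Ps = candidates, Xs = vertices already processed.
stack = {{false(1, N), true(1, N), false(1, N)}};
cliques = {};
while ~isempty(stack)
    top = stack{end};
    stack(end) = [];
    Rs = top{1}; Ps = top{2}; Xs = top{3};
    if ~any(Ps) && ~any(Xs)
        % nothing can extend Rs, so it is a maximal clique
        cliques{end+1} = find(Rs); %#ok<AGROW>
        continue
    end
    for v = find(Ps)
        nbr = A(v, :);
        newR = Rs;
        newR(v) = true;
        stack{end+1} = {newR, Ps & nbr, Xs & nbr}; %#ok<AGROW>
        Ps(v) = false;   % v is done at this level: move it from P to X
        Xs(v) = true;
    end
end
```

For this matrix the search returns the two maximal sets [2 3] and [1 2 4]. Index vectors can be mapped back to names with the answer's features array, e.g. features(cliques{1}). Without pivoting the search is exponential in the worst case, so it suits a moderate number of features.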
12 Comments
Kais
on 11 Jul 2021
Image Analyst
on 11 Jul 2021
@Kais, why do you need this? What is the use case? What will you do with this information afterwards? Have you considered principal component analysis?
Kais
on 11 Jul 2021
Ive J
on 11 Jul 2021
@Kais So, have you looked at feature selection? There are quite a few approaches for this, especially in clinical/medical applications (I assume you'll use the features in a regression analysis afterwards). For instance, you can use simple F-test approaches such as fsrftest, or penalized regression (e.g. lasso or ridge).
Kais
on 12 Jul 2021
Ive J
on 12 Jul 2021
@Kais So, you can try my modified answer; it should do the job (but note that the problem can become computationally expensive with a large number of features). But I'm sure you are aware of the drawbacks of this approach. The simplest scenario is a regression analysis in which you don't know which of the [correlated] features better explains the response variable, so you may incorrectly exclude the better one. Say A and B are highly correlated and A is the better predictor of the response, but you select B (simply because you don't check how much of the response variance A or B explains).
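A small illustration of that pitfall, using made-up deterministic data: A drives the response, B is a perturbed copy of A, and the two are strongly correlated. Checking each feature's correlation with the response (here with base-MATLAB corrcoef) is one simple way to see which of the two is worth keeping. This is a sketch with hypothetical variables, not a full feature-selection procedure.

```matlab
% Hypothetical data: A drives the response y, B is A plus a deterministic perturbation.
t = (1:20)';
A = t;
B = t + 3 * sin(t);          % strongly correlated with A, but noisier
y = 2 * A + 1;               % response depends on A only

rAB = corrcoef(A, B);        % 2-by-2 matrix; the off-diagonal entry is corr(A, B)
rAy = corrcoef(A, y);
rBy = corrcoef(B, y);
corrAB = rAB(1, 2);
corrAy = abs(rAy(1, 2));
corrBy = abs(rBy(1, 2));

% Keep whichever of the correlated pair is more strongly related to the response.
if corrAy >= corrBy
    keep = 'A';
else
    keep = 'B';
end
```

With the cutoff of 0.4 used in the answer, A and B would be treated as correlated, and the clique search alone gives no reason to prefer A over B.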
Kais
on 12 Jul 2021
Kais
on 12 Jul 2021
As I commented on the last line of the code, indepSets and indepFeatures (length = 22 with your data) should be merged, and in your example there are 7 (not 5) triples. So, if you keep only sets with more than 2 features, you can then merge them with indepFeatures, which has already been generated:
indepSets(cellfun(@numel, indepSets) < 3) = []; % my original example is < 2
So, as I said, there are 7 triples:
"feature1" "feature6" "feature9"
"feature2" "feature6" "feature9"
"feature4" "feature6" "feature9"
"feature5" "feature6" "feature9"
"feature6" "feature7" "feature9"
"feature6" "feature8" "feature9"
"feature6" "feature9" "feature10"
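If "all subsets" is meant literally, note that every subset of two or more features taken from a maximal clique is itself pairwise uncorrelated, so the full list can be generated from the maximal cliques alone. When no maximal clique has more than three features, as with the 7 triples above, this is the same as merging the triples with the full list of uncorrelated pairs. A minimal sketch, assuming each clique is stored as a vector of feature indices (the input below is hypothetical, not the data from this thread):

```matlab
% Expand maximal cliques into every pairwise-uncorrelated subset of size >= 2.
% Hypothetical input: two maximal cliques that share the pair [1 2].
maxCliques = {[1 2 4], [1 2 3]};

allSets = {};
for c = 1:numel(maxCliques)
    members = sort(maxCliques{c});
    for k = 2:numel(members)
        combos = nchoosek(members, k);              % every k-element subset, one per row
        allSets = [allSets; num2cell(combos, 2)];   %#ok<AGROW>
    end
end

% Subsets shared by several maximal cliques appear more than once;
% deduplicate by converting each index vector to a text key.
keys = cellfun(@(s) sprintf('%d,', s), allSets, 'UniformOutput', false);
[~, keepIdx] = unique(keys);   % one index per distinct subset
allSets = allSets(sort(keepIdx));
```

Here the shared pair [1 2] is generated twice and kept once, leaving 7 distinct subsets.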
Kais
on 15 Jul 2021
Image Analyst
on 11 Jul 2021
0 votes
Would stepwise regression be of any help?
Otherwise, just make an N-by-N table of correlation coefficients by correlating every feature with every other feature.
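In base MATLAB that table is one call: corrcoef treats each column of a data matrix as a feature. A sketch with a small hypothetical data matrix X (rows are observations, columns are features) and an illustrative threshold:

```matlab
% Hypothetical data: 6 observations (rows) of 3 features (columns).
X = [1 2 3;
     2 1 5;
     3 4 1;
     4 3 6;
     5 6 2;
     6 5 4];
R = corrcoef(X);                 % 3-by-3 matrix of pairwise correlation coefficients
uncorrelated = abs(R) < 0.4;     % threshold chosen only for illustration
uncorrelated(logical(eye(size(R)))) = false;   % ignore the diagonal
```

For this X, R(1,3) is about 0.03, so only features 1 and 3 are flagged as uncorrelated. R(2,3) is about -0.49; comparing abs(R) rather than R keeps that negative correlation from being counted as "uncorrelated".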