Subsets of uncorrelated features

Question

Kais il 10 Lug 2021

0
Link

Link diretto a questa domanda

https://it.mathworks.com/matlabcentral/answers/875908-subsets-of-uncorrelated-features

Commentato: Kais il 15 Lug 2021

Given a N by N correlation matrix of N features, how to find ALL subsets of pariwise uncorrelated features if we assume two features are uncorrelated if their correlation score is less than a threshold Alpha. There is no restriction on the number of features making the subsets. All features making a subset need to be pairwise uncorrelated.

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Accedi per commentare.

Accedi per rispondere a questa domanda.

Answer 1

Jeff Miller il 12 Lug 2021

1
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/875908-subsets-of-uncorrelated-features#answer_744638

Modificato: Jeff Miller il 12 Lug 2021

Apri in MATLAB Online

N = 5;
R = rand(N);  % We will ignore the lower triangular part of this array
rCutoff = 0.4;
% Make a cell array that holds all possible combinations of 2, 3, 4, ... features
combos = cell(0,0);
for i=2:N
    iCombos = nchoosek(1:N,i);
    for j=1:size(iCombos,1)
        combos{end+1} = iCombos(j,:);
    end
end
ncells = numel(combos);
% Check each cell to make sure that all of the pairwise correlations are
% less than the cutoff
qualifies = true(1,ncells);
for icell=1:ncells
    features = combos{icell};
    nfeatures = numel(features);
    for ifeature=1:nfeatures-1
        for jfeature=ifeature+1:nfeatures
            iifeature = features(ifeature);
            jjfeature = features(jfeature);
            if abs(R(iifeature,jjfeature)) > rCutoff
                qualifies(icell) = false;
            end
        end
    end
end

5 Commenti
Mostra 3 commenti meno recentiNascondi 3 commenti meno recenti

Jeff Miller il 13 Lug 2021

You may well be right, that but "if sum" line is cognitively impenetrable to me. :)

Thanks for accepting my answer.

Kais il 14 Lug 2021

Apri in MATLAB Online

I am looking for a pairwise uncorrleation between ALL features iofn the "feature" variable". The line:

sum(nonzeros(triu(abs(R(features,features)),1)) > rCutoff)

will result into a matrix of logical values showing which feature pairs are correlated and which are not (triu is only there to reduce the symmetric correlation matrix). If any of the values of the matrix is true (equivalently, sum of values of the matrix is different from zero), The subset unqualifies as an uncorrelated subset.

Accedi per commentare.

Answer 2

Ive J il 11 Lug 2021

1
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/875908-subsets-of-uncorrelated-features#answer_744398

Modificato: Ive J il 12 Lug 2021

Apri in MATLAB Online

Let R be the pairwise correlation matrix:

N = 10;
R = rand(N); 
R(logical(eye(N))) = 1; 
for i = 1:size(R, 1) - 1
    for j = i+1:size(R, 1)
        R(j, i) = R(i, j);
    end
end
disp(R)
cutoff = 0.4; % independent features
idx = R < cutoff;
idx = triu(idx); % R(i, j) == R(j, i) in pairwise correlation matrix
features = "feature" + (1:N); % feature names
% there may be a simpler way to do this
indepFeatures = [];
for i = 1:N
    indepFeatures = [indepFeatures, arrayfun(@(x)[x, features(i)], features(idx(i, :)), 'uni', false)];
end
indepFeatures = vertcat(indepFeatures{:});
% find all cliques of this set
nodes = zeros(size(indepFeatures, 1), 1);
[~, nodes(:, 1)] = ismember(indepFeatures(:, 1), features);
[~, nodes(:, 2)] = ismember(indepFeatures(:, 2), features);
G = graph(nodes(:, 1), nodes(:, 2));
M = maximalCliques(adjacency(G)); 
indepSets = cell(size(M, 2), 1);
for i = 1:numel(indepSets)
    indepSets{i} = features(M(:, i) ~= 0);
end
indepSets(cellfun(@numel, indepSets) < 2) = []; % this can be further unified with indepFeatures

You can find maximalCliques in FEX.

12 Commenti
Mostra 10 commenti meno recentiNascondi 10 commenti meno recenti

Kais il 12 Lug 2021

Apri in MATLAB Online

@Ive J I tried your solution on a 10 by 10 validation correlation matrix. The subsets are correct but incomplete. There should be 22 subsets (17 made of pairs and 5 made of triples) of uncorrelated features in the matrix below. Your code leads to 11 subsets (6 paird and 5 triples).

cutoff = 0.4;
R=[
    1.0000    0.8735   -0.4498    0.7882    0.8260   -0.3310    0.7521    0.8991   -0.3789    0.7062
    0.8735    1.0000   -0.4168    0.9182    0.6789   -0.3210    0.7074    0.8244   -0.5138    0.7961
   -0.4498   -0.4168    1.0000   -0.3241   -0.4364    0.7018   -0.3825   -0.3257    0.4335   -0.3814
    0.7882    0.9182   -0.3241    1.0000    0.4857   -0.2513    0.6033    0.8936   -0.5378    0.5715
    0.8260    0.6789   -0.4364    0.4857    1.0000   -0.2878    0.6495    0.5794   -0.2881    0.8435
   -0.3310   -0.3210    0.7018   -0.2513   -0.2878    1.0000   -0.3040   -0.2360    0.2695   -0.2606
    0.7521    0.7074   -0.3825    0.6033    0.6495   -0.3040    1.0000    0.6469   -0.3280    0.6357
    0.8991    0.8244   -0.3257    0.8936    0.5794   -0.2360    0.6469    1.0000   -0.3919    0.5148
   -0.3789   -0.5138    0.4335   -0.5378   -0.2881    0.2695   -0.3280   -0.3919    1.0000   -0.3725
    0.7062    0.7961   -0.3814    0.5715    0.8435   -0.2606    0.6357    0.5148   -0.3725    1.0000]

Kais il 14 Lug 2021

Modificato: Kais il 15 Lug 2021

@Ive J I tried your code with cutoff = .7. I get 32 pairs, 2 triples, 2 quadruples, and 5 quintuples. While the number of pairs and quintuples seem to be correct, the numbers of triples and quadruples are not. For example, the triple [6, 7, 9] is missing.

There should be a total of 112 uncorrelated features. You code finds 44 only. Any clue why is that?

Kais il 15 Lug 2021

Never mind the last question. I figured the algorithm finds max cliques so it won't count subgraphs within larger subgraphs ([6, 7,9] won't show up because it's part of the larger subset [4, 6,7,9]), which significantly reduces the number of subsets.

Accedi per commentare.