How to reduce the number of unique values in a matrix?
I would like to reduce the number of unique values in my matrix to a fixed number. If I just round my values, I still get too many unique values. For instance, I would like to be able to group the matrix values into maybe 10 groups (= 10 unique values). I would like the value of each group to relate to the original values, for instance as the mean of all the values in the group. My original idea was to do something like k-means clustering, but I don't think this can be done with data in a matrix.
Is there a way to do this?
Accepted Answer
Stephen23
on 27 Apr 2017
Edited: Stephen23
on 27 Apr 2017
Although your data is arranged in a matrix, the matrix is a red herring: what you actually want is a simple 1D clustering of the values themselves, irrespective of their position in the matrix. This is simple, as k-means clustering can be done on any number of dimensions, including 1D data. So convert your matrix to a vector, apply kmeans, and then use the returned indices to replace each value with the centroid (mean) of its cluster. Then simply reshape to get back the matrix shape.
Here is a complete working example, with just two clusters for clarity:
>> inp = [1,9,8,8;9,8,8,1;1,8,1,9;7,8,2,1]
inp =
1 9 8 8
9 8 8 1
1 8 1 9
7 8 2 1
>> [idx,vec] = kmeans(inp(:),2);
>> out = reshape(vec(idx),size(inp))
out =
1.1667 8.2000 8.2000 8.2000
8.2000 8.2000 8.2000 1.1667
1.1667 8.2000 1.1667 8.2000
8.2000 8.2000 1.1667 1.1667
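The same steps might be applied with the 10 groups asked for in the question. The following is only a sketch: the random matrix is made-up stand-in data, the seed and the 'Replicates' setting are illustrative choices, and kmeans requires the Statistics and Machine Learning Toolbox.

```matlab
% Sketch: quantize a matrix to at most k unique values via 1D k-means.
rng(0);                            % fix the seed so the random starts are repeatable
inp = rand(50, 40);                % hypothetical data standing in for the real matrix
k   = 10;                          % desired number of unique values
% Run k-means on the values as one long column; vec(j) is the mean of cluster j.
[idx, vec] = kmeans(inp(:), k, 'Replicates', 5);
out = reshape(vec(idx), size(inp)); % replace every element by its cluster mean
nUnique = numel(unique(out));       % at most k
```

Because k-means starts from random centroids, different runs can give slightly different groups; 'Replicates' repeats the clustering and keeps the best result.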
More Answers (1)
Adam
on 27 Apr 2017
Edited: Adam
on 27 Apr 2017
vals = ceil( 10 * vals / max( vals(:) ) );
3 Comments
Adam
on 27 Apr 2017
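Well, once you have your 10 unique labels, you can use them as indices into the original values and replace the labels with the average of those values, e.g.:
labels = ceil( 10 * vals / max( vals(:) ) ); % group labels 1..10 (assumes vals > 0)
newVals = vals;
for n = 1:10
    mask = ( labels == n ); % labels kept separate so an inserted mean is never mistaken for a label
    newVals( mask ) = mean( vals( mask ) );
end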
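The loop above could also be written without a loop using accumarray. This is only a sketch under the same assumption as the labelling formula (all values positive); magic(6) is made-up example data and the names labels and groupMeans are illustrative.

```matlab
% Sketch: per-label means with accumarray instead of a loop.
vals   = magic(6);                          % hypothetical positive data
labels = ceil(10 * vals / max(vals(:)));    % labels 1..10 (requires vals > 0)
% groupMeans(n) is the mean of all values with label n (0 for unused labels).
groupMeans = accumarray(labels(:), vals(:), [10 1], @mean);
newVals = reshape(groupMeans(labels), size(vals)); % look up each element's group mean
```

Keeping the labels in their own variable means the lookup never confuses a computed mean with a label.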
Stephen23
on 27 Apr 2017
Edited: Stephen23
on 27 Apr 2017
I also considered rounding, as in Adam's answer, but it has the disadvantage that the bins are then linearly spaced, which might not represent the actual clusters well. Consider clusters centered around 0, 3, and 10: rounding would split the 3 cluster between 0 and 5, which might not be the desired effect.
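A small sketch of that situation, with made-up groups near 0, 3 and 10 (the data, the step of 5 and the 'Replicates' setting are all illustrative assumptions; kmeans requires the Statistics and Machine Learning Toolbox):

```matlab
% Sketch: fixed-step rounding versus 1D k-means on three made-up groups.
rng(0);                                   % repeatable k-means starts
low  = linspace(0, 0.5, 20);              % hypothetical group near 0
mid  = linspace(2, 4, 20);                % hypothetical group near 3
high = linspace(9.5, 10.5, 20);           % hypothetical group near 10
vals = [low, mid, high];

rounded = 5 * round(vals / 5);            % snap to the linear grid 0, 5, 10
[idx, c] = kmeans(vals(:), 3, 'Replicates', 10);
clustered = c(idx).';                     % each value replaced by its cluster mean

midRounded   = unique(rounded(21:40));    % levels given to the middle group by rounding
midClustered = unique(clustered(21:40));  % levels given to the middle group by k-means
```

Here rounding sends the middle group to both 0 and 5, whereas k-means would be expected to keep it together as a single value close to 3.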