BUG (#2)? kmeans is sensitive to rows (points) order

Question

micholeodon on 12 Mar 2019

https://it.mathworks.com/matlabcentral/answers/449618-bug-2-kmeans-is-sensitive-to-rows-points-order

Edited: micholeodon on 12 Mar 2019

Dear All,

I have noticed that kmeans gives different results when only the order of the points (rows) changes!

This does not make any sense in my opinion.

I would expect the row order of the data matrix to have no impact on the centroid locations when the random number generator is set to a fixed seed.

Can anybody explain this?

clear; close all; clc;
nPoints     = 100;
nDimensions = 2;
nClusters   = 3;
data        = rand(nPoints, nDimensions);       % points from a uniform distribution
scatter(data(:,1), data(:,2), 'b')              % original points: blue circles
rndGenSeed  = 1;
%% cluster the original (unshuffled) data
rng(rndGenSeed)                                 % set the random generator's seed
[~, clusters] = kmeans(data, nClusters);        % second output = centroid locations
hold on
scatter(clusters(:,1), clusters(:,2), 'rv')     % centroids: red triangles
hold off
%% cluster the reordered data
data_sh = sortrows(data);                       % same points, different row order
rng(rndGenSeed)                                 % same seed as above
[~, clusters_sh] = kmeans(data_sh, nClusters);
hold on
scatter(data_sh(:,1), data_sh(:,2), 'k*')       % control: reordered points should land on the blue circles
scatter(clusters_sh(:,1), clusters_sh(:,2), 'gv') % green triangles: expected to cover the red ones
hold off
grid on

1 Comment

micholeodon on 12 Mar 2019

Edited: micholeodon on 12 Mar 2019

I think I have a clue, but it would be good if somebody from the MathWorks team could verify it.

Here is my reasoning:

kmeans needs to choose some initial cluster positions. It can do this by randomly selecting k INPUT POINTS as the starting centroids.
If you set rng(seed) with a constant seed, you will always get the SAME row indices of the data matrix as the starting cluster positions.
If you shuffle the input data (the point locations are the same, only their order in the data structure changes), then even with rng(seed) and a constant seed you will get the SAME row indices, BUT the points at those indices are DIFFERENT!
This means that kmeans can converge to a different solution for the shuffled input points.
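The hypothesis can be illustrated without calling kmeans at all. The sketch below is a simplified stand-in for random-row seeding (roughly what 'Start','sample' does; the default 'plus', i.e. k-means++, also draws its seeds from the data rows). It is not kmeans's internal code: the variable names and the use of randperm to pick rows are assumptions for illustration, and reversing the rows with flipud plays the role of the shuffle.

```matlab
% Simplified stand-in for k-means seeding: pick k random ROWS as initial centroids.
% Assumption: this only mimics row-based seeding; it is not kmeans's internal code.
rng(0);
data   = rand(100, 2);              % 100 points in 2-D
k      = 3;
dataSh = flipud(data);              % same points, reversed row order

rng(1);                             % fixed seed ...
idx    = randperm(size(data, 1), k);
seeds  = data(idx, :);              % ... selects rows idx of the original order

rng(1);                             % same fixed seed again ...
idxSh   = randperm(size(dataSh, 1), k);
seedsSh = dataSh(idxSh, :);         % ... selects the same row indices of the reordered data

sameIndices = isequal(idx, idxSh)                          % identical indices
samePoints  = isequal(sortrows(seeds), sortrows(seedsSh))  % different starting points
```

With an odd k such as 3, reversing the rows can never map the chosen index set onto itself, so the same seed yields the same indices but different starting points, which is exactly the situation described above.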

This would also explain my puzzle in another question: https://www.mathworks.com/matlabcentral/answers/448832-bug-evalclusters-is-sensitive-to-rows-points-order

What do you think, MathWorks experts? :) Does kmeans select input data points as the starting centroid locations?
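Conversely, if the hypothesis is right, fixing the starting centroids as coordinates (not as row indices) should remove the order dependence. kmeans accepts a numeric k-by-p matrix for 'Start'; the sketch below does not call kmeans but runs a plain batch Lloyd iteration by hand, as an assumed simplified model of kmeans without its online phase. The starting matrix C0 is a hypothetical choice for illustration.

```matlab
% Plain batch k-means (Lloyd's algorithm) with FIXED starting centroids.
% Assumption: simplified model of kmeans with a numeric 'Start' and no online phase.
rng(0);
data = rand(100, 2);                 % 100 points in 2-D
C0   = [0.2 0.2; 0.5 0.8; 0.8 0.3];  % hypothetical fixed starting centroids (k = 3)
sets = {data, flipud(data)};         % same points, two different row orders
C    = cell(1, 2);

for s = 1:2
    X  = sets{s};
    Cs = C0;
    for iter = 1:100
        % squared Euclidean distance from every point to every centroid (n-by-k)
        D = zeros(size(X, 1), size(Cs, 1));
        for j = 1:size(Cs, 1)
            D(:, j) = sum((X - Cs(j, :)).^2, 2);
        end
        [~, label] = min(D, [], 2);  % index of the nearest centroid for each point
        Cnew = Cs;
        for j = 1:size(Cs, 1)
            if any(label == j)       % keep the old centroid if a cluster empties
                Cnew(j, :) = mean(X(label == j, :), 1);
            end
        end
        if max(abs(Cnew(:) - Cs(:))) < 1e-12
            break;                   % assignments stable: converged
        end
        Cs = Cnew;
    end
    C{s} = Cs;
end

maxDiff = max(abs(C{1}(:) - C{2}(:)))  % near zero: only summation order differs
```

Because each batch update depends only on which points belong to each cluster, not on the order in which they are stored, both orderings end at the same centroids up to floating-point rounding.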

Answers (0)
