kmeans appears to miss obvious clusters

Question

0 votes

data.mat

I am struggling to get kmeans to identify what appear to be fairly distinct clusters in my data. I've walked through the documentation and examples but can't improve on the results in the images below (raw data plotted first, followed by the kmeans result; the data is also attached). I've tried the different distance and start options without much success. Even supplying seed values doesn't improve the clustering. Does anyone have any other suggestions I could try? My goal is to end up with each data point falling into one of 3 clusters. My last command was:

[cidx3,cmeans2] = kmeans(X,3,'dist','cosine','display','iter','Start',seeds);

where

seeds = 
[0.018660  872  17.59;
0.002100  1140  18.88;
0.004652  1187  34.82]

Thank you

1 Comment

Adam on 7 Jul 2017

It would help if you plotted the seeds visibly on the graph. It's not very easy to see where a point is in 3d just from coordinates.
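One way to follow this suggestion is to overlay the seeds as large markers on the 3-D scatter plot. This is only a sketch: it assumes `X` is the N-by-3 data matrix from the question, and it falls back to random placeholder points (not the real data, with an assumed range for the third column) when `X` is not in the workspace.

```matlab
% Seed values copied from the question.
seeds = [0.018660  872  17.59;
         0.002100 1140  18.88;
         0.004652 1187  34.82];

% Assumption: X is the N-by-3 data matrix from the question. If it is not
% in the workspace, use random placeholder points (NOT the real data).
if ~exist('X', 'var')
    X = [0.05*rand(200,1), 1500*rand(200,1), 40*rand(200,1)];
end

figure;
% Raw data as small gray dots.
scatter3(X(:,1), X(:,2), X(:,3), 10, [0.6 0.6 0.6], 'filled');
hold on;
% Seeds as large red pentagrams so they stand out from the data.
plot3(seeds(:,1), seeds(:,2), seeds(:,3), 'rp', ...
      'MarkerSize', 14, 'MarkerFaceColor', 'r');
hold off;
xlabel('Column 1'); ylabel('Column 2'); zlabel('Column 3');
legend('Data', 'Seeds');
grid on;
```

Rotating the figure interactively then shows whether each seed actually sits inside the cluster it was meant to start.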


Answer 1

Ilya on 7 Jul 2017

1 vote

Do this (assuming there are no NaNs in X):

[cidx3,cmeans2] = kmeans(zscore(X),3,'dist','cosine','display','iter');

Did it get better? If yes, look at your data again and think about what went wrong in your previous attempts. Look at the scales. Plot it using real scales 1:1. Think about how the cosine distance works when the data are shifted far away from zero.
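The effect of the offset on the cosine distance can be checked numerically. The sketch below uses the three seed rows from the question purely as sample points, and it standardizes them using only those three rows, so the numbers are illustrative rather than what `zscore(X)` would give on the full data. It uses base MATLAB only; the `cosdist` helper is a hypothetical name introduced here.

```matlab
% Seed rows from the question, used here only as three sample points.
P = [0.018660  872  17.59;
     0.002100 1140  18.88;
     0.004652 1187  34.82];

% Cosine distance between row vectors a and b: 1 - (a.b)/(|a||b|).
% (Hypothetical helper for illustration.)
cosdist = @(a, b) 1 - (a*b.') / (norm(a)*norm(b));

% Raw scales: the second column dominates, so every point lies in almost
% the same direction from the origin and all pairwise distances are near 0.
dRaw = [cosdist(P(1,:), P(2,:)), ...
        cosdist(P(1,:), P(3,:)), ...
        cosdist(P(2,:), P(3,:))];

% Standardize each column (zero mean, unit standard deviation), which is
% what zscore does. Implicit expansion requires R2016b or later.
Z = (P - mean(P)) ./ std(P);

% After centering, the points point in clearly different directions.
dStd = [cosdist(Z(1,:), Z(2,:)), ...
        cosdist(Z(1,:), Z(3,:)), ...
        cosdist(Z(2,:), Z(3,:))];

disp('Cosine distances, raw data:');          disp(dRaw);
disp('Cosine distances, standardized data:'); disp(dStd);
```

With the raw values, the cosine distance can barely tell the points apart, which leaves kmeans with almost nothing to separate clusters on.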

1 Comment

NCramer on 10 Jul 2017

Using zscore to normalize the data did help significantly. Thank you (and Image Analyst) very much for that suggestion. There is still a fair amount of bleeding of the main cluster into the smaller ones but I will play around with other ways to normalize the data and see if that helps.


Answer 2

Image Analyst on 7 Jul 2017

1 vote

You might want to normalize your data.

I don't think it's a good idea to look for clusters when one parameter goes from 0 to 1500 and another goes from 0 to 0.05!

With these ranges, your data basically lies in a thin, flat sheet rather than being spread widely through 3-D space.
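A quick way to see the imbalance is to compare per-column ranges and variances before clustering. The sketch below runs on synthetic data that only mimics the ranges quoted above (0 to 0.05 and 0 to 1500, plus an assumed 0 to 40 for the third column); it is not the poster's data. Min-max rescaling is shown as one possible normalization, not necessarily the best one for this data set.

```matlab
rng(0);   % reproducible placeholder data
n = 500;

% Synthetic stand-in, NOT the poster's data: columns span roughly
% 0 to 0.05, 0 to 1500, and (assumed) 0 to 40.
X = [0.05*rand(n,1), 1500*rand(n,1), 40*rand(n,1)];

% Per-column range: differences of several orders of magnitude are a
% warning sign for any distance-based method.
colRange = max(X) - min(X);
disp('Column ranges:');
disp(colRange);

% Share of the total variance carried by each column. With squared
% Euclidean distance, the column with the largest variance dominates
% the within-cluster sum of squares that kmeans minimizes.
varShare = var(X) / sum(var(X));
disp('Share of total variance per column:');
disp(varShare);

% Min-max rescaling maps every column to [0, 1] so that all three
% contribute on a comparable footing (implicit expansion, R2016b+).
Xs = (X - min(X)) ./ colRange;
disp('Column ranges after rescaling:');
disp(max(Xs) - min(Xs));
```

If the second column carries nearly all of the variance, as it does here, the clustering is effectively one-dimensional until the columns are rescaled.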

1 Comment

NCramer on 10 Jul 2017

Thank you, Image Analyst. Normalizing with zscore does certainly help.
