Cluster within percentage of data

Saurav Agarwal

30 Jul 2013

Hi, I want to form clusters from data with 4 variables. The 4 variables are design parameters, and each row of the data (a 200,000 x 4 matrix) corresponds to a particular design. I want the clusters to be formed so that similar designs are grouped together and we can work with the cluster centroids instead of the whole data set. However, kmeans uses Euclidean distance to cluster. This does not serve the purpose, as it could put (1,1000) and (1000,1) into the same cluster in a 2-variable case, even though the two designs are completely different.

What I want is for each cluster to contain only the data points whose variable values lie within x% of the values at the cluster's centroid. For example, for a cluster with centroid (20,10,100,50) and x = 10%, every data point in the cluster should lie within (20±2, 10±1, 100±10, 50±5). I couldn't find any method in cluster analysis that serves this purpose. Please let me know if the logic I am trying to follow is flawed.
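To make the requirement above concrete, here is a minimal sketch in plain Python (not MATLAB, and not an existing toolbox function). It assumes a simple greedy "leader" scheme: a row joins the first group whose leader it lies within x% of on every variable, otherwise it starts a new group. The names `within_tolerance` and `leader_cluster` are made up for illustration, and the leader is the first row of each group rather than a recomputed mean, so this only approximates the centroid-based condition described in the question.

```python
def within_tolerance(point, centroid, pct):
    """Return True if every coordinate of `point` lies within pct percent
    of the matching coordinate of `centroid`."""
    frac = pct / 100.0
    return all(abs(p - c) <= frac * abs(c) for p, c in zip(point, centroid))


def leader_cluster(rows, pct):
    """Greedy one-pass grouping (illustrative assumption, not kmeans).

    Each row joins the first existing leader it is within tolerance of;
    if there is none, the row becomes the leader of a new group.
    Returns the list of leaders and one group label per row.
    """
    leaders = []
    labels = []
    for row in rows:
        for idx, leader in enumerate(leaders):
            if within_tolerance(row, leader, pct):
                labels.append(idx)
                break
        else:
            # No leader was close enough on all variables: start a new group.
            leaders.append(tuple(row))
            labels.append(len(leaders) - 1)
    return leaders, labels


rows = [
    (20, 10, 100, 50),
    (21, 10.5, 95, 52),
    (1, 1000, 5, 5),
    (1000, 1, 5, 5),
    (19, 9.5, 108, 48),
]
leaders, labels = leader_cluster(rows, 10)
print(labels)
```

With this rule, designs such as (1,1000,...) and (1000,1,...) can never share a group, because the check is applied per variable. The result depends on row order, and a pure Python loop like this would be slow on 200,000 rows, so treat it as an illustration of the membership rule only.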

5 Comments (3 older comments not shown)

Saurav Agarwal on 2 Aug 2013

@Image I formed 60 clusters using kmeans. However, I am not happy with the Euclidean distance. I am trying to implement a relative distance instead, but I have not been able to make progress.

Jing on 28 Aug 2013

So do you know the centroid of each cluster or not? If you do, I'd say KNNSEARCH could help more. Anyway, your idea looks like a new algorithm to me...
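If the centroids are already known, the nearest-centroid idea in the comment above could look roughly like the following. This is a plain Python sketch, not MATLAB's KNNSEARCH. It assumes every centroid coordinate is nonzero and measures closeness by the largest relative deviation across the variables. The function names and the `-1` label for rows outside the tolerance are illustrative choices.

```python
def max_relative_deviation(point, centroid):
    """Largest per-variable deviation of `point` from `centroid`,
    expressed as a fraction of the centroid value (assumed nonzero)."""
    return max(abs(p - c) / abs(c) for p, c in zip(point, centroid))


def assign_to_centroids(rows, centroids, pct):
    """Label each row with the index of its closest centroid under the
    relative measure, or -1 if even the closest one is outside pct percent."""
    frac = pct / 100.0
    labels = []
    for row in rows:
        devs = [max_relative_deviation(row, c) for c in centroids]
        best = min(range(len(centroids)), key=devs.__getitem__)
        labels.append(best if devs[best] <= frac else -1)
    return labels


centroids = [(20, 10, 100, 50), (200, 100, 10, 5)]
rows = [(21, 10.5, 95, 52), (190, 105, 10.5, 4.8), (1, 1000, 5, 5)]
labels = assign_to_centroids(rows, centroids, 10)
print(labels)
```

Rows labelled `-1` are the designs that no known centroid represents to within x%, which is the information a plain Euclidean nearest-neighbour lookup would not give directly.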
