How to assign new data to previous Centroid using K-means
Hello everyone, I hope you are doing well.
I have written the following code. Now I want to apply it to new incoming data.
For example, one batch of data arrives and I apply k-means and save the output; when the next batch arrives, the algorithm should check which centroid each point belongs to and assign the point to that centroid.
How can I modify the code to handle new incoming data?
%Read Dataset
%Find the Optimal Clusters for this dataset
eva = evalclusters(dataset,'kmeans','silhouette','KList',1:10);
K=eva.OptimalK;
%Apply Kmeans to the dataset with Optimal Clusters K
[idx,C,sumdist] = kmeans(dataset,K,'Display','final','Replicates',5);
%Plot the Clusters
figure
gscatter(dataset(:,1),dataset(:,2),idx,'bgmkr')
hold on
plot(C(:,1),C(:,2),'kx')
legend('Cluster 1','Cluster 2','Cluster 3','Cluster 4','Cluster 5','Cluster Centroid')
Answers (1)
Image Analyst
on 14 May 2022
Maybe you can just find the distances of all points in your training set from their centroids. Then, for your second set, compute the distances of each point to each of the centroids. Whichever centroid a point is closest to is the cluster it belongs to. You could also use knnsearch for that.
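The knnsearch approach mentioned above can be sketched as follows (a minimal sketch; `C` is assumed to be the centroid matrix returned by kmeans in the question, and `newData` is a hypothetical name for the incoming points):

```matlab
% Assumes C is the K-by-2 centroid matrix from kmeans on the training set,
% and newData is an M-by-2 matrix of new points.
C = [0 0; 10 10];           % example centroids (assumption for illustration)
newData = [1 1; 9 9];       % example incoming points
% knnsearch returns, for each row of newData, the row index of its
% nearest neighbor in C, i.e. the cluster that point is assigned to.
assignedCluster = knnsearch(C, newData);
```

knnsearch requires the Statistics and Machine Learning Toolbox, the same toolbox that provides kmeans and evalclusters.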
8 Comments
Stephen john
on 16 May 2022
Image Analyst
on 16 May 2022
It's so easy you've probably done it by now, but something like this should work:
[rows, columns] = size(testData);
IndexOfClosestClass = zeros(rows, 1); % Preallocate the class assignments.
for k = 1 : rows
% Get coordinates of this one test data point.
tx = testData(k, 1);
ty = testData(k, 2);
% Get distance of that one point to all centroid coordinates.
distances = sqrt((tx - C(:, 1)) .^ 2 + (ty - C(:, 2)) .^ 2);
% Find out which centroid is closest to this data point
% and assign the closest class to IndexOfClosestClass.
[minDistance, IndexOfClosestClass(k)] = min(distances);
end
Now IndexOfClosestClass will be a vector with the same number of elements as there are data points in testData, with each element giving the class the corresponding test point is closest to.
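The same assignment can also be done without a loop using pdist2 (a sketch under the same assumptions: `C` from kmeans on the training set, `testData` as the new points):

```matlab
C = [0 0; 10 10];           % example centroids (assumption for illustration)
testData = [1 1; 9 9];      % example test points
% pdist2 returns an M-by-K matrix of distances from each test point to
% each centroid; taking min along the second dimension picks the closest.
allDistances = pdist2(testData, C);
[minDistance, IndexOfClosestClass] = min(allDistances, [], 2);
```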
Stephen john
on 17 May 2022
Image Analyst
on 17 May 2022
OK, so you don't want to see what classes a subsequent "test" set would be classified into, with the classes previously determined from a training set. It seems like what you're saying is that with every new set of data you are going to use it by itself to determine the classes. OK you can do that. Whenever you get a new set of data, simply call kmeans.
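Re-clustering each batch independently, as described above, would look something like this (a sketch; `newBatch` is a hypothetical name for the incoming data, and the kmeans options mirror the question's code):

```matlab
newBatch = [randn(50, 2); randn(50, 2) + 5]; % example batch (assumption)
% Each time a new batch arrives, cluster it from scratch; centroids
% from earlier batches are not reused.
evaNew = evalclusters(newBatch, 'kmeans', 'silhouette', 'KList', 1:10);
[idxNew, Cnew] = kmeans(newBatch, evaNew.OptimalK, 'Replicates', 5);
```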
Stephen john
on 18 May 2022
Image Analyst
on 18 May 2022
Why are you confusing me? Now you're back to classifying subsequent sets according to class centroids determined by running kmeans on a prior set. So let's get this straight. You have 5 sets of data (5 sets of 1000 rows each, for a total of 5000 rows). You do kmeans to classify the first set. Now, for the second through fifth sets, do you want to
1. classify each one independently, on its own, with classes determined ONLY by that set, OR
2. classify sets 2-5 according to the classification determined by the first set, OR
3. classify sets 1-n based on all the data in all those n sets?
Which is it? 1, 2, or 3?
Stephen john
on 19 May 2022
Image Analyst
on 19 May 2022
I gave you option 2 already. Here it is again:
[rows, columns] = size(testData);
IndexOfClosestClass = zeros(rows, 1); % Preallocate the class assignments.
for k = 1 : rows
% Get coordinates of this one test data point.
tx = testData(k, 1);
ty = testData(k, 2);
% Get distance of that one point to all centroid coordinates.
distances = sqrt((tx - C(:, 1)) .^ 2 + (ty - C(:, 2)) .^ 2);
% Find out which centroid is closest to this data point
% and assign the closest class to IndexOfClosestClass.
[minDistance, IndexOfClosestClass(k)] = min(distances);
end
Where C is what you got from doing kmeans() on the first set.
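If the centroids should also be refined as data accumulates (option 3 in the comment above), one possibility is to warm-start kmeans on the combined data using the previous centroids (a sketch; `allData` is a hypothetical matrix holding all sets seen so far):

```matlab
dataset = [randn(50, 2); randn(50, 2) + 5];  % example first set (assumption)
testData = [randn(50, 2); randn(50, 2) + 5]; % example second set (assumption)
[idx, C] = kmeans(dataset, 2);               % initial clustering
% 'Start', C seeds kmeans with the previous centroids, so clustering the
% combined data refines the old solution rather than restarting from scratch.
allData = [dataset; testData];
[idxAll, Cupdated] = kmeans(allData, size(C, 1), 'Start', C);
```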