How to assign new data to previously computed centroids using K-means

Hello everyone, I hope you are doing well.
I have written the following code. Now I want to apply it to a new incoming dataset.
For example, when the first batch of data arrives, I apply K-means and save the output. When the second batch arrives, the algorithm should check which saved centroid each point belongs to and assign it to that centroid.
How can I modify the code for new incoming data?
% Read the dataset (one observation per row)
% Find the optimal number of clusters for this dataset
eva = evalclusters(dataset,'kmeans','silhouette','KList',1:10);
K = eva.OptimalK;
% Apply kmeans to the dataset with the optimal number of clusters K
[idx,C,sumdist] = kmeans(dataset,K,'Display','final','Replicates',5);
% Plot the clusters (first two columns only)
figure
gscatter(dataset(:,1),dataset(:,2),idx,'bgmkr')
hold on
plot(C(:,1),C(:,2),'kx')
% One legend entry per cluster, since K is chosen automatically and may not be 5
clusterNames = arrayfun(@(k) sprintf('Cluster %d',k), 1:K, 'UniformOutput', false);
legend([clusterNames, {'Cluster Centroid'}])
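Since the plan is to save the output of the first run and reuse it later, here is a minimal sketch of that step, assuming only the centroid matrix C needs to be kept. The file name (built with tempname), the example centroids and newBatch are hypothetical stand-ins; knnsearch requires the Statistics and Machine Learning Toolbox, the same toolbox as kmeans.

```matlab
% Hypothetical centroids, standing in for the C returned by kmeans above
C = [0 0; 10 10];

% Save the centroids so they can be reused when the next batch arrives
centroidFile = [tempname '.mat'];   % hypothetical location; use your own path
save(centroidFile, 'C');

% ... later, when a new batch arrives ...
saved = load(centroidFile);         % load returns a struct with a field C
newBatch = [1 2; 9 9; 11 12];       % hypothetical new data, same number of columns as C

% For each row of newBatch, knnsearch returns the index of the nearest row of saved.C
newIdx = knnsearch(saved.C, newBatch);

delete(centroidFile);               % clean up the temporary file used by this sketch
```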

Answers (1)

Perhaps you could first find the distances of all points in your training set from their centroids. Then, for your second set, compute the distance from each point to each of the centroids. Whichever centroid a point is closest to is the cluster it belongs to. You could also use knnsearch for that.
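As a minimal sketch of the knnsearch route, assuming C is the K-by-p centroid matrix returned by kmeans on the training set and testData is an M-by-p matrix of new points with the same number of columns (the values below are made up for illustration). pdist2 with the 'Smallest' option gives the same answer; both functions are in the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical centroids (K-by-p), e.g. the C output of kmeans on the training set
C = [0 0; 10 10; 0 10];
% Hypothetical new points to classify (M-by-p, same number of columns as C)
testData = [1 1; 9 8; -1 11; 2 0];

% For each row of testData, knnsearch returns the row index of the nearest
% row of C (Euclidean distance by default), i.e. the cluster it belongs to
IndexOfClosestClass = knnsearch(C, testData);

% Alternative with pdist2: 'Smallest',1 keeps only the nearest centroid for
% each test point; I is 1-by-M, so transpose it to a column vector
[~, I] = pdist2(C, testData, 'euclidean', 'Smallest', 1);
IndexOfClosestClass2 = I.';
```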

8 Comments

@Image Analyst How can I change the above code to do that?
It's so easy you've probably done it by now, but something like this should work:
[rows, columns] = size(testData);
IndexOfClosestClass = zeros(rows, 1); % Preallocate one class index per test point
for k = 1 : rows
    % Get coordinates of this one test data point.
    tx = testData(k, 1);
    ty = testData(k, 2);
    % Get distance of that one point to all centroid coordinates.
    distances = sqrt((tx - C(:, 1)) .^ 2 + (ty - C(:, 2)) .^ 2);
    % Find out which centroid is closest to this data point
    % and assign the closest class to IndexOfClosestClass.
    [minDistance, IndexOfClosestClass(k)] = min(distances);
end
Now IndexOfClosestClass will be a vector with one element per data point in testData, each element holding the index of the centroid (class) that the corresponding test point is closest to.
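The loop above uses only the first two columns of testData. As a sketch, assuming C and testData have the same number of columns p (any p works) and MATLAB R2016b or later for implicit expansion, the same nearest-centroid assignment can be done without a loop or a toolbox. The square root is skipped because it does not change which centroid is nearest; the data values are made up.

```matlab
% Hypothetical centroids (K-by-p) from kmeans on the training set
C = [0 0 0 0; 5 5 5 5];
% Hypothetical test data (M-by-p), same number of columns p as C
testData = [1 0 0 1; 4 5 6 5; 0 0 0 0];

% Implicit expansion forms an M-by-K-by-p array of coordinate differences:
% element (m,k,j) = testData(m,j) - C(k,j)
differences = permute(testData, [1 3 2]) - permute(C, [3 1 2]);
% Sum squared differences over the p columns -> M-by-K squared distances
squaredDistances = sum(differences .^ 2, 3);
% The column index of each row's minimum is that test point's closest centroid
[~, IndexOfClosestClass] = min(squaredDistances, [], 2);
```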
@Image Analyst But my test data arrives continuously. I want to buffer the incoming data: for example, in real time 1000 samples are received, and then K-means is applied to that batch.
The result for those 1000 samples is saved; in the next iteration, it waits for another 1000 samples and then applies K-means again.
OK, so you don't want to see what classes a subsequent "test" set would be classified into, with the classes previously determined from a training set. It seems like what you're saying is that with every new set of data you are going to use it by itself to determine the classes. OK you can do that. Whenever you get a new set of data, simply call kmeans.
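For the approach described here (cluster every batch on its own), a minimal sketch follows. The names allData, batchSize and results are hypothetical, the random data only stands in for the real stream, and K = 3 is an assumed value (it could instead come from evalclusters on each batch). Because each batch is clustered independently, cluster numbers are not comparable across batches: cluster 1 in one batch need not match cluster 1 in the next.

```matlab
% Hypothetical stand-in for the stream: 5000-by-2 random data in batches of 1000
rng(0);                          % make this sketch reproducible
allData = rand(5000, 2);
batchSize = 1000;
K = 3;                           % assumed number of clusters
numBatches = floor(size(allData, 1) / batchSize);
results = cell(numBatches, 1);   % one saved result per batch

for b = 1 : numBatches
    % Take the next batch of batchSize rows
    batch = allData((b-1)*batchSize + 1 : b*batchSize, :);
    % Cluster this batch by itself, with no link to earlier batches
    [idx, C] = kmeans(batch, K, 'Replicates', 5);
    % Save the labels and centroids for this batch
    results{b} = struct('idx', idx, 'C', C);
end
```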
@Image Analyst No, I want to apply it to a new test dataset.
I have a dataset of size 5000x4.
I want to apply K-means to the first 1000 samples, then wait for the next 1000 to arrive and use the centroids from the previous K-means run to cluster the new data. The same applies to the other methods too.
Why are you confusing me? Now you're back to classifying subsequent sets according to class centroids determined by running kmeans on a prior set. So let's get this straight. You have 5 sets of data (5 sets of 1000 rows each, for a total of 5000 rows). You run kmeans to classify the first set. Now, for the second through fifth sets, do you want to
  1. classify each one independently, on its own, with classes determined ONLY by that set, OR
  2. classify sets 2-5 according to the classification determined by the first set, OR
  3. classify sets 1-n based on all the data in all those n sets?
Which is it? 1, 2, or 3?
@Image Analyst I want to classify data using option 2.
I gave you option 2 already. Here it is again:
[rows, columns] = size(testData);
IndexOfClosestClass = zeros(rows, 1); % Preallocate one class index per test point
for k = 1 : rows
    % Get coordinates of this one test data point.
    tx = testData(k, 1);
    ty = testData(k, 2);
    % Get distance of that one point to all centroid coordinates.
    distances = sqrt((tx - C(:, 1)) .^ 2 + (ty - C(:, 2)) .^ 2);
    % Find out which centroid is closest to this data point
    % and assign the closest class to IndexOfClosestClass.
    [minDistance, IndexOfClosestClass(k)] = min(distances);
end
Where C is what you got from doing kmeans() on the first set.
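Putting option 2 together for the 5000x4 case described in this thread, here is a minimal sketch. It assumes the data arrives in consecutive blocks of 1000 rows, uses random numbers as a stand-in for the real data, fixes K = 3 as an assumed value (evalclusters on the first batch could choose it instead), and uses knnsearch (Statistics and Machine Learning Toolbox) for the nearest-centroid step so it works with all 4 columns. The names allData, batchSize and labels are hypothetical.

```matlab
% Hypothetical stand-in for the 5000-by-4 dataset
rng(0);                          % make this sketch reproducible
allData = rand(5000, 4);
batchSize = 1000;
K = 3;                           % assumed number of clusters
numBatches = floor(size(allData, 1) / batchSize);

% Batch 1: learn the centroids once with kmeans
firstBatch = allData(1:batchSize, :);
[idxFirst, C] = kmeans(firstBatch, K, 'Replicates', 5);

% Batches 2..numBatches: keep C fixed and assign each row to its nearest centroid
labels = cell(numBatches, 1);
labels{1} = idxFirst;
for b = 2 : numBatches
    batch = allData((b-1)*batchSize + 1 : b*batchSize, :);
    labels{b} = knnsearch(C, batch);   % index of the nearest centroid for every row
end
```

Because C never changes after the first batch, cluster numbers stay consistent across all batches. The trade-off is that if the data drifts over time, the centroids from the first batch may no longer describe later batches well.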


Asked: 14 May 2022

Commented: 19 May 2022
