How do you calculate the minimum circle within a cluster set of data points to potentially identify the ones that are most similar?

22 visualizzazioni (ultimi 30 giorni)
I have made a PCA plot and I would like to calculate the minimum circle within a cluster set of data points to potentially identify the ones that are most similar, how can I do that?

Risposta accettata

John D'Errico
John D'Errico il 14 Feb 2023
Let me give an example.
XY = [randn(20,2)*1 + [-5,5];randn(10,2)*1.5 + [3 6];randn(10,2)*1 + [-7 -4]];
plot(XY(:,1),XY(:,2),'o')
How would you draw circles around what appear to be three groups of points here? As I have said many times, I would jsut use kmeans. I know there are three groups. This is a serious problem in any clustering tool, since you need to know how many clusters there are. Lacking that, the problem becomes MUCH more difficult. Again, these are things the human eye/brain combination does very naturally.
clustidx = kmeans(XY,3);
C = [1 0 0;0 1 0;0 0 1];
scatter(XY(:,1),XY(:,2),[],C(clustidx,:))
So k-means has done very nicely. But it really does need to know the number of clusters to search for.
And, now we can draw bounding circles.
hold on
t = linspace(0,2*pi);
C = zeros(3,2);
R = zeros(3,1);
colors = 'rgb';
for i = 1:3
ind = find(clustidx == i);
[C(i,:),R(i)] = minboundcircle(XY(ind,1),XY(ind,2));
plot(C(i,1) + R(i)*cos(t),C(i,2) + R(i)*sin(t),['-',colors(i)])
end
hold off
Now, could I have done the clustering in a different way? Possibly. One idea might be to use graph theoretic tools in MATLAB.
For example, on the same set of data, start with tihe set of interpoint distances.
D = squareform(pdist(XY));
Now, if there are truly 3 distinct clusters, then roughly 66% of the distances between points will be between points from different clusters. So I'll zero out 75% of the distances.
D(D > prctile(D(:),25)) = 0;
Next, create a graph of what remains.
GD = graph(D)
GD =
graph with properties: Edges: [180×2 table] Nodes: [40×0 table]
plot(GD)
The conncomp function will identify the sets of points that are connected to each other.
conncomp(GD)
ans = 1×40
1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3
As you can see, it did indeed identify three main clusters of points, though one point was disconnected from the rest, effectively seen as an outlier. Regardless, I could now have built the same plot as above, using these connected components.
  2 Commenti
Neo
Neo il 21 Feb 2023
figure(1)
clustidx = kmeans(C,3);
J = [1 0 0;0 1 0;0 0 1];
scatter(C(:,1),C(:,2),[],J(clustidx,:))
hold on
t = linspace(0,2*pi);
J = zeros(3,2);
R = zeros(3,1);
colors = 'rgb';
for i = 1:3
ind = find(clustidx == i);
[J(i,:),R(i)] = minboundcircle(C(ind,1),C(ind,2));
plot(J(i,1) + R(i)*cos(t),J(i,2) + R(i)*sin(t),['-',colors(i)])
end
hold off
What i intended my code to do was perform k means on my data matrix C then draw the min bounding circles, here is what the code output.
Was it succesful? Then I can be done with this question, thanks for your help.
Laura
Laura il 22 Feb 2023
Sorry I am curious if this code worked as depicted since I have a similar question. Is the output accurate? Thank you from Australia!

Accedi per commentare.

Più risposte (2)

Image Analyst
Image Analyst il 13 Feb 2023
Modificato: Josh Natanson il 14 Feb 2023
You have the classes to which the cluster is assigned. But I'm not sure, for a particular cluster, what "the minimum circle within a cluster set of data points" means. Do you want to find the pair of points in each cluster that are closest to each other? Do you want to find the minimum containing/bounding circle for each cluster? Do you want the size of each cluster's bounding circle? Please clarify. How many PCs do you have? How many clusters do you have?
See
  16 Commenti

Accedi per commentare.


John D'Errico
John D'Errico il 14 Feb 2023
Modificato: John D'Errico il 14 Feb 2023
How many times will you ask the same question? You keep on talking about a minimal circle. Sometimes it is to draw a minimal circle. And sometimes you seem to be confusing different algorithms. (Why I pretty much gave up on responding to your posts. Sorry, but I did.) This time it is to use the minimal circle to identify a cluster of similar points, but you don't say how the cluster would be identified.
The problem is, your question does not seem to understand there are several issues here. If you have a cluster of points, you can trivially find the minimal bounding circle. But a mimimal bounding circle algorithm is not a clustering tool. So you cannot use that bounding circle code to find a cluster of points that you have not first identified.
I really am sorry, but it just seems as if you want to do something, but have no idea what is involved in doing what you want. So you just keep on asking the same question over and over again.
So, you have some points (currently in 2-d, the result of a PCA.) They may be pretty sparse, at least from the pictures drawn, which makes EVERYTHING more difficult. And what your eye sees are several groups of points. What you want is some automatic code that will draw a circle around the various clusters that your eye sees. This seems to be the gist of the last 3 or more questions that you have asked. The problem is, you need to use the tools we have been suggesting, at least if you want to do this in a way that is automatic.
First, perform a clustering analysis. There are MANY clustering algorithms available, but kmeans has some of the most commonly used tools.
Or, you could simply choose the clusters for any similar group of points yourself. For example, I have posted code on the file exchange (selectdata) that will allow a user to select groups of points using a lasso. Just draw a curve around the points in one group, and it will tell you which points they were. Or you can tell the code to use a circular shape, then you click on the figure, and expand that circle until it contains the points you want. The idea is, you surely know what you want to see. So do it using a mouse. But if not, then you NEED TO USE A CLUSTERING TOOL. A minimal bounding circle is not a tool for clustering.
Can I explain why a minimal bounding circle does not work as a tool for clustering? That might be a good question. The point is a bounding circle is almost always controlled by 3 points, and sometimes rarely only 2 points. If you drop one or more of those points, the minimal bounding circle changes dramatically. This makes the bounding circle rather an unstable thing, since it does not depend on any of the other data points inside the circle. And that means you cannot use it to perform clustering. You need to use some other scheme first. PLEASE READ AND UNDERSTAND THIS LAST PARAGRAPH. A bounding circle is not in itself a clustering tool, nor can it be made to work as one. At least not well.
Once you have the points identified in any one similar cluster, then use a tool as I have shown to find the circle and then draw it. I don't see the problem, or why this requires so many repeated questions on the topic.
Again, break down the problem into smaller ones. First you need to use clustering though. We dont know enoiugh about your problem to know which clustering scheme might be best, but very likely just selecting the points with a mouse might be the best.
  2 Commenti
Neo
Neo il 14 Feb 2023
@John D'Errico to clarify I want to use the minimal circle to identify a cluster of similar points, the cluster would be identified by PCA. (I understand it just reduces the dimensionality and does not know which values belong where).
Since I want to use the automatic way I have investigated PCA and T-SNE as my clustering algorithms and now want to draw the smallest cirlce that can identify automatically the closest points in relevance. So I guess that brings me back to my question that I thought i asked (sorry english is one of languages I speak) which is how to draw minimal curcle that automatically identifies a cluster of similar points by their values.
John D'Errico
John D'Errico il 14 Feb 2023
But again, you cannot do that in any stable way. A minimal circle is a poor thing to identify a cluster of points, because that minimal circle is ALWAYS based on only a few points in any set. You really don't want to do exactly what it is you think you want to do. For example:
xy = randn(20,2);
[C,R] = minboundcircle(xy(:,1),xy(:,2))
C = 1×2
0.0982 0.5052
R = 2.4690
t = linspace(0,2*pi,100);
plot(xy(:,1),xy(:,2),'bo',R*cos(t) + C(1), R*sin(t) + C(2),'-r')
axis equal
Do you see that only 3 points ALWAYS control the position and size of the circle? In some cases, only 2 points may control the circle. And if you were to delete one or more of those points, then the circle changes in a very unstable, totally unpredictable way. The rest of the points inside that circle have no impact at all on the circle itself.
You cannot use a minimal bounding circle to perform clustering. You CAN use it AFTER you have performed the clustering. But not to do the clustering itself. And I know that when you look at a set of points in a picture, you see a circle that contains them. So you think the circle can be made to do what you want. But your eye/brain combination can do many things that are exceedingly difficult to do using a computer, and to do it robustly.

Accedi per commentare.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by