Fixing the Silhouette Plot (for k-means)?

I'm working on k-means clustering in MATLAB. My data file has three columns, and I have already written the clustering code. I need a way to measure the clustering quality, and I chose the silhouette plot. I got the silhouette code from here (and I want my plot to look like the one shown there): http://stackoverflow.com/questions/6644445/equivalent-of-matlabs-cluster-quality-function
I adapted it to my variables. Here is the k-means clustering code:
load cobat.txt; % read the file
k=input('Enter a number: '); % number of clusters
isRand=0; % 0 -> sequential initialization
          % 1 -> random initialization
[maxRow, maxCol]=size(cobat);
if maxRow<=k,
    y=[m, 1:maxRow];
elseif k>7
    h=msgbox('cant more than 7');
else
    % initial value of centroid
    if isRand,
        p = randperm(size(cobat,1)); % random initialization
        for i=1:k
            c(i,:)=cobat(p(i),:) ;
        end
    else
        for i=1:k
            c(i,:)=cobat(i,:); % sequential initialization
        end
    end
    temp=zeros(maxRow,1); % initialize as zero vector
    u=0;
    while 1,
        d=DistMatrix3(cobat,c); % distance from every point to every centroid
        [z,g]=min(d,[],2); % g = index of the nearest centroid for each point
        if g==temp, % if the assignment doesn't change anymore
            break; % stop the iteration
        else
            temp=g; % remember the current assignment
        end
        for i=1:k
            f=find(g==i);
            if f % calculate the new centroid
                c(i,:)=mean(cobat(find(g==i),:),1)
            end
        end
    end
    y=[cobat,g]
    %plot silhouette
    s = mySilhouette(cobat, g)
    [~,ord] = sortrows([g s],[1 -2]);
    indices = accumarray(g(ord), 1:k, [K 1], @(x){sort(x)});
    ytick = cellfun(@(ind) (min(ind)+max(ind))/2, indices);
    ytickLabels = num2str((1:K)','%d'); %#'
    h = barh(1:N, s(ord),'hist');
    set(h, 'EdgeColor','none', 'CData',IDX(ord))
    set(gca, 'CLim',[1 K], 'CLimMode','manual')
    set(gca, 'YDir','reverse', 'YTick',ytick, 'YTickLabel',ytickLabels)
    xlabel('Silhouette Value'), ylabel('Cluster')
    %# compare against SILHOUETTE
    figure, silhouette(cobat,g)
end
Here is the DistMatrix3 function (used to calculate the distances):
function d=DistMatrix3(A,B)
% Euclidean distance between every row of A and every row of B (3 columns each)
[hA,wA]=size(A);
[hB,wB]=size(B);
if hA==1 && hB==1
    d=sqrt(dot((A-B),(A-B)));
else
    % selector matrices that replicate column 1, 2 or 3 across a full matrix
    C=[ones(1,hB);zeros(1,hB);zeros(1,hB)];
    D=[zeros(1,hB);ones(1,hB);zeros(1,hB)];
    E=flipud(C);
    F=[ones(1,hA);zeros(1,hA);zeros(1,hA)];
    G=[zeros(1,hA);ones(1,hA);zeros(1,hA)];
    H=flipud(F);
    I=A*C;
    J=A*D;
    K=A*E;
    L=B*F;
    M=B*G;
    N=B*H;
    d=sqrt((I-L').^2+(J-M').^2+(K-N').^2);
end
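DistMatrix3 is hard-coded for exactly three columns. A minimal sketch of an alternative that works for any number of columns, assuming only base MATLAB; the name distMatrix and the toy points are hypothetical and not part of the original post:

```matlab
% Hypothetical helper: Euclidean distance between every row of A and every row of B.
% Uses ||a-b||^2 = ||a||^2 + ||b||^2 - 2*a*b'; max(...,0) guards against tiny
% negative values caused by rounding before taking the square root.
distMatrix = @(A,B) sqrt(max(bsxfun(@plus, sum(A.^2,2), sum(B.^2,2)') - 2*(A*B'), 0));

A = [0 0 0; 3 4 0];      % toy data points (one per row)
B = [0 0 0; 3 4 0];      % toy centroids (one per row)
d = distMatrix(A, B);    % d(i,j) = distance from A(i,:) to B(j,:)
disp(d)
```

If the Statistics Toolbox is available, pdist2(A,B) should produce the same matrix.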
And here is the mySilhouette function code:
function s = mySilhouette(cobat, g)
%# cobat: matrix of size N-by-p, data where rows are instances
%# g    : vector of size N, cluster index of each instance (starting from 1)
%# s    : vector of size N, silhouette score value of each instance
    N = size(cobat,1); %# number of instances
    K = numel(unique(g)); %# number of clusters
    %# compute pairwise squared Euclidean distance matrix
    D = squareform( pdist(cobat,'euclidean').^2 );
    %# indices belonging to each cluster
    kIndices = accumarray(g, 1:N, [K 1], @(x){sort(x)});
    %# compute a,b,s for each instance
    %# a(i): average distance from i to all other data within the same cluster.
    %# b(i): lowest average dist from i to the data of another single cluster
    a = zeros(N,1);
    b = zeros(N,1);
    for i=1:N
        ind = kIndices{g(i)}; ind = ind(ind~=i);
        a(i) = mean( D(i,ind) );
        b(i) = min( cellfun(@(ind) mean(D(i,ind)), kIndices([1:K]~=g(i))) );
    end
    s = (b-a) ./ max(a,b);
end
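As a sanity check on the formula s = (b-a)/max(a,b), here is a small worked example. It is only a sketch on made-up 1-D data (the points and labels are hypothetical, not taken from cobat.txt). It uses squared Euclidean distances, as mySilhouette does, but avoids pdist so that it runs without any toolbox:

```matlab
X = [0; 1; 10; 11];              % toy 1-D data
g = [1; 1; 2; 2];                % toy cluster labels
N = numel(g);                    % number of observations
K = max(g);                      % number of clusters
D = bsxfun(@minus, X, X').^2;    % squared distances (this shortcut is valid for 1-D data)
s = zeros(N,1);
for i = 1:N
    same = find(g == g(i));
    same = same(same ~= i);                  % own cluster, excluding point i
    a = mean(D(i, same));                    % average distance within own cluster
    b = inf;
    for c = 1:K
        if c ~= g(i)
            b = min(b, mean(D(i, g == c)));  % closest other cluster on average
        end
    end
    s(i) = (b - a) / max(a, b);
end
disp(s')
```

For the first point, a = 1 and b = (100 + 121)/2 = 110.5, so s = 109.5/110.5, roughly 0.991. Values close to 1 indicate a point that sits well inside its own cluster.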
Here is the cobat.txt file:
65 80 55
45 75 78
36 67 66
65 78 88
79 80 72
77 85 65
76 77 79
65 67 88
85 76 88
56 76 65
When I run the code, I get this error: "??? Undefined function or variable 'K'. Error in ==> clustere at 54 indices = accumarray(g(ord), 1:k, [K 1], @(x){sort(x)});"
I know this is caused by the K variable, but I don't know what K is supposed to be, and I can't figure it out. Can anyone help me fix the error and make it work? Your help will be much appreciated.
Thank you.
2 Comments
José-Luis on 6 May 2013
Edited: José-Luis on 6 May 2013
Have you tried using the debugger?
doc dbstop
What's the value of K when the code fails?
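One way to answer that question without stepping through the code is to list which names used by the plotting lines do not exist in the script's workspace. This is only a sketch; the stand-in values for g and s are hypothetical. (Running dbstop if error before the script would instead pause execution at the failing line so the workspace can be inspected by hand.)

```matlab
clear K N IDX                            % start from a state like the failing script's
g = [1; 2; 1];  s = [0.5; 0.4; 0.3];     % stand-ins for the script's outputs
needed  = {'K', 'N', 'IDX', 'g', 's'};   % names used by the plotting lines
defined = who;                           % variables that exist in this workspace
missing = setdiff(needed, defined);      % names that were never assigned here
disp(missing)
```

K and N are assigned inside mySilhouette, so they are local to that function and never reach the calling script, and IDX is the original snippet's name for the label vector that this script calls g.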
Alvi Syahrin on 7 May 2013
I don't understand why I need to use doc dbstop. See my answer below; I have renamed the variables to match my code, but I still get an error. Your help will be appreciated. Thank you, Jose.


Accepted Answer

Alvi Syahrin on 8 May 2013
This problem is solved. If you have a similar problem, look at this link: http://stackoverflow.com/questions/16399645/fix-silhouette-plot-for-k-means

More Answers (1)

Alvi Syahrin on 7 May 2013
I have now renamed the variables to match my code: K becomes k, N becomes maxRow, and IDX becomes g. But now I get another error:
"??? Error using ==> accumarray Second input VAL must be a vector with one element for each row in SUBS, or a scalar.
Error in ==> clustere at 56 indices = accumarray(g(ord), 1:k, [k 1], @(x){sort(x)});"
Does anyone have any idea?
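A likely cause, judging from the code in the question: in the original plotting snippet N is the number of observations and K is the number of clusters, so the second argument of accumarray has to stay 1:N (one value per row of g) rather than 1:k. A minimal sketch with those variables defined explicitly; the labels g and scores s are made-up stand-ins for the outputs of the k-means loop and mySilhouette:

```matlab
g = [1; 2; 1; 2; 2];               % stand-in cluster labels
s = [0.2; 0.9; 0.8; 0.1; 0.5];     % stand-in silhouette values
N = numel(g);                      % number of observations (maxRow in the script)
K = max(g);                        % number of clusters (k in the script)

[~, ord] = sortrows([g s], [1 -2]);                           % by cluster, then descending s
indices  = accumarray(g(ord), (1:N)', [K 1], @(x){sort(x)});  % bar positions of each cluster
ytick    = cellfun(@(ind) (min(ind)+max(ind))/2, indices);    % tick at each cluster's midpoint
ytickLabels = num2str((1:K)', '%d');

% plotting lines from the question, with IDX replaced by g
h = barh(1:N, s(ord), 'hist');
set(h, 'EdgeColor','none', 'CData',g(ord))
set(gca, 'CLim',[1 K], 'CLimMode','manual')
set(gca, 'YDir','reverse', 'YTick',ytick, 'YTickLabel',ytickLabels)
xlabel('Silhouette Value'), ylabel('Cluster')
```

In the script from the question, the two stand-in lines would be dropped and N and K set from maxRow and k before the plotting code.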
