Mahalanobis distance in MATLAB: pdist2() vs. mahal() function

I have two matrices X and Y. Both represent a number of positions in 3D space. X is a 50*3 matrix and Y is a 60*3 matrix. My question is why applying the mean function to the output of pdist2() with 'Mahalanobis' does not give the result obtained with mahal(). More details on what I'm trying to do are below, along with the code I used to test this.
Suppose the 60 observations in matrix Y were obtained after some kind of experimental manipulation. I'm trying to assess whether this manipulation had a significant effect on the positions observed in Y. To do this, I used pdist2(X,X,'Mahalanobis') to compare X with itself as a baseline, and then compared X to Y (with X as the reference matrix: pdist2(X,Y,'Mahalanobis')), and I plotted both distributions to look at their overlap. Next, I calculated the mean Mahalanobis distance and its 95% CI for both distributions, and ran a t-test and a Kolmogorov-Smirnov test to assess whether the difference between the distributions was significant. This seemed very intuitive to me. However, when testing with mahal(), I get different values, even though the reference matrix is the same. I don't understand exactly how these two ways of calculating the Mahalanobis distance differ.
% test pdist2 vs. mahal in matlab
% the purpose of this script is to see whether the average over the rows of E equals the values in d...
% data
X = []; % 50*3 matrix, data omitted
Y = []; % 60*3 matrix, data omitted
% calculations
S = nancov(X); % covariance matrix of X, ignoring NaN values
% mahal()
d = mahal(Y,X); % gives a 60*1 vector with one value per row (3D position) of Y (the second matrix is always the reference matrix)
% pairwise Mahalanobis distance with pdist2()
E = pdist2(X,Y,'mahalanobis',S); % outputs a 50*60 matrix whose (i,j)-th element is the pairwise distance between X(i,:) and Y(j,:), based on the covariance matrix of X: nancov(X)
%{
So this is harder to interpret than mahal(), as elements of Y are not just compared to the "Mahalanobis centroid" of X,
but to each individual element of X.
So the purpose of this script is to see whether the average over the rows of E equals the values in d...
%}
F = mean(E); % average over the rows, i.e. over all points of X, the reference matrix
mean(d)
mean(E(:)) % not equal to mean(d)
d-F' % not zero
% plot output
figure(1)
plot(d,'bo'), hold on
plot(mean(E),'ro')
legend('mahal()','pdist2(), averaged over all X values')
ylabel('Mahalanobis distance')
figure(2)
plot(d,'bo'), hold on
plot(E','ro') % transpose so the x-axis indexes the 60 rows of Y, like d
plot(d,'bo','MarkerFaceColor','b')
xlabel('values in matrix Y (Yi) ... or ... pairwise comparison Yi. (Yi vs. all Xi values)')
ylabel('Mahalanobis distance')
legend('mahal()','pdist2()')
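A likely source of the mismatch can be checked numerically: according to the MATLAB documentation, mahal(Y,X) returns squared Mahalanobis distances, whereas pdist2(X,Y,'mahalanobis',S) returns unsquared ones, so the mean of E cannot reproduce d. The sketch below reimplements both computations in base MATLAB (my own reimplementations, not the toolbox code) on random stand-in data, because the original data were omitted; the data, the mixing matrix and the rng seed are assumptions. It shows that averaging the squared pairwise distances recovers d up to a constant offset of p*(n-1)/n, where n and p are the number of rows and columns of X.

```matlab
% Sketch: centroid-based (mahal-style) vs. pairwise (pdist2-style)
% Mahalanobis distances. Base MATLAB only; random data are stand-ins
% for the omitted X and Y (assumption).
rng(0);                                          % reproducible stand-in data
X = randn(50,3) * [1 0.5 0; 0 1 0.3; 0 0 1];     % 50*3 reference set
Y = randn(60,3) + 0.5;                           % 60*3 "manipulated" set
[n, p] = size(X);

S  = cov(X);                                     % equals nancov(X) when X has no NaNs
mu = mean(X, 1);

% Method 1 (mirrors mahal(Y,X)): SQUARED distance of each Y(j,:)
% to the mean of X, using the covariance of X.
Dy = Y - mu;                                     % implicit expansion, 60*3
d  = sum((Dy / S) .* Dy, 2);                     % 60*1, (y-mu)*inv(S)*(y-mu)'

% Method 2 (mirrors pdist2(X,Y,'mahalanobis',S)): UNSQUARED distance
% between every X(i,:) and every Y(j,:), using the same S.
E = zeros(n, size(Y,1));                         % 50*60
for j = 1:size(Y,1)
    Dxy = X - Y(j,:);                            % 50*3
    E(:,j) = sqrt(sum((Dxy / S) .* Dxy, 2));
end

% Averaging the unsquared distances (as in the question) does not give d:
disp(max(abs(d - mean(E,1)')))                   % nonzero in general

% Averaging the SQUARED pairwise distances gives d plus a constant:
offset = mean(E.^2, 1)' - d;
disp([min(offset) max(offset) p*(n-1)/n])        % each entry is p*(n-1)/n = 2.94, up to rounding
```

With n = 50 and p = 3 as in the question, the offset is 2.94 for every row of Y, so mean(E.^2)' - 2.94 should match d when S = nancov(X) and neither matrix contains NaNs.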

Accepted Answer

babi psylon on 12 Nov 2013
An attempt to answer my own question, while adding a new question:
Well, I guess there are two different ways to calculate the Mahalanobis distance between two clusters of data, as you explain above:
1) You compare each data point from your sample set to the mean vector (mu) and covariance matrix (sigma) calculated from your reference distribution (although labeling one cluster the sample set and the other the reference distribution may be arbitrary), thereby calculating the distance from each point to the so-called Mahalanobis centroid of the reference distribution.
2) You compare each data point from matrix Y to each data point of matrix X, with X as the reference distribution (sigma is calculated from X only; mu is not needed for pairwise distances).
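When squared distances are used, the two methods are linked exactly. Writing x_1, ..., x_n for the rows of X (as column vectors), x̄ for their mean, S for cov(X) (normalized by n-1, which is what mahal uses), p for the number of dimensions, y for one row of Y, and d²(y, X) = (y - x̄)ᵀ S⁻¹ (y - x̄) for what mahal(y, X) returns, a short derivation (my own, offered as a sketch) is:

```latex
\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} (x_i - y)^\top S^{-1} (x_i - y)
 &= \frac{1}{n}\sum_{i=1}^{n} \big[(x_i - \bar{x}) + (\bar{x} - y)\big]^\top S^{-1} \big[(x_i - \bar{x}) + (\bar{x} - y)\big] \\
 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^\top S^{-1} (x_i - \bar{x})
   + \frac{2}{n}\Big(\sum_{i=1}^{n} (x_i - \bar{x})\Big)^{\!\top} S^{-1} (\bar{x} - y)
   + (\bar{x} - y)^\top S^{-1} (\bar{x} - y) \\
 &= \operatorname{tr}\!\Big(S^{-1}\,\tfrac{n-1}{n}\,S\Big) + 0 + d^2(y, X) \\
 &= \frac{p(n-1)}{n} + d^2(y, X).
\end{aligned}
```

The cross term vanishes because the centered rows sum to zero, and the first term uses the sum of (x_i - x̄)(x_i - x̄)ᵀ being (n-1)S. So method 2 with squared distances differs from method 1 only by a constant that depends on p and n but not on y; with unsquared distances, as in F = mean(E) in the question, there is no such exact relation.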
The values of the distances will be different, but I guess the rank order of dissimilarity between clusters is preserved whether I use method 1 or method 2? I also wonder whether, when comparing 10 different clusters to a reference matrix X (or to each other), the order of the dissimilarities would differ between method 1 and method 2. Also, I can't imagine a situation where one method would be wrong and the other not, although method 1 seems more intuitive in some situations, like mine.
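On the rank-order question, a small experiment may help. The sketch below (base MATLAB, random stand-in data; the 10 clusters, their sizes, scales and shifts are hypothetical) scores each cluster against a reference X in three ways: the mean squared distance of its points to the centroid of X (method 1, i.e. the mean of mahal(Yk,X)), the mean squared pairwise distance to all points of X, and the mean unsquared pairwise distance (method 2 as computed with pdist2). For S = cov(X), the mean squared pairwise distance equals the method-1 score plus the constant p*(n-1)/n, so the first two rankings should agree exactly. The average of square roots is not a function of the average of squares, so the third ranking is not guaranteed to match, and the last check may print either 1 or 0.

```matlab
% Sketch: do method 1 and method 2 rank several clusters the same way?
% Random stand-in data; cluster sizes, scales and shifts are assumptions.
rng(1);
X  = randn(50,3);                        % reference set
[n, p] = size(X);
S  = cov(X);
mu = mean(X, 1);
K  = 10;                                 % number of hypothetical clusters

score1   = zeros(K,1);  % method 1: mean squared distance to the centroid of X
score2sq = zeros(K,1);  % method 2: mean squared pairwise distance to all X(i,:)
score2   = zeros(K,1);  % method 2: mean unsquared pairwise distance (like mean(E(:)))

for k = 1:K
    Yk = randn(40,3) * (0.5 + rand) + 0.3*k*rand(1,3);   % hypothetical cluster k
    Dy = Yk - mu;
    score1(k) = mean(sum((Dy / S) .* Dy, 2));
    E2 = zeros(n, size(Yk,1));           % squared pairwise distances, 50*40
    for j = 1:size(Yk,1)
        Dxy = X - Yk(j,:);
        E2(:,j) = sum((Dxy / S) .* Dxy, 2);
    end
    score2sq(k) = mean(E2(:));
    score2(k)   = mean(sqrt(E2(:)));
end

[~, order1]   = sort(score1);
[~, order2sq] = sort(score2sq);
[~, order2]   = sort(score2);

disp(max(abs(score2sq - score1 - p*(n-1)/n)))   % ~0: constant offset
disp(isequal(order1, order2sq))                 % same ranking
disp(isequal(order1, order2))                   % not guaranteed
```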
