# Fast calculation of distances between two large arrays

PA on 24 Apr 2023
Commented: PA on 26 Apr 2023
Dear MATLAB-Community,
I would like to calculate the distances between each entry in M (1 113 486 x 2) and N (1 960 000 x 2) and store the indices of the pairs whose distance is within a tolerance value tol. Can someone help me do that efficiently? The code below would take an estimated 90 weeks. I have also tried [~, ind] = ismembertol(M, N, tol), which gives me logical 1 for every pair, which does not make sense.
tol=0.5;
indM(size(M,1),1)=NaN;
indN(size(N,1),1)=NaN;
progressbar
for m=1:size(M,1)
    for n=1:size(N,1)
        if pdist2(M(m,1:2), N(n,1:2)) <= tol
            indM(m)=m;
            indN(n)=n;
        else
            indM(m)=NaN;
            indN(n)=NaN;
        end
    end
    progressbar(m/size(M,1))
end
Kind regards
Philipp
##### 2 Comments
Stephen23 on 24 Apr 2023
Edited: Stephen23 on 24 Apr 2023
"I have also tried [~, ind] = ismembertol(M, N, tol) which gives me logical 1 for every pair which does not make sense."
If you want to compare rows then you need to specify the ByRows option, for example ismembertol(M, N, tol, 'ByRows', true).
Also note that by default the input tol is scaled by the data magnitude: set DataScale to 1 if you want to specify an actual absolute tolerance.
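A minimal sketch of those two options on made-up data (the small M and N below are illustrative, not the arrays from the question). With ByRows, ismembertol checks each column against the tolerance separately, so a pair of rows can match even when its Euclidean distance is larger than tol. That may be one reason the result differs from a distance-based search.

```matlab
% Made-up data, for illustration only
M = [0 0; 1 1; 5 5];
N = [0.45 0.45; 1.4 0.9; 9 9];
tol = 0.5;

% 'ByRows' compares whole rows; 'DataScale',1 makes tol an absolute tolerance
[tf, loc] = ismembertol(M, N, tol, 'ByRows', true, 'DataScale', 1);

% Row 1 of M matches row 1 of N because both column differences are 0.45,
% yet the Euclidean distance between the two points is about 0.64
d = norm(M(1,:) - N(1,:));
fprintf('match: %d, Euclidean distance: %.3f, tol: %.1f\n', tf(1), d, tol);
```

If all matches per row are needed rather than one, the 'OutputAllIndices' option of ismembertol returns loc as a cell array.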
PA on 24 Apr 2023
Thanks for the answer. Yes, I should have considered this. However, even with that option it still does not do what I want/expect. Is there another way?


### Accepted Answer

Chris on 24 Apr 2023
Edited: Chris on 24 Apr 2023
This should be a little bit quicker (my computer indicates ten hours).
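tol = 0.5;
% Random stand-ins with the same sizes as the real data
M = rand(1113486,2);
N = rand(1960000,2);
% inds{idx} will hold the rows of M that lie within tol of N(idx,:)
inds = cell(size(N,1),1);
for idx = 1:size(N,1)
    % One vectorized pdist2 call per row of N instead of one call per pair
    isClose = pdist2(M,N(idx,:)) <= tol;
    inds{idx} = find(isClose);
end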
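For readers without the Statistics and Machine Learning Toolbox, the same loop can be sketched with plain array arithmetic. This is a variation on the code above, not part of the original answer: it compares squared distances against tol^2, which gives the same result as the Euclidean test while skipping the square root. The tiny M and N are made up so the result is easy to check by hand.

```matlab
% Made-up data, for illustration only
tol = 0.5;
M = [0 0; 1 0; 3 3];
N = [0.2 0.1; 3.1 3.2];

tol2 = tol^2;                      % compare squared distances to avoid sqrt
inds = cell(size(N,1),1);
for idx = 1:size(N,1)
    % Squared Euclidean distance from every row of M to N(idx,:)
    d2 = (M(:,1) - N(idx,1)).^2 + (M(:,2) - N(idx,2)).^2;
    inds{idx} = find(d2 <= tol2);  % rows of M within tol of N(idx,:)
end
```

Here inds{1} contains row 1 of M and inds{2} contains row 3 of M.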
This would be a good candidate for GPU operations, if you have one.
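if canUseGPU
    tol = 0.5;
    % Move both point sets to GPU memory once, before the loop
    M = gpuArray(M);
    N = gpuArray(N);
    inds = cell(size(N,1),1);
    for idx = 1:size(N,1)
        isClose = pdist2(M,N(idx,:)) <= tol;
        % gather copies the indices back to host memory so that
        % the results do not accumulate on the GPU
        inds{idx} = gather(find(isClose));
    end
end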
If your tolerance is loose relative to the density of your points -- that is, if you have a lot of distances<=tol, you may run into memory issues. In that case, inds should be a tall array.
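An alternative that was not suggested in this thread is rangesearch from the Statistics and Machine Learning Toolbox, which can use a k-d tree instead of computing every distance. The sketch below runs on small random stand-in data and cross-checks one query against the pdist2 approach. The sizes, the seed and the tolerance are assumptions chosen only to keep the example quick.

```matlab
% Requires Statistics and Machine Learning Toolbox
rng(0);                 % fixed seed so the run is repeatable
M = rand(2000, 2);      % stand-in data, much smaller than the real arrays
N = rand(3000, 2);
tol = 0.01;

% inds{k} lists the rows of M within Euclidean distance tol of N(k,:)
inds = rangesearch(M, N, tol);

% Cross-check the first query row against the brute-force result
k = 1;
bruteForce = find(pdist2(M, N(k,:)) <= tol);
matches = isequal(sort(inds{k}(:)), bruteForce);
fprintf('rangesearch agrees with pdist2 for row %d: %d\n', k, matches);
```

The memory caveat above still applies: if many pairs fall within tol, the cell array of indices grows large no matter how it is computed.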
##### 1 Comment
PA on 26 Apr 2023
This seems to work, thank you very much!
