Azzera filtri
Azzera filtri

Computation time significantly differs for 2D, 3D array indexing when solving the same problem. Best approach to address performance?

13 visualizzazioni (ultimi 30 giorni)
I wrote a simple code that calculates the covariance of images along different directions and lag spacings and returns a 2D map of covariance values varying by lags(h1,h2). The original program I wrote accepts only 2D inputs:
function [Correlogram] = ExhaustiveCovarianceFunction(X,lags)
[ny,nx,~]=size(X);
Covariance=NaN([2*lags+1 2*lags+1]);
Correlogram=NaN([2*lags+1 2*lags+1]);
for hy=-lags:lags
for hx=-lags:lags
sumprod=0;
sum1=0;
sum2=0;
countpairs=0;
for j=1:ny
for i=1:nx
if hy+j<=0 || hy+j>ny || hx+i<=0 || hx+i>nx
continue
else
sumprod=X(j,i)*X(j+hy,i+hx) + sumprod;
sum1=X(j,i) + sum1;
sum2=X(j+hy,i+hx) + sum2;
countpairs=countpairs+1;
end
end
end
sumprod=sumprod/(countpairs);
sum1=sum1/(countpairs);
sum2=sum2/(countpairs);
Covariance(hy+lags+1,hx+lags+1)=sumprod - sum1*sum2;
%Standardize covariance values to correlogram
mean1=sum1;
mean2=sum2;
sumvar1=0;
sumvar2=0;
for j=1:ny
for i=1:nx
if hy+j<=0 || hy+j>ny || hx+i<=0 || hx+i>nx
continue
else
sumvar1=(X(j,i)-mean1)^2 + sumvar1;
sumvar2=(X(j+hy,i+hx)-mean2)^2 + sumvar2;
end
end
end
sumvar1=sumvar1/countpairs;
sumvar2=sumvar2/countpairs;
Correlogram(hy+lags+1,hx+lags+1)=Covariance(hy+lags+1,hx+lags+1)/sqrt(sumvar1*sumvar2);
end
end
end
I wanted to extend this code to allow for up to 3D image inputs, but also allow 2D or 1D inputs. Some other changes for optimization were made (dynamically updating bounds, removing if/else statement):
function [Correlogram] = ExhaustiveCovarianceFunction3D(X,lags)
[ny,nx,nz]=size(X);
%lags is now a vector
Covariance=NaN([2*lags(1)+1, 2*lags(2)+1, 2*lags(3)+1]);
Correlogram=NaN([2*lags(1)+1, 2*lags(2)+1, 2*lags(3)+1]);
for hz=-lags(3):lags(3)
for hy=-lags(1):lags(1)
for hx=-lags(2):lags(2)
hvector = [hy hx hz];
if hvector(1)<0
ymin=abs(hvector(1))+1;
ymax=ny;
else
ymin=1;
ymax=ny-hvector(1);
end
if hvector(2)<0
xmin=abs(hvector(2))+1;
xmax=nx;
else
xmin=1;
xmax=nx-hvector(2);
end
if hvector(3)<0
zmin=abs(hvector(3))+1;
zmax=nz;
else
zmin=1;
zmax=nz-hvector(3);
end
sumprod=0;
sum1=0;
sum2=0;
for k=zmin:zmax
for j=ymin:ymax
for i=xmin:xmax
sumprod=X(j,i,k)*X(j+hy,i+hx,k+hz) + sumprod;
sum1=X(j,i,k) + sum1;
sum2=X(j+hy,i+hx,k+hz) + sum2;
end
end
end
countpairs = numel(ymin:ymax)*numel(xmin:xmax)*numel(zmin:zmax);
sumprod=sumprod/(countpairs);
sum1=sum1/(countpairs);
sum2=sum2/(countpairs);
Covariance(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)=sumprod - sum1*sum2;
%Standardize covariance values to correlogram
mean1=sum1;
mean2=sum2;
sumvar1=0;
sumvar2=0;
for k=zmin:zmax
for j=ymin:ymax
for i=xmin:xmax
sumvar1=(X(j,i,k)-mean1)^2 + sumvar1;
sumvar2=(X(j+hy,i+hx,k+hz)-mean2)^2 + sumvar2;
end
end
end
sumvar1=sumvar1/countpairs;
sumvar2=sumvar2/countpairs;
Correlogram(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)=Covariance(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)/sqrt(sumvar1*sumvar2);
end
end
end
end
I first compared these codes for a 2D input image (nz=1) and noticed that my new function runs much slower than the original function:
  • ExhaustiveCovarianceFunction(testslice,25): Elapsed time is 0.057659 seconds.
  • ExhaustiveCovarianceFunction3D(testslice,[25,25,0]): Elapsed time is 19.251866 seconds.
After some testing, this led me to realize that indexing a 3D array is much slower than a 2D array in MATLAB, and that this has something to do with how memory is accessed for higher-dimensional arrays. I found out that, if I added a third dimension to the original ExhaustiveCovarianceFunction with index 1 that never changed (eg, X(j,i,1), with no other changes, the code slowed down significantly:
  • 2D array index X(j,i): Elapsed time is 0.059815 seconds.
  • 3D array index X(j,i,1): Elapsed time is 18.849217 seconds.
The idea that we cannot work with higher-dimensional arrays in MATLAB if we want to have optimal performance led me to an approach where, in my new function, I convert the 2D and 3D arrays to a 1D array and map the nD array indices to 1D:
function index1D = convert3Dto1D_index(j, i, k, ny, nx)
index1D = j + (i - 1) * ny + (k - 1) * ny * nx;
end
function [Correlogram] = ExhaustiveCovarianceFunction3D(Xinput,lags)
X = flatten3D(Xinput);
[ny,nx,nz]=size(Xinput);
%lags is now a vector
Covariance=NaN([2*lags(1)+1, 2*lags(2)+1, 2*lags(3)+1]);
Correlogram=NaN([2*lags(1)+1, 2*lags(2)+1, 2*lags(3)+1]);
for hz=-lags(3):lags(3)
for hy=-lags(1):lags(1)
for hx=-lags(2):lags(2)
hvector = [hy hx hz];
if hvector(1)<0
ymin=abs(hvector(1))+1;
ymax=ny;
else
ymin=1;
ymax=ny-hvector(1);
end
if hvector(2)<0
xmin=abs(hvector(2))+1;
xmax=nx;
else
xmin=1;
xmax=nx-hvector(2);
end
if hvector(3)<0
zmin=abs(hvector(3))+1;
zmax=nz;
else
zmin=1;
zmax=nz-hvector(3);
end
sumprod=0;
sum1=0;
sum2=0;
for k=zmin:zmax
for j=ymin:ymax
for i=xmin:xmax
sumprod=X(convert3Dto1D_index(j, i, k, ny, nx))*X(convert3Dto1D_index(j+hy, i+hx, k+hz, ny, nx)) + sumprod;
sum1=X(convert3Dto1D_index(j, i, k, ny, nx)) + sum1;
sum2=X(convert3Dto1D_index(j+hy, i+hx, k+hz, ny, nx)) + sum2;
end
end
end
countpairs = numel(ymin:ymax)*numel(xmin:xmax)*numel(zmin:zmax);
sumprod=sumprod/(countpairs);
sum1=sum1/(countpairs);
sum2=sum2/(countpairs);
Covariance(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)=sumprod - sum1*sum2;
%Standardize covariance values to correlogram
mean1=sum1;
mean2=sum2;
sumvar1=0;
sumvar2=0;
for k=zmin:zmax
for j=ymin:ymax
for i=xmin:xmax
sumvar1=(X(convert3Dto1D_index(j, i, k, ny, nx))-mean1)^2 + sumvar1;
sumvar2=(X(convert3Dto1D_index(j+hy, i+hx, k+hz, ny, nx))-mean2)^2 + sumvar2;
end
end
end
sumvar1=sumvar1/countpairs;
sumvar2=sumvar2/countpairs;
Correlogram(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)=Covariance(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)/sqrt(sumvar1*sumvar2);
end
end
end
end
This approach has better performance, but is still slower than the original 2D code:
  • 2D: Elapsed time is 0.163243 seconds.
  • 3D, 3D index: Elapsed time is 18.336108 seconds.
  • 3D, 1D index: Elapsed time is 0.431890 seconds.
The optimized code with 1D array indexing is still about 3x slower. Perhaps calling convert3Dto1D_index(j, i, k, ny, nx) every time the image index is called is leading to this expense.These computational concerns are important for larger images (eg, 256x256x128) and for higher-order statistics that will be developed.
Is there a smarter approach to addressing this problem? I cannot figure out how to speed up this 3D code any more. Ideally, I seek to make the 3D/general code to have comparable speeds to the 2D "hard-coded" problem when a 2D image is input. If this is not possible, separate subroutines for 2D and 3D images is likely my best bet.

Risposte (1)

surya venu
surya venu il 17 Giu 2024
Hi,
The performance hit when moving from 2D to 3D indexing, and your subsequent optimization using a 1D indexing approach, is insightful. Here are a few suggestions that could help you further optimize your code or approach the problem differently:
1. Precompute 1D Indices
Instead of calling "convert3Dto1D_index" within your nested loops, you could precompute the 1D indices for all relevant (j, i, k) combinations before entering the loops. This would reduce the computational overhead of repeatedly calling a function to perform the index conversion.
% Precompute 1D indices
indices1D = zeros(ny, nx, nz);
for k = 1:nz
for j = 1:ny
for i = 1:nx
indices1D(j, i, k) = convert3Dto1D_index(j, i, k, ny, nx);
end
end
end
Then, use these precomputed indices directly in your loops.
2. Vectorization Where Possible
MATLAB excels at matrix and vector operations. Wherever possible, try to use vectorized operations instead of loops. For example, computations involving "sumprod", "sum1", and "sum2" might be vectorizable depending on the operations involved. For more information, check out: https://www.mathworks.com/help/matlab/matlab_prog/vectorization.html
3. Logical Indexing for Boundary Conditions
Instead of adjusting "ymin, ymax, xmin, xmax, zmin, and zmax" for boundary conditions, consider using logical indexing or masks that can be applied to your data. This can sometimes simplify the code and reduce conditional checks within the loop. For more information, check out: https://blogs.mathworks.com/steve/2008/01/28/logical-indexing/
4. Compile to MEX
For computationally intensive parts of your code, consider converting them to a MEX function written in C or C++. MEX functions can significantly outperform equivalent MATLAB code due to the compiled nature of C/C++ and the ability to optimize at a lower level. For more information about MEX functions, check out: https://www.mathworks.com/help/matlab/call-mex-functions.html
Hope it helps.
  1 Commento
Corey Hoydic
Corey Hoydic il 17 Giu 2024
Hello Surya,
I was able to address each of your suggestions (minus compiling with MEX). The results are interesting:
1. Precompute 1D Indices
I calculated the indices for all locations in the image before entering any of the nested for loops. Then, I modified my summations from
sumprod=X(convert3Dto1D_index(j, i, k, ny, nx))*X(convert3Dto1D_index(j+hy, i+hx, k+hz, ny, nx)) + sumprod;
sum1=X(convert3Dto1D_index(j, i, k, ny, nx)) + sum1;
sum2=X(convert3Dto1D_index(j+hy, i+hx, k+hz, ny, nx)) + sum2;
to
sumprod=X(indices1D(j, i, k))*X(indices1D(j+hy, i+hx, k+hz)) + sumprod;
sum1=X(indices1D(j, i, k)) + sum1;
sum2=X(indices1D(j+hy, i+hx, k+hz)) + sum2;
Unfortunately, the performance hit is quite significant:
  • Original code: 0.491754 seconds
  • Precomputed index: 18.954286 seconds
My best guess is that this relates to my original problem that I faced: calling a value from a 3D array is much slower than calling the function convert3Dto1D_index(j, i, k, ny, nx)) each time. I would conclude that it is best to avoid any 3D array indexing (except when vectorizing, see below).
2. Vectorization Where Possible
I vectorized the summation loops in two different ways
Vectorization, 3D (original) index:
k=zmin:zmax;
j=ymin:ymax;
i=xmin:xmax;
sumprod=sum(sum(sum(X(j,i,k).*X(j+hy,i+hx,k+hz))));
sum1=sum(sum(sum(X(j,i,k))));
sum2=sum(sum(sum(X(j+hy,i+hx,k+hz))));
Vectorization, 1D index (indices are pre-computed):
k=zmin:zmax;
j=ymin:ymax;
i=xmin:xmax;
idx_1 = indices1D(j,i,k);
idx_2 = indices1D(j+hy,i+hx,k+hz);
sumprod=sum(sum(sum(X(idx_1).*X(idx_2))));
sum1=sum(sum(sum(X(idx_1))));
sum2=sum(sum(sum(X(idx_2))));
Some benchmarking of results follows. Note that the 2D code is the same as the 3D code, but hard-codes only the first two dimensions X(j,i).
For a 2D image (400x400):
  • 2D code, 1D index: 161.380283 seconds
  • 2D code, 2D index: 12.208195 seconds
  • 2D code, vectorized, 1D index: 78.073599 seconds.
  • 2D code, vectorized, 2D index, 22.776255 seconds
  • 3D code, 1D index: 161.752561 seconds
  • 3D code, 3D index: too slow.. wont bother
  • 3D code, vectorized, 1D index: 78.179389 seconds
  • 3D code, vectorized, 3D index: 24.832623 seconds
For a 3D input (51x51x11):
  • 1D index: 40.293362 seconds
  • 3D index: slow
  • Vectorized, 1D index: 10.849903 seconds
  • Vectorized, 3D index: 3.418844 seconds
Interestingly:
  • The 3D code benefits from vectorization for performance, but the 2D code does not
  • The 1D indexing approach has approximately the same computational load for a 2D image sent through the 2D and 3D codes. This makes sense, since we ditch the 3D array indexing early on.
  • Vectorizing performs better when the original array dimensions are used (don't convert to 1D index).
3. Logical Indexing for Boundary Conditions
I did not address this, but the original version of my code implemented bounds checks and naively iterated over all i,j,k. Removing this and adding the dynamic bounds proved to be faster, since
if hvector(1)<0
ymin=abs(hvector(1))+1;
ymax=ny;
else
ymin=1;
ymax=ny-hvector(1);
end
if hvector(2)<0
xmin=abs(hvector(2))+1;
xmax=nx;
else
xmin=1;
xmax=nx-hvector(2);
end
if hvector(3)<0
zmin=abs(hvector(3))+1;
zmax=nz;
else
zmin=1;
zmax=nz-hvector(3);
end
is rather inexpensive overall.
At this point, it is looking like I am going to want separate subroutines for 2D and 3D inputs, which I wanted to avoid. The 2D code does not benefit from 1D indexing or vectorization. The 3D code benefits from both 1D indexing and vectorization, with the best performance when vectorizing and using the original 3D index. I was hoping to have a general formulation which would perform best (for both 2D and 3D inputs), but I am not yet there. Thank you for the guidance, please let me know if you have any further thoughts.

Accedi per commentare.

Prodotti


Release

R2024a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by