How to vectorize the multiple loops

Hi there,
I'd like some advice on vectorizing the multiple for loop. might you provide me any solid suggestions on how I might try to optimise the loop?
I would want the loop performance to be approximately 5 seconds or less; at the moment, this is a far too large amount.
Thank you very much
% set up the initialization params
a_b = 4;% max 8
h = 1920;% max upto 7680
w = 1080;% max upto 4320
u1 = zeros(1936,1096);
u0 = zeros(1936,1096);
x = int16(zeros(h/a_b,w/a_b));
y = int16(zeros(h/a_b,w/a_b));
% loop over
tic
for r = a_b:a_b:h
rb = floor(r/a_b);
for c = a_b:a_b:w
cb = floor(c/a_b);
c_l = 1e+10;
for u = -a_b:a_b
for v = -a_b:a_b
cv = u1(r+1:r+a_b,c+1:c+ ...
a_b)-u0(r+u+1:r+u+a_b, ...
c+v+1:c+v+a_b);
cv = sum(abs(cv(:)));
if cv < c_l
c_l=cv;
x(rb,cb) = v; y(rb,cb) = u;
end
end
end
end
end
toc;
Elapsed time is 17.232335 seconds.

7 Commenti

Rik
Rik il 14 Lug 2023
It looks like you are attempting some sort of convolution. Using a convolution can be orders of magnitude faster than the equivalent loops. Can you explain what you're trying to do?
And why are there hardly any comments? A year from now you will have no clue whatsover what you did. And nor do we now.
Life is Wonderful
Life is Wonderful il 14 Lug 2023
Modificato: Life is Wonderful il 14 Lug 2023
I'm conducting block windowing across two data frames, with the rows and columns acting as the outside loop, followed by the inner loop for moving the blocks horizontally and vertically. If the cumulative sum is more than a certain threshold, I will replace the threshold with new sum value and copy the inner vectors in the block vector array..
Since I'm having problems processing the loop faster, any suggestions for vectorizing the loop are great.
Thank you
You aren't actually exiting the loop in your code, so currently you could replace your moving sum with a convolution. I would suggest you look into it. Note that conv and friends flip the one input across all dimensions, so the kernel below is to look back one row and one column:
kernel = [...
0 0 0;
0 1 1;
0 1 1];
MovingSum = convn(BigArray,kernel,'same');
Life is Wonderful
Life is Wonderful il 14 Lug 2023
Modificato: Life is Wonderful il 14 Lug 2023
Thanks @Rik,
What kind of performance enhancement do you anticipate here?
Bruno Luong
Bruno Luong il 16 Lug 2023
Modificato: Bruno Luong il 16 Lug 2023
@Rik I fail to see how conv can be applied here.
  • conv somputes the l2 product to two functions, one sliding to another. The correlation(R^2) is proportional to conv result.
  • Here @Life is Wonderful compute the l1 norm of the difference, one sliding to another.
However if the purpose is to find the shift so that u0 match u1, and @Life is Wonderful wants to give up his l1 calculation and use instead the correlation to find the shift, then yes conv is the way to go.
Thanks @Bruno Luong,
Would you kindly share a code sample with us?
@Life is Wonderful are you asking for registration using correlation? I haven't mentioned any code in my previous comment.
I dig out this demo file from an old discussion
PatternMatching()
maxshift = []
maxshift = 1×2
468 701
Slow Matlab CONV2... Correlation = 0.965594 Elapsed time is 19.991994 seconds. Shift found [pixel] is (8,659)
function PatternMatching(GPU)
%% Test data
n = 0; % 0, 100, 500, 1000
switch n
case 0,
im1=imread('bigsur1.jpg');
im2=imread('bigsur2.jpg');
A = sum(double([im1; im2]),3);
A1 = A(1:size(im1,1),:);
A2 = A(size(im1,1)+1:end,:);
case 100,
% Shift between two images (actually of their upper/right corners)
dx = 5; dy = 7;
% Generate cropped images
idx1 = 5:96;
idy1 = 4:84;
idx2 = idx1(1)+dx:64;
idy2 = idy1(1)+dy:90;
case 500,
dx = -13; dy = 27;
% Generate crops
idx1 = 20:460;
idy1 = 15:449;
idx2 = idx1+dx;
idy2 = idy1+dy;
case 1000,
% Shift between two images (actually of their upper/right corners)
dx = -13; dy = 27;
% Generate cropped images
idx1 = 50:960;
idy1 = 35:849;
idx2 = idx1(1)+dx:640;
idy2 = idy1(1)+dy:900;
otherwise
error('Please setup crop-size for n=%d', n)
end
if n>0
fullimg = peaks(n);
noiseptv = 1;
% Crop
A1 = fullimg(idy1,idx1);
A2 = fullimg(idy2,idx2);
% Add noise (uniform pdf)
A1 = A1 + noiseptv*(rand(size(A1))-0.5);
A2 = A2 + noiseptv*(rand(size(A2))-0.5);
% clean up
clear fullimg dx dy idx1 idx2 idy1 idy2
maxshift = [];
else
maxshift = [] %[100 700];
end
% Use GPU flag
if nargin<1
GPU = true;
end
%%
tic
% should return the same input as above
[dx dy] = pmatch(A1, A2, maxshift, GPU);
toc
fprintf('Shift found [pixel] is (%d,%d)\n', dy, dx);
%%
% Graphic check
fig=figure(1);
clf(fig);
ax = axes('Parent',fig);
x1 = 1:size(A1,2);
y1 = 1:size(A1,1);
hold(ax,'on');
if n==0
A1=im1;
A2=im2;
end
imagesc(x1,y1,A1,'Parent',ax);
plot3(ax,x1([1 end end 1 1])+[-1 1 1 -1 -1]/2,...
y1([1 1 end end 1])+[-1 -1 1 1 -1]/2,...
ones(1,5),'k');
x2 = dx + (1:size(A2,2));
y2 = dy + 1:size(A2,1);
imagesc(x2,y2,A2,'Parent',ax);
plot3(ax,x2([1 end end 1 1])+[-1 1 1 -1 -1]/2,...
y2([1 1 end end 1])+[-1 -1 1 1 -1]/2,...
ones(1,5),'k');
% Intersection
x = intersect(x1,x2);
y = intersect(y1,y2);
plot3(ax,x([1 end end 1 1])+[-1 1 1 -1 -1]/2,...
y([1 1 end end 1])+[-1 -1 1 1 -1]/2,...
ones(1,5),'r','LineWidth',1);
set(ax,'Ydir','reverse');
axis(ax,'equal')
end % PatternMatching
%%
function [dx dy] = pmatch(A1, A2, maxshift, GPU)
% function [dx dy] = pmatch(A1, A2, maxshift, GPU)
% Pattern matching engine
% Use GPU
if nargin<3 || isempty(maxshift)
% We don't look for shift larger than this (see ImageAnalyst's post)
% half of the size of A1 and A2
maxshift = ceil(0.8*min(size(A1),size(A2)))
end
% Use GPU
if nargin<4 || isempty(GPU)
GPU = true;
end
if isscalar(maxshift)
% common margin duplicated for both dimensions
maxshift = maxshift([1 1]);
end
% Select 2D convolution engine
if ~isempty(which('convnfft'))
% http://www.mathworks.com/matlabcentral/fileexchange/24504
convfun = @convnfft;
convarg = {[], GPU};
fprintf('GPU/jacket flag = %i\n', GPU);
else
% This one will last almost forever
convfun = @conv2;
convarg = {};
fprintf('Slow Matlab CONV2...\n');
end
% Correlation engine
A2f = A2(end:-1:1,end:-1:1);
C = convfun(A1, A2f, 'full', convarg{:});
V1 = convfun(A1.^2, ones(size(A2f)), 'full', convarg{:});
V2 = convfun(ones(size(A1)), A2f.^2, 'full', convarg{:});
C2 = C.^2 ./ (V1.*V2);
center = size(A2f);
C2 = C2(center(1)+(-maxshift(1):maxshift(1)), ...
center(2)+(-maxshift(2):maxshift(2)));
[cmax ilin] = max(C2(:));
fprintf('Correlation = %g\n', sqrt(cmax))
[iy ix] = ind2sub(size(C2),ilin);
dx = ix - (maxshift(2)+1);
dy = iy - (maxshift(1)+1);
end % pmatch

Accedi per commentare.

 Risposta accettata

Bruno Luong
Bruno Luong il 17 Lug 2023
Modificato: Bruno Luong il 18 Lug 2023
Warning code not tested for correctness:
EDIT correctness is now check
% set up the initialization params
a_b = 4;% max 8
h = 1920;% max upto 7680
w = 1080;% max upto 4320
u1 = rand(1936,1096);
u0 = rand(1936,1096);
x = zeros(h/a_b,w/a_b);
y = zeros(h/a_b,w/a_b);
% loop over
tic
for r = a_b:a_b:h
rb = floor(r/a_b);
for c = a_b:a_b:w
cb = floor(c/a_b);
c_l = 1e+10;
for u = -a_b:a_b
for v = -a_b:a_b
cv = u1(r+1:r+a_b,c+1:c+ ...
a_b)-u0(r+u+1:r+u+a_b, ...
c+v+1:c+v+a_b);
cv = sum(abs(cv(:)));
if cv < c_l
c_l=cv;
x(rb,cb) = v; y(rb,cb) = u;
end
end
end
end
end
toc; % Elapsed time is 27.587217 seconds.
Elapsed time is 17.627810 seconds.
tic
r = a_b:a_b:h;
c = a_b:a_b:w;
r1 = r(1)+1:r(end)+a_b;
c1 = c(1)+1:c(end)+a_b;
U1 = u1(r1,c1);
[m,n] = size(U1);
m = m/a_b;
n = n/a_b;
uv = -a_b:a_b;
p = 2*a_b + 1; % length(uv)
CV = zeros([m n p p]);
for i = 1:p
u = uv(i);
r0 = r1+u;
u00 = u0(r0,:);
for j = 1:p
v = uv(j);
c0 = c1+v;
U0 = u00(:,c0);
DU = abs(U1-U0);
DU = reshape(DU, a_b, m, a_b, n);
DU = sum(DU, [1 3]);
CV(:,:,i,j) = reshape(DU, [m n]);
end
end
[mincv,locmin] = min(CV, [], [3 4], 'linear');
% Edit correct bug here
[~,~,iy,ix] = ind2sub(size(CV),locmin);
y_v = uv(iy);
x_v = uv(ix);
toc % Elapsed time is 1.022439 seconds.
Elapsed time is 0.570510 seconds.
isequal(x,x_v)
ans = logical
1
isequal(y,y_v)
ans = logical
1

3 Commenti

Life is Wonderful
Life is Wonderful il 18 Lug 2023
Modificato: Life is Wonderful il 18 Lug 2023
Thank you , I am looking into your design.
Elapsed time is 1.022439 seconds.
Is there any chance of additional optimisation?
Bruno Luong
Bruno Luong il 18 Lug 2023
Modificato: Bruno Luong il 18 Lug 2023
"Is there any chance of additional optimisation?"
Yes there is a chance, code the algo in assembly language with Nvidia card.
Look, as it is it's 27-30 times faster than your code.
In TMW online server the tic/toc is "Elapsed time is 0.570510 seconds." and you said in the question "I would want the loop performance to be approximately 5 seconds"
it's 8 faster than your goal.
May be at some point you should not be hungry and stop to ask for more.
Salute, Perhaps I need to run a few more tests before declaring that performance is flawless. When I started authoring the code, it was extremely raw code with only the essentials in place. With your recommendation of inserting random numbers, I have more faith that the suggestions would work perfectly. So, thank you for your help.

Accedi per commentare.

Più risposte (0)

Categorie

Scopri di più su Conway's Game of Life in Centro assistenza e File Exchange

Prodotti

Release

R2022b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by