How to vectorize the multiple loops

Question

0 voti

Hi there,

I'd like some advice on vectorizing the multiple for loop. might you provide me any solid suggestions on how I might try to optimise the loop?

I would want the loop performance to be approximately 5 seconds or less; at the moment, this is a far too large amount.

Thank you very much

% set up the initialization params
a_b = 4;% max 8
h = 1920;% max upto 7680
w = 1080;% max upto 4320
u1 = zeros(1936,1096);
u0 = zeros(1936,1096);
x = int16(zeros(h/a_b,w/a_b));
y = int16(zeros(h/a_b,w/a_b));
% loop over
tic
for r = a_b:a_b:h
    rb = floor(r/a_b);
    for c = a_b:a_b:w
        cb = floor(c/a_b);
        c_l = 1e+10;
        for u = -a_b:a_b
            for v = -a_b:a_b
                cv = u1(r+1:r+a_b,c+1:c+ ...
                    a_b)-u0(r+u+1:r+u+a_b, ...
                    c+v+1:c+v+a_b);
                cv = sum(abs(cv(:)));
                if cv < c_l
                    c_l=cv;
                    x(rb,cb) = v; y(rb,cb) = u;
                end
            end
        end
    end
end
toc;
Elapsed time is 17.232335 seconds.

7 Commenti
Mostra 5 commenti meno recenti Nascondi 5 commenti meno recenti

Life is Wonderful il 16 Lug 2023

Thanks @Bruno Luong,

Would you kindly share a code sample with us?

Bruno Luong il 16 Lug 2023

Modificato: Bruno Luong il 16 Lug 2023

Apri in MATLAB Online

@Life is Wonderful are you asking for registration using correlation? I haven't mentioned any code in my previous comment.

I dig out this demo file from an old discussion

PatternMatching()

maxshift = []

maxshift = 1×2

468 701

Slow Matlab CONV2... Correlation = 0.965594 Elapsed time is 19.991994 seconds. Shift found [pixel] is (8,659)

function PatternMatching(GPU)

%% Test data

n = 0; % 0, 100, 500, 1000

switch n

case 0,

im1=imread('bigsur1.jpg');

im2=imread('bigsur2.jpg');

A = sum(double([im1; im2]),3);

A1 = A(1:size(im1,1),:);

A2 = A(size(im1,1)+1:end,:);

case 100,

% Shift between two images (actually of their upper/right corners)

dx = 5; dy = 7;

% Generate cropped images

idx1 = 5:96;

idy1 = 4:84;

idx2 = idx1(1)+dx:64;

idy2 = idy1(1)+dy:90;

case 500,

dx = -13; dy = 27;

% Generate crops

idx1 = 20:460;

idy1 = 15:449;

idx2 = idx1+dx;

idy2 = idy1+dy;

case 1000,

% Shift between two images (actually of their upper/right corners)

dx = -13; dy = 27;

% Generate cropped images

idx1 = 50:960;

idy1 = 35:849;

idx2 = idx1(1)+dx:640;

idy2 = idy1(1)+dy:900;

otherwise

error('Please setup crop-size for n=%d', n)

end

if n>0

fullimg = peaks(n);

noiseptv = 1;

% Crop

A1 = fullimg(idy1,idx1);

A2 = fullimg(idy2,idx2);

% Add noise (uniform pdf)

A1 = A1 + noiseptv*(rand(size(A1))-0.5);

A2 = A2 + noiseptv*(rand(size(A2))-0.5);

% clean up

clear fullimg dx dy idx1 idx2 idy1 idy2

maxshift = [];

else

maxshift = [] %[100 700];

end

% Use GPU flag

if nargin<1

GPU = true;

end

%%

tic

% should return the same input as above

[dx dy] = pmatch(A1, A2, maxshift, GPU);

toc

fprintf('Shift found [pixel] is (%d,%d)\n', dy, dx);

%%

% Graphic check

fig=figure(1);

clf(fig);

ax = axes('Parent',fig);

x1 = 1:size(A1,2);

y1 = 1:size(A1,1);

hold(ax,'on');

if n==0

A1=im1;

A2=im2;

end

imagesc(x1,y1,A1,'Parent',ax);

plot3(ax,x1([1 end end 1 1])+[-1 1 1 -1 -1]/2,...

y1([1 1 end end 1])+[-1 -1 1 1 -1]/2,...

ones(1,5),'k');

x2 = dx + (1:size(A2,2));

y2 = dy + 1:size(A2,1);

imagesc(x2,y2,A2,'Parent',ax);

plot3(ax,x2([1 end end 1 1])+[-1 1 1 -1 -1]/2,...

y2([1 1 end end 1])+[-1 -1 1 1 -1]/2,...

ones(1,5),'k');

% Intersection

x = intersect(x1,x2);

y = intersect(y1,y2);

plot3(ax,x([1 end end 1 1])+[-1 1 1 -1 -1]/2,...

y([1 1 end end 1])+[-1 -1 1 1 -1]/2,...

ones(1,5),'r','LineWidth',1);

set(ax,'Ydir','reverse');

axis(ax,'equal')

end % PatternMatching

%%

function [dx dy] = pmatch(A1, A2, maxshift, GPU)

% function [dx dy] = pmatch(A1, A2, maxshift, GPU)

% Pattern matching engine

% Use GPU

if nargin<3 || isempty(maxshift)

% We don't look for shift larger than this (see ImageAnalyst's post)

% half of the size of A1 and A2

maxshift = ceil(0.8*min(size(A1),size(A2)))

end

% Use GPU

if nargin<4 || isempty(GPU)

GPU = true;

end

if isscalar(maxshift)

% common margin duplicated for both dimensions

maxshift = maxshift([1 1]);

end

% Select 2D convolution engine

if ~isempty(which('convnfft'))

% http://www.mathworks.com/matlabcentral/fileexchange/24504

convfun = @convnfft;

convarg = {[], GPU};

fprintf('GPU/jacket flag = %i\n', GPU);

else

% This one will last almost forever

convfun = @conv2;

convarg = {};

fprintf('Slow Matlab CONV2...\n');

end

% Correlation engine

A2f = A2(end:-1:1,end:-1:1);

C = convfun(A1, A2f, 'full', convarg{:});

V1 = convfun(A1.^2, ones(size(A2f)), 'full', convarg{:});

V2 = convfun(ones(size(A1)), A2f.^2, 'full', convarg{:});

C2 = C.^2 ./ (V1.*V2);

center = size(A2f);

C2 = C2(center(1)+(-maxshift(1):maxshift(1)), ...

center(2)+(-maxshift(2):maxshift(2)));

[cmax ilin] = max(C2(:));

fprintf('Correlation = %g\n', sqrt(cmax))

[iy ix] = ind2sub(size(C2),ilin);

dx = ix - (maxshift(2)+1);

dy = iy - (maxshift(1)+1);

end % pmatch

Accedi per commentare.

Accedi per rispondere a questa domanda.

Follow Question

Answer 1

Bruno Luong il 17 Lug 2023

Modificato: Bruno Luong il 18 Lug 2023

Apri in MATLAB Online

1 voto

Warning code not tested for correctness:

EDIT correctness is now check

% set up the initialization params
a_b = 4;% max 8
h = 1920;% max upto 7680
w = 1080;% max upto 4320
u1 = rand(1936,1096);
u0 = rand(1936,1096);
x = zeros(h/a_b,w/a_b);
y = zeros(h/a_b,w/a_b);
% loop over
tic
for r = a_b:a_b:h
    rb = floor(r/a_b);
    for c = a_b:a_b:w
        cb = floor(c/a_b);
        c_l = 1e+10;
        for u = -a_b:a_b
            for v = -a_b:a_b
                cv = u1(r+1:r+a_b,c+1:c+ ...
                    a_b)-u0(r+u+1:r+u+a_b, ...
                    c+v+1:c+v+a_b);
                cv = sum(abs(cv(:)));
                if cv < c_l
                    c_l=cv;
                    x(rb,cb) = v; y(rb,cb) = u;
                end
            end
        end
    end
end
toc; % Elapsed time is 27.587217 seconds.
Elapsed time is 17.627810 seconds.
tic
r = a_b:a_b:h;
c = a_b:a_b:w;
r1 = r(1)+1:r(end)+a_b;
c1 = c(1)+1:c(end)+a_b;
U1 = u1(r1,c1);
[m,n] = size(U1);
m = m/a_b;
n = n/a_b;
uv = -a_b:a_b;
p = 2*a_b + 1; % length(uv)
CV = zeros([m n p p]);
for i = 1:p
    u = uv(i);
    r0 = r1+u;
    u00 = u0(r0,:);
    for j = 1:p
        v = uv(j);
        c0 = c1+v;
        U0 = u00(:,c0);
        DU = abs(U1-U0);
        DU = reshape(DU, a_b, m, a_b, n);
        DU = sum(DU, [1 3]);
        CV(:,:,i,j) = reshape(DU, [m n]);
    end
end
[mincv,locmin] = min(CV, [], [3 4], 'linear');
% Edit correct bug here
[~,~,iy,ix] = ind2sub(size(CV),locmin);
y_v = uv(iy);
x_v = uv(ix);
toc % Elapsed time is 1.022439 seconds.
Elapsed time is 0.570510 seconds.
isequal(x,x_v)
ans = logical
   1
isequal(y,y_v)
ans = logical
   1

3 Commenti
Mostra 1 commento meno recente Nascondi 1 commento meno recente

Bruno Luong il 18 Lug 2023

Modificato: Bruno Luong il 18 Lug 2023

"Is there any chance of additional optimisation?"

Yes there is a chance, code the algo in assembly language with Nvidia card.

Look, as it is it's 27-30 times faster than your code.

In TMW online server the tic/toc is "Elapsed time is 0.570510 seconds." and you said in the question "I would want the loop performance to be approximately 5 seconds"

it's 8 faster than your goal.

May be at some point you should not be hungry and stop to ask for more.

Life is Wonderful il 18 Lug 2023

Salute, Perhaps I need to run a few more tests before declaring that performance is flawless. When I started authoring the code, it was extremely raw code with only the essentials in place. With your recommendation of inserting random numbers, I have more faith that the suggestions would work perfectly. So, thank you for your help.

Accedi per commentare.

How to vectorize the multiple loops

7 Commenti
Mostra 5 commenti meno recenti Nascondi 5 commenti meno recenti

Risposta accettata

3 Commenti
Mostra 1 commento meno recente Nascondi 1 commento meno recente

Più risposte (0)

Categorie

Prodotti

Release

Tag

Community Treasure Hunt

How to vectorize the multiple loops

7 Commenti Mostra 5 commenti meno recenti Nascondi 5 commenti meno recenti

Risposta accettata

3 Commenti Mostra 1 commento meno recente Nascondi 1 commento meno recente

Più risposte (0)

Categorie

Prodotti

Release

Tag

Vedere anche

Community Treasure Hunt

7 Commenti
Mostra 5 commenti meno recenti Nascondi 5 commenti meno recenti

3 Commenti
Mostra 1 commento meno recente Nascondi 1 commento meno recente