Speeding up code: vectorization and others

Question

0 voti

Hi,

Is there a way to speed this up even more? I did what I could already: everything is precomputed, preallocated etc. and compiled to .mex.

First, just to give you an idea about sizes of continers:

bessels = ones(1201, 1201, 101);    % 1.09 GB
negMcontainer = ones(1201, 1201, 100);
posMcontainer = negMcontainer;
maxN = 100;
levels = maxN + 1;
xElements = 1201;
Aj1 = complex(ones(101, 101);
Aj2 = Aj1;

Code:

parfor i = 1 : xElements
    for j = 1 : xElements
        umn = complex(zeros(levels, levels)); % cleaning
        for n = 0:maxN
            mm = 1;
            for m = -n:2:n
                nn = n + 1; % for indexing
                if m < 0
                    umn(nn, mm) = bessels(i, j, nn) * negMcontainer(i, j, abs(m));
                end
                if m > 0
                    umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m);
                end
                if m == 0
                    umn(nn, mm) = bessels(i, j, nn);
                end
                mm = mm + 1; % for indexing
            end % m
        end % n
        beta1 = sum(sum(Aj1.*umn));
        betaSumSq1(i, j) = abs(beta1).^2;
        beta2 = sum(sum(Aj2.*umn));
        betaSumSq2(i, j) = abs(beta2).^2;
  end % j
end % i

Best regards, Alex

1 Commento
Mostra -1 commenti meno recenti Nascondi -1 commenti meno recenti

Alex Kurek il 24 Ago 2016

Modificato: Alex Kurek il 24 Ago 2016

Nobody knows? So far everything I try is not speeding this up.

This if prof Profiler:

Accedi per commentare.

Accedi per rispondere a questa domanda.

Follow Question

Answer 1

per isakson il 24 Ago 2016

Modificato: per isakson il 24 Ago 2016

Apri in MATLAB Online

1 voto

Your code choked my computer. R2016a, no parallel toolbox, no mex. However, regarding the innermost loop

for m = -n:2:n
end

Move nn = n + 1; outside the loop. Helps a little bit (tic,toc).
Replace the three if-statements by a if-elseif-elseif-else-end. Again small effect. However, 30% faster when profiling.
Split the loop into two loops one with m<0 and one with m>0 and remove the if-statements. Note: odd and even n. Again small effect. However, when profiling the elapse time was cut in half.

Conclusion: The "JIT/Accelerator" don't need my help in these cases. However, the "JIT/Accelerator" is less effective together with the profiler.

5 Commenti
Mostra 3 commenti meno recenti Nascondi 3 commenti meno recenti

Alex Kurek il 24 Ago 2016

Modificato: Alex Kurek il 24 Ago 2016

Apri in MATLAB Online

Ok, now it looks like this:

parfor j = 1 : xElements
    for i = 1 : xElements
        umn = complex(zeros(levels, levels)); % cleaning 
        besselIJ = bessels(i, j, :);
        for n = 0:maxN
            nn = n + 1; % for indexing
            currentBessel = besselIJ(nn);
            mm = 1;
            for m = -n:2:n
                  if m < 0
                      umn(nn, mm) = currentBessel * negMcontainer(i, j, abs(m));
                  elseif m == 0
                      umn(nn, mm) = currentBessel;
                  else
                      umn(nn, mm) = currentBessel * posMcontainer(i, j, m) ;
                  end
                  mm = mm + 1;
              end % m
          end % n
          beta1 = sum(sum(Aj1.*umn));
          betaSumSq1(i, j) = abs(beta1).^2;
          beta2 = sum(sum(Aj2.*umn));
          betaSumSq2(i, j) = abs(beta2).^2;
      end % i
  end % j

I have 2016a, mex and parallel and only moving nn = n+1 outside the loop really helped. What helped more, is changing the code co columnwise (first two lines).

One small question: if I split the loop into two to remove if-statements, haw can I efficiently solve the odd/even problem?

Best regards and thanx, Alex

Alex Kurek il 25 Ago 2016

Apri in MATLAB Online

Thanx again. I just coded it:

bessels = ones(1201, 1201, 101);    % 1.09 GB
negMcontainer = ones(1201, 1201, 100);
posMcontainer = negMcontainer;
maxN = 100;
levels = maxN + 1;
xElements = 1201;
Aj1 = complex(ones(101, 101));
Aj2 = Aj1;
betaSumSq1 = zeros(xElements, xElements);
betaSumSq2 = betaSumSq1;
for j = 1 : xElements
    umn = complex(zeros(levels, levels)); % cleaning 
    if rem(j, 25) == 0
        disp(j);
    end
    for i = 1 : xElements
          for n = 0 : 2 : maxN
              nn = n + 1;
              mm = 1;
              for m = - n : 2 : -2
                  umn(nn, mm) = bessels(i, j, nn) * negMcontainer(i, j, abs(m));
                  mm = mm + 1;
              end
              umn(nn, mm) = bessels(i, j, nn);
              for m = 2 : 2 : n
                  umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m) ;
                  mm = mm + 1;
              end
          end
          for n = 1 : 2 : maxN
              nn = n + 1;
              mm = 1;
              for m = - n : 2 : -1
                  umn(nn, mm) = bessels(i, j, nn) * negMcontainer(i, j, abs(m));
                  mm = mm + 1;
              end
              m = 1:2:n;
              numOfEl = ceil(n/2);
              umn(nn, mm:mm+numOfEl-1) = bessels(i, j, nn) * posMcontainer(i, j, m);
  %             for m = 1 : 2 : n
  %                 umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m);
  %                 mm = mm + 1;
  %             end
          end
          beta1 = sum(sum(Aj1.*umn));
          betaSumSq1(i, j) = abs(beta1).^2;
          beta2 = sum(sum(Aj2.*umn));
          betaSumSq2(i, j) = abs(beta2).^2;
      end % i
  end % j

The speed in .mex and parfor is exectly the same, as before. But it opens a possibility of vectorization. I vectorized last loop (as pasted aboce) and it slowed down a bit. Have I done something wrong?

Best regards, Alex

per isakson il 27 Ago 2016

Modificato: per isakson il 28 Ago 2016

I don't know what to say. I would have guessed that your vectorization would be faster than the loop. However, loops are much faster in current releases of Matlab than they were when we learned that loops to should always be avoided.

Lesson learned

It's difficult to be smarter than the "JIT/Acceletator"
The function, profile, cannot always be trusted

Alex Kurek il 28 Ago 2016

Apri in MATLAB Online

Thanx. It is possible to speed-up vectorized version by ~17% by doing this:

tic
for j = 1 : xElements
    for i = 1 : xElements
        for n = 1 : 2 : maxN
            nn = n + 1;
            numOfEl = ceil(n/2);
            umn2(nn, 1:numOfEl) = bessels(i, j, nn) * posMcontainer(i, j, 1:2:n);
        end
    end
end
toc % 1.059022 seconds

So m is not allocated in every iteration. But still, its 2x slower, than loops.

Accedi per commentare.

Speeding up code: vectorization and others

1 Commento
Mostra -1 commenti meno recenti Nascondi -1 commenti meno recenti

Risposta accettata

5 Commenti
Mostra 3 commenti meno recenti Nascondi 3 commenti meno recenti

Più risposte (0)

Categorie

Prodotti

Tag

Community Treasure Hunt

Speeding up code: vectorization and others

1 Commento Mostra -1 commenti meno recenti Nascondi -1 commenti meno recenti

Risposta accettata

5 Commenti Mostra 3 commenti meno recenti Nascondi 3 commenti meno recenti

Più risposte (0)

Categorie

Prodotti

Tag

Vedere anche

Community Treasure Hunt

1 Commento
Mostra -1 commenti meno recenti Nascondi -1 commenti meno recenti

5 Commenti
Mostra 3 commenti meno recenti Nascondi 3 commenti meno recenti