explanation of mysterious timing results?
    4 visualizzazioni (ultimi 30 giorni)
  
       Mostra commenti meno recenti
    
In some legacy code, there's a pair of for-loops, one nested in the other. I created two alternate versions, one that tries to optimize by reducing the number of inner loop operations, and one that's fully vectorized. I wrote a timing routine that iterates over all three versions 10,000 times:
nIter = 10000;
tm = zeros(1,3);
for itr = 1:nIter
     % setup variables for all three versions
     NM=18; NM1=NM+1; NM3=NM+3;NM4=NM+4;
     T1=1; T2=2; T3=3;
     i=NM3 : -1 : 1;
     iv(i) = i .* (i-1) / 2;
     [a b] = deal(zeros(1, NM3*NM4/2));
     % V1: original code
     tic;
     for M = 0 : NM
        M1 = M+1;
        RM = M;
        for N = M1 : NM
           RN = N;
           LL = iv(N+1) + M + 1;
           a(LL) = -sqrt(((T2*RN+T1) * (T2*RN-T1)) / ...
                            ((RN-RM) * (RN+RM)));
           b(LL) =  sqrt(((T2*RN+T1) * (RN+RM-T1) * (RN-RM-T1)) /  ...
                         ((T2*RN-T3) *   (RN+RM)  *  (RN-RM)  )   );
        end
     end
     tm(1) = tm(1) + toc;
     % V2: reduced inner loop ops
     tic;
     for M = 0 : NM
        M1 = M+1;
         for N = M1 : NM
           rs  = N+M;
           rd  = N-M;
           rp  = rs*rd;
           rp1 = (rs-1) * (rd-1);
           t2n = 2*N;
           t2n1 = t2n+1;
           LL = iv(N+1) + M1;
           a(LL) = -sqrt(t2n1 * (t2n-1) / rp);
           b(LL) =  sqrt(t2n1 * rp1 / ((t2n-3) * rp));
        end
     end
     tm(2) = tm(2) + toc;
     % V3: vectorized code
     tic;
     sRow = 0 : NM;
     s = sRow(ones(NM1,1),:);
     t = s';         % same as [s,t] = meshgrid(0:NM)
     ltMask = s < t; % mask for array lower triangle below main diag
     rs  = s + t;
     rs  = rs(ltMask);
     rd  = t - s;
     rd  = rd(ltMask);
     rp  = rs .* rd;
     rp1 = (rs-1) .* (rd-1);
     tCol = t(ltMask);
     t2n  = 2*tCol;
     t2n1 = t2n+1;
     i    = iv(tCol+1) + s(ltMask)' + 1;
     a(i) = -sqrt(t2n1 .* (t2n-1) ./ rp);
     b(i) =  sqrt(t2n1 .* rp1 ./ ((t2n-3) .* rp));
     tm(3) = tm(3) + toc;
end % timing loop
descr = {'V1:orig';'V2:reduced ops';'V3:vectorized'};
for i = 1:3
   disp([descr{i} ': ' num2str(tm(i)) ' sec for ' num2str(nIter) ...
         ' iterations']);
end
disp('Normalized times:');
[s loc] = sort(tm);
disp([repmat(char(9),3,1) char(descr(loc)) repmat(' ',3,1) ...
      num2str((s./s(1))', '%0.2f')]);
The vectorized code is currently fastest, by about 2.5 times vs the original. What's very surprising is that the reduced operations version (V2) is reported to be way slower than the other two (by a factor of ~140 vs the vectorized). When I run the Profiler (after disabling all but one CPU), the main culprits are shown to be the two sqrt expressions, with the rp and rp1 computations also eating a good share of time.
The changes I made to get V2 from V1 were: # elimination of two unnecessary variables (RM and RN) # replacement of T1, T2, and T3 with their constant values # addition of some new temporary scalars (rs, rd, rp, rp1, t2n, & t2n1) to allow reduction of the floating-point operation count in the inner loop. (The number of multiplications dropped from 10 to 5, and additions/subractions from 12 to 7).
I'm aware there's some penalty for creating the new scalars, but a factor of 140 seems extreme. Can anyone suggest why these changes have such a high time cost?
2 Commenti
  kjetil87
      
 il 16 Ago 2013
				could you look over your code again so that it is runable with a copy paste?
e.g iv and IV , neither of them are declared as variables before you use them and i suspect maybe you meant for them to be the same?
Also there is some error with the
disp([descr{i} ': ' num2str(tm(i) ' sec for ' num2str(nIter) ...
         ' iterations']);
Risposte (0)
Vedere anche
Categorie
				Scopri di più su Scope Variables and Generate Names in Help Center e File Exchange
			
	Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!