How to improve speed of calculating trace in a script?

Hi all,
In my project I have to calculate the trace of some matrix products, I have the following script to demonstrate the purpose:
clear; clc;
% number of total tests.
nTest = 500;
% part 1. generate nTest*2 random matrices.
nd = 1000;
nt = 100;
dis = cell(nTest, 2);
dis = cellfun(@(v) rand(nd, nt), dis, 'un', 0);
% part 2. perform truncated-SVD on each matrix,
% only leave nRem singular vectors and values.
nRem = 50;
disSVD = cell(nTest, 2);
for isvd = 1:nTest
for jsvd = 1:2
[u, s, v] = svd(dis{isvd, jsvd}, 0);
disSVD{isvd, jsvd} = {u(:, 1:nRem), s(1:nRem, 1:nRem), v(:, 1:nRem)};
end
end
% part 3. for each SVD result, perform trace to obtain disTrans. disTrans is
% non-symmetric, thus jtr needs to start from 1.
disTrans = zeros(nTest);
for itr = 1:nTest
u1 = disSVD{itr, 1};
for jtr = 1:nTest
u2 = disSVD{jtr, 2};
disTrans(itr, jtr) = ...
trace(u1{3} * u1{2}' * u1{1}' * u2{1} * u2{2} * u2{3}');
end
end
I ran profile to find out which part is the slowest, it turns out it's calculating the trace in part 3 due to the large number. Unfortunately in my project the number of calculating trace is also very large. So any idea of how to improve the speed of calculating the trace? The profile is shown here:
Many thanks!

8 Commenti

Xiaohan - in your call to trace
trace(u1{3} * u1{2}' * u1{1}' * u2{1} * u2{2} * u2{3}');
note how for every iteration of the inner loop, you are always repeating the product
u1{3} * u1{2}' * u1{1}'
I suppose you could put that in the outer loop to see if that improves your processing time.
disTrans = zeros(nTest);
for itr = 1:nTest
u1 = disSVD{itr, 1};
z = u1{3} * u1{2}' * u1{1}';
for jtr = 1:nTest
u2 = disSVD{jtr, 2};
disTrans(itr, jtr) = ...
trace(z * u2{1} * u2{2} * u2{3}');
end
end
Hi Geoff, thanks! Withnyour suggestion the profiler becomes:
which improved a lot in total time. However there is something I don't understand -- in the second line it shows 'trace' is called 250000 times, total time = self time = 2.533s, why? If so should the total time of the script 'testuiTujSortImprove' be 2.533 * 250000 = 633250s? or total time of trace means it takes 2.533s to run it for 250000 times?
Hi Xiaohan - my understanding is that trace is called 250000 times and spends a total of 2.533 seconds for all of the 250000 calls.
The total time spent in testuiTujSortImprove is 64.596 seconds of which 2.5333 is spent in trace. The self time is the time spent in that function. That is why when we made the above change to perform that multiplication in the outer loop, we improved the self time of your main function.
Xiaohan Du
Xiaohan Du il 3 Apr 2018
Modificato: Xiaohan Du il 3 Apr 2018
This 64.596 is a sum of self time of each function, but why is self time of the main script 61.062s? Sum over all the other functions is less than 5 secs.
well the 61.062 seconds should be the time spent in the testuiTujSortImprove function itself and does not include the time spent in any function (like trace) that it calls.
64.596 - 61.062 ~= 3.53 should be the time spent in all of the other functions.
Sorry I'm still confused with the profiler, my question is:
The self-time of 'testuiTujSortImprove', 'trace', 'rand(nd, nt)' are 61.062s, 2.533s, 1.001s, respectively, and the rest functions all cost less than 0.013s. I think what cost time in the main function are calculations of trace and generation of random matrices. If these 2 only cost 2.533s and 1.001s then why does the whole script cost 61.062s which is a lot more than 2.533 + 1.001?
Note that built-in functions like svd, cellfun, rand, etc. don't show up in your Profiler report but they do take time. If you open the entry for testuiTujSortImprove I believe you'll see a line in the Children table where the time spent in built-in functions will be reported, and given how many times you called svd in particular I think that will account for the "missing" time.
I got it, opening the entry of testuiTujSortImprove shows:
It seems that the operations inside trace, i.e.
u1{3} * u1{2}' * u1{1}' * u2{1} * u2{2} * u2{3}'
cost the majority of time.

Accedi per commentare.

 Risposta accettata

You can make the trace operator work faster as follows: Currently, the input is two truncated SVDs, A1 = U1 * S1 * V1' and A2 = U2 * S2 * V2', and you are computing
trace(A1'*A2)
correct? After inserting A1 and A2, you can use the property of trace that trace(A*B) == trace(B*A) (note that trace(A*B*C) ~= trace(A*C*B), see wikipedia).
So this means that you can rearrange
trace(V1*S1'*U1'*U2*S2*V2') == trace( (V2'*V1) * S1 * (U1'*U2) * S2)
Make sure that the parentheses are set like this, and all other operations are on nRem-by-nRem matrices.
By the way, you can also rewrite trace(A'*B) as sum(sum(A.*B)), but I'm not sure if this will give you a speedup for this case.

3 Commenti

Hi Chris,
Thanks for the reply. I do have a further speedup now with your rearrangement:
I wonder what caused the speedup? My understanding is that calculating trace is very cheap, 250000 operations only cost 2.402s. What cost time is the operations inside the trace, i.e.
V1*S1'*U1'*U2*S2*V2'
So in order to improve speed, this needs to be optimised. Your suggestion results in nRem-by-nRem operations, is this why it's faster? I thought this would improve the speed of calculating trace, not what's inside trace?
Trace is a very quick operation, compared with the matrix multiplications, so the larger part of the speed-up is probably about those multiplications. The trace call itself should also become a bit faster, because it now acts on nRem-by-nRem matrices, instead of nTest-by-nTest matrices.
You can try taking the function calls apart:
M3 = u2{3}' * u1{3};
M1 = u1{1}' * u2{1};
M = M3 * u1{2} * M1 * u2{2};
trace(M);
This way, the profiler will tell you how much time is spent in each line, and you can see how much time the trace call takes.

Accedi per commentare.

Più risposte (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by