Why these two similar operations (a sparse matrix w/o transpose times a vector) take different time to finish?

Question

Zhuoran Han il 6 Mag 2022

0
Link

Link diretto a questa domanda

https://it.mathworks.com/matlabcentral/answers/1712785-why-these-two-similar-operations-a-sparse-matrix-w-o-transpose-times-a-vector-take-different-time

Commentato: Zhuoran Han il 25 Ott 2022

n = 1e5;
a = sprand(n,n,5/n);
v = rand(n,1);
tic; for i = 1:1e3, b = a*v; end; toc
tic; for i = 1:1e3, c = a'*v; end; toc

I wanted to test if sparse matrix transpose takes a lot of time, so I used a sparse matrix times a full vector for an example (this is common in engineering problems). The only difference between the two loops is the matrix transpose.

I suprises me that the first loop (a*v) takes 1.28s whereas the second loop (a'*v) takes ONLY 0.18s. Switching the order of the two loops does not change the time cost.

What makes the higu difference? Why is matrix transpose faster than non-transpose?

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Accedi per commentare.

Accedi per rispondere a questa domanda.

Answer 1

Riccardo Scorretti il 6 Mag 2022

0
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/1712785-why-these-two-similar-operations-a-sparse-matrix-w-o-transpose-times-a-vector-take-different-time#answer_958225

Interesting. I tried to see what happens with matrix of different sizes:

n = round(logspace(1, 6, 30));

t = [];

figure

for k = 1 : numel(n)

a = sprand(n(k),n(k),5/n(k));

v = rand(n(k),1);

tic; for i = 1:1e3, b = a*v; end; t(k,1) = toc;

tic; for i = 1:1e3, c = a'*v; end; t(k,2) = toc;

loglog(n(1:k), t(:,1), 'o-', n(1:k), t(:,2), 's-') ; grid on ;

xlabel('n') ; ylabel('T_{CPU}') ;

legend('a*v', 'a''*v');

drawnow;

end

Clearly something happens close to n = 10000 (more precisely, on my PC between n = 10002 and n = 10003, guess why this value!?):

My very personal hypothesis is that until n = 10002 the two multiplications are executed in the same way, by using a trivial algorithm. After, the transposition and multiplication are combined in a single operation.

The interest of this approach is that, as in MATLAB both full and sparse matrix are stored column-wisely (= like in Fortran), multiplicating the transpose of A by a vector v boils up into executing a kind of column-column multiplication, which is more efficient because if reduces cache misses.

2 Commenti
Mostra NessunoNascondi Nessuno

Zhuoran Han il 10 Mag 2022

Thank you very much for the explaination!

It might be possible to save a lot of computation time with pre-saved matrix transposes!

Zhuoran Han il 25 Ott 2022

I re-visited this problem, tryed different density and found that

when density = 5/n, n_change = 10002;

when density = 4/n, n_change = 12503;

when density = 3/n, n_change = 16668;

So, clearly there is some mechanism corresponds to 50k elements, or 50k*8*3 = 1.2MB of data. This value does not change with different CPU.

A guess is that when using a*v, MATLAB reads a row of elements and then sends it to multiplication; as for a'*v, if the total value of elements is no more than 1.2MB, it's the same with previous, whereas if it succeeds 50k, MATLAB reads the first 1.2MB data and then conducts the multiplication?

Accedi per commentare.

Answer 2

Sandeep Shrivastava il 6 Mag 2022

0
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/1712785-why-these-two-similar-operations-a-sparse-matrix-w-o-transpose-times-a-vector-take-different-time#answer_958310

Apri in MATLAB Online

This is an interesting problem to look into! When I calculated the transpose separately and performed only the multiplication in loop, it gave me the same elapsed time as the earlier case. This probably means that it's not the operand that is affecting the performance, but rather:

1. the size of the matrix/vector, and;

2. the operator

They are possibly triggering BLAS routines which handle the large data sets and certain types of operations more efficiently. I think this behavior is not release-dependent.

Script:

n = 1e5;
a = sprand(n,n,5/n);
aT = a';
v = rand(n,1);
tic; for i = 1:1e3, b = a*v; end; toc
tic; for i = 1:1e3, c = aT*v; end; toc
tic; for i = 1:1e3, c = a'*v; end; toc

Output:

Elapsed time is 1.790306 seconds.
Elapsed time is 1.824447 seconds.
Elapsed time is 0.354630 seconds.

Here are some other posts on this forum where BLAS routines may have had a role to play:

https://www.mathworks.com/matlabcentral/answers/6411-matrix-multiplication-optimal-speed-and-memory#answer_8907

https://www.mathworks.com/matlabcentral/answers/1611705-multiplication-of-large-matrix-with-its-transpose#answer_855905

https://www.mathworks.com/matlabcentral/answers/699825-multiplication-of-large-matrices#answer_581705

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

Riccardo Scorretti il 6 Mag 2022

@Sandeep Shrivastava Indeed, in the first link James Tursa writes:

D' * bsxfun(@times,W',D)

"The latter will likely result in D' not being explicitly calculated. Rather, a transpose flag will simply be passed to the dgemm routine in the BLAS call. And the W' in the latter will not result in a data copy, it will be just be a simple reshaped shared data copy of the W vector."

This seems to support the idea that the difference in performances can be explained by the better usage of the cache memory. By explicitly computing and storing aT = a' MATLAB is forced to transpose the matrix physically, and the interpreter (or the JIT compiler) cannot be smart enough to realize that aT is the transposed of a. This would explains also your results.

Accedi per commentare.

Why these two similar operations (a sparse matrix w/o transpose times a vector) take different time to finish?

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Risposta accettata

2 Commenti
Mostra NessunoNascondi Nessuno

Più risposte (1)

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

Vedere anche

Categorie

Tag

Prodotti

Release

Community Treasure Hunt

Why these two similar operations (a sparse matrix w/o transpose times a vector) take different time to finish?

0 Commenti Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Risposta accettata

2 Commenti Mostra NessunoNascondi Nessuno

Più risposte (1)

1 Commento Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

Vedere anche

Categorie

Tag

Prodotti

Release

Community Treasure Hunt

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

2 Commenti
Mostra NessunoNascondi Nessuno

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti