1) Why for dense matrices it's not useful to split B? 2) What are the factors that determine the speed up for the sparse case?

Question

Giovanni De Luca il 26 Mar 2013

0
Link

Link diretto a questa domanda

https://it.mathworks.com/matlabcentral/answers/68654-1-why-for-dense-matrices-it-s-not-useful-to-split-b-2-what-are-the-factors-that-determine-the-spe

I want to solve the linear system Ax=B for x, computing x=A\B, with B having m column-vector. This is one of the operations in my function, but it is the most expensive. In order to obtain a speed up, I thought about splitting the matrix B into m columns (let's call them B_i, with i=1,2,..,m), and computing in parallel the x_i column-vectors (calling matlabpool), with the code below:

x=[];
parfor i=1:m
  B_i=B(:,i);
  x_i=A\B_i;
  x=[x,x_i];
end

Let assume that for my purpose is not important the order in which the x_i are concatenated per column. I compared the time simulation obtained from x=A\B (with the whole B) and those obtained with the parallel code, for 3 different data sets, all with A non-singular and unsymmetric, and B sparse and having full rank: the first one has A dense with dimension n=2600 and number of columns of B m=30; the second one has A sparse, but with "enough dense" subblocks, with dimension n=10000 and m=9; the third one has A "very" sparse (like it is "diagonal" with 2 other diagonals starting from position (1,1) and ending in (n,n/4) and (n,3n/4), respectively), with n=25000 and m=300.Well, for the first 2 cases I cannot obtain a speed up in time simulation, indeed, the time of the parallel code is greater than x=A\B; for the last data set I obtained a time simulation 5-times lower than x=A\B, with a pool size of 2 and 4 GB of RAM, and 8-times lower with a pool size of 8 and 64 GB of RAM. So, I understand that the backslash operator works depending on the structure (and the size) of the matrix A (in the cases with A sparse, matlab tells me it used Unsymmetric MultiFrontal PACKage with automatic reordering, then UMFPACK's factorization and solver). My question are: 1) Why for dense matrices it's not useful to split B? 2) What are the factors that have determined the speed up for the last sparse case (which has no dense subblock)?

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Accedi per commentare.

Accedi per rispondere a questa domanda.

Answer 1

Jill Reese il 26 Mar 2013

2
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/68654-1-why-for-dense-matrices-it-s-not-useful-to-split-b-2-what-are-the-factors-that-determine-the-spe#answer_79957

I'll have to leave your second question for someone else to answer, as I am not sure of the details. As for your first question, you are not seeing speedup because every call to mldivide (\) requires a factorization of the input matrix A as part of the linear system solve.

If you have multiple right hand sides, B, it is best to solve all of them at once because then you are amortizing the cost of the (expensive) factorization over many right hand sides.

So it is not surprising that the serial case (1 expensive factorization A combined with m cheaper solves) is faster than the parfor case (m factorizations and m solves).

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Accedi per commentare.

Answer 2

Giovanni De Luca il 26 Mar 2013

0
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/68654-1-why-for-dense-matrices-it-s-not-useful-to-split-b-2-what-are-the-factors-that-determine-the-spe#answer_79984

Apri in MATLAB Online

Thank you. The answer has been useful to understand the expensive cost of the factorization. Then, I've corrected my code:

x=[];
[L,U]=lu(A);
parfor i=1:m
  B_i=B(:,i);
  x_i=L\B_i;
  x_i=U\x_i;
  x=[x,x_i];
end

in order to compute the factorization of A just once, but the serial code [L,U]=lu(A); x=L\B; x=U\x; (with the whole B) is still better then the parallelized one. Maybe the problem, for the dense case, depends on the fact that Matlab uses the LAPACK and, more in detail, the BLAS libraries. When B has multiple vectors the level 3 BLAS is used, i.e. for the matrix-matrix multiplication, which is already optimized for this operation, then there is no sense to split B per columns. Is it right? And, maybe, for the sparse case the possible parallelization depends on how the graph that represents the sparse matrix is structured...

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

Jill Reese il 27 Mar 2013

Yes, you are correct. The ability to use level 3 BLAS when B has multiple columns also provides improved performance.

Accedi per commentare.

1) Why for dense matrices it's not useful to split B? 2) What are the factors that determine the speed up for the sparse case?

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Risposta accettata

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Più risposte (1)

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

Vedere anche

Categorie

Tag

Community Treasure Hunt

1) Why for dense matrices it's not useful to split B? 2) What are the factors that determine the speed up for the sparse case?

0 Commenti Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Risposta accettata

0 Commenti Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Più risposte (1)

1 Commento Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

Vedere anche

Categorie

Tag

Community Treasure Hunt

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti