Matrix product and inversion speed up
Hi,
my question is simple. I have a function where the most computationally intensive line is this simple matrix operation:
iF = iR - iRZ*((iP - Z'*iRZ)\iRZ');
Dimensions of the involved objects vary at each call, but typically they are of the orders of magnitude listed here:
iR -> diagonal 40kx40k (saved as sparse)
iRZ -> 40k x 100 (saved as sparse)
Z -> 40k x 100
iP -> 100x100
Is there something one can do to speed up the code?
Thanks
8 Comments
Christine Tobler
on 19 Jun 2020
What are you doing with the matrix iF in the next step? Since this is a dense 40k-by-40k matrix, it will be comparatively expensive to compute, and simply avoiding storing it explicitly altogether would probably be the fastest option.
For example, if your next step is to do matrix-vector multiplication with iF, you could replace
y = iF*x
with
y = iR*x - iRZ*((iP - Z'*iRZ)\(iRZ'*x));
For some other possible operations, you will need to store the complete iF as a dense matrix - it depends on that next step.
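As a concrete illustration (a sketch only, assuming the next step really is a matrix-vector product; the names S and applyiF are mine, not from the original code):
S = iP - Z'*iRZ;                          % small ~100x100 matrix, computed once
applyiF = @(x) iR*x - iRZ*(S\(iRZ'*x));   % returns iF*x without ever forming iF
x = randn(size(iR,1), 1);                 % example vector
y = applyiF(x);                           % equals iF*x up to round-off
If iF later appears on the left of a linear solve, the same handle could also be passed to an iterative solver such as pcg or gmres instead of forming iF.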
Answers (2)
John D'Errico
on 19 Jun 2020
Edited: John D'Errico
on 19 Jun 2020
What you seem not to appreciate is that 90-95% sparse is NOT very sparse, not when you are talking about matrix factorizations (i.e., backslash). Backslash does not compute an inverse, although the inverse of a general, not terribly sparse matrix would probably be nearly full too.
Inside this expression:
iRZ*((iP - Z'*iRZ)\iRZ')
we see a 100x100 matrix in the parens. (Roughly, since you say the sizes change somewhat arbitrarily.) That matrix is what controls a LOT, since it will likely be fairly non-sparse, given its components. But once that 100x100 matrix is decomposed using a matrix factorization, it likely becomes close to a full matrix, or a product of a full lower and upper triangular pair.
In that case, if you then pre-multiply that result by iRZ and post-multiply by iRZ', you also get something fairly full.
My question in the comments was about the sparsity of those sub-matrices. That is, compute
A = iP - Z' *iRZ ;
nnz(A)/numel(A)
You might also then try for that same A:
[L,U] = lu(A);
nnz(L)/numel(L)
nnz(U)/numel(U)
If each of those matrices is nearly 50% non-zero, then they are essentially full triangular matrices.
Next, on the parent computation:
A = iRZ*((iP - Z'*iRZ)\iRZ');
nnz(A)/numel(A)
Those are the things you need to do. You should not be surprised to see a great deal of fill-in. In fact, consider yourself lucky if that does not happen.
You might consider sparsity-enhancing (fill-reducing) permutations, applied to the inner 100x100 matrix, so tools like colamd, dmperm, etc. (Sorry, but it has been many years since I was actively using those tools. They can help a great deal when used properly.)
Remember that once things start to become too filled in, a sparse matrix computation can start to become LESS efficient than a full one would have been. You might test if making that inner square matrix a full matrix helps too.
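Putting those checks together (a sketch only; S, Sf, Lp, and Up are names I have introduced, and the colamd step only applies if the inner matrix is actually stored as sparse):
S = iP - Z'*iRZ;                 % the inner ~100x100 matrix
nnz(S)/numel(S)                  % how full is S itself?
[L,U] = lu(S);
nnz(L)/numel(L)                  % how full are its triangular factors?
nnz(U)/numel(U)
if issparse(S)                   % try a fill-reducing column permutation
    p = colamd(S);
    [Lp,Up] = lu(S(:,p));
    nnz(Lp)/numel(Lp)
    nnz(Up)/numel(Up)
end
Sf = full(S);                    % and compare the small solve as stored vs full
timeit(@() S \iRZ')
timeit(@() Sf\iRZ')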
Are these things important? Yes.
For example:
A = rand(100);
As = sparse(A);
B = sprand(10000,100,0.01);
So B is 99% sparse. A is essentially a full matrix. And remember that even if A is only fairly full, a factorization of A will probably be quite close to full anyway.
Consider the term
C = A\B';
nnz(C)/numel(C)
ans =
0.6299
It does not matter whether or not A is sparse: backslash created a matrix that is 63% non-zero. It is effectively full in the world of sparse matrices. In fact, look at what happens with
Cs = As\B';
>> whos C Cs
  Name      Size             Bytes       Class     Attributes
  C         100x10000        8000000     double
  Cs        100x10000        10158408    double    sparse
So that 63% non-zero matrix used roughly 25% MORE space to store than the comparable full version.
Now compare times:
timeit(@() B*(A\B'))
ans =
0.3445460668125
timeit(@() B*(As\B'))
ans =
1.5696587328125
Finally, consider this:
Bf = full(B);
timeit(@() Bf*(A\Bf'))
ans =
0.1513420138125
And those times were only for a 10000x100 problem, and a sparse version of B that was 99% sparse, not just 90-95%.
As long as I could store that final result in memory as a full matrix, I actually gained by just keeping the computation completely full. To be useful, sparse matrices need to be seriously sparse. As well, sparsity can disappear quickly.
So if that inner part is full, or even close to being full, then sparsity of B does not help. In fact using sparse matrices may be LESS efficient, using more storage in those products, and using more time. And you need to look at the sparsity of the computed results. At least consider fill-in reducing permutations, as it is fill-in that can be a killer for sparse matrices.
Bjorn Gustavsson
on 19 Jun 2020
My comment still seems relevant. A simple commandline test gives:
tic,AA = ones(3e4,2)*(ones(2,3e4)*ones(3e4,1));toc
% Elapsed time is 0.001027 seconds.
tic,BB = (ones(3e4,2)*ones(2,3e4))*ones(3e4,1);toc
% Elapsed time is 2.031760 seconds.
isequal(AA,BB)
% logical 1
If this is run inside a function where the JIT optimization can work its wonders, the multiplication might be handled a bit more cleverly. But when you know the sizes of the matrices that are to be multiplied together, you can guide the order of the multiplications so that you avoid outer products that create large intermediate matrices which then collapse into a smaller matrix or vector at the end.
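The same effect shows up with shapes like those in the question (a sketch with random stand-in data; the 0.1% density is an assumption, not a measured value):
n = 4e4;  k = 100;
M = sprand(n, k, 1e-3);          % tall sparse matrix, shaped like iRZ
v = randn(n, 1);
tic, y1 = (M*M')*v; toc          % outer product first: builds an n-by-n intermediate
tic, y2 = M*(M'*v); toc          % inner product first: intermediates stay small
norm(y1 - y2)                    % same result up to round-off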
HTH
2 Comments
Bjorn Gustavsson
on 19 Jun 2020
That's a downer; I hoped that inverting a 100x100 matrix wouldn't be that expensive. Perhaps you can get some use out of Factorize.
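For what it is worth, the "factor once, solve many times" idea behind Factorize can also be sketched with MATLAB's built-in decomposition object (R2017b or later; using it as a stand-in here is my suggestion, not something from the thread, and S is again my name for the inner matrix):
S  = iP - Z'*iRZ;
dS = decomposition(S);           % factorization computed once and stored
b1 = iRZ'*randn(size(iRZ,1), 1); % two example right-hand sides
b2 = iRZ'*randn(size(iRZ,1), 1);
y1 = dS\b1;                      % each solve reuses the stored factors,
y2 = dS\b2;                      % no re-factorization of S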