need to vectorize efficiently calculating only certain values in the matrix multiplication A * B, using a logical array L the size of A * B
Mostra commenti meno recenti
I have matrices A (m by v) and B (v by n). I also have a logical matrix L (m by n).
I am interested in, as efficiently as possible, calculating only the values in A * B that correspond to logical values in L (values of 1s). Essentially I am interested in the quantity ( A * B ) .* L .
For my problem, a typical L matrix has less than 0.1% percent of its values as 1s; the vast majority of the values are 0s. Thus, it makes no sense for me to directly perform ( A * B ) .* L , it would actually be faster to loop over each element of A * B that I want to compute, but even that is inefficient.
-------------------------------------------------------------------------------------------------------------------------------------------------------
Possible solution (need help vectorizing this code if possible)
My particular problem may have a nice solution given that the logical matrix L has a nice structure.
This L matrix is nice in that it can be represented as something like a permuted block matrix. This example L in is composed of 9 "blocks" of 1s, where each block of 1s has its own set of row and column indices. For instance, the highlighted area here can be seen the values of 1 as a particular submatrix in L.
My solution was to do this. I can get the row indices and column indices per each block's submatrix in L, organized in two cell lists "rowidxs_list" and "colidxs_list", both with the number of cells equal to the number of blocks. For instance in the block example I gave, subblock 1, I could calculate those particular values in A * B by simply doing A( rowidxs_list{1} , : ) * B( : , colidxs_list{1} ) .
That means that if I precomputed rowidxs_list and colidxs_list (ignore the costs of calculating these lists, they are negligable for my application), then my problem of calculating C = ( A * B ) .* L could effectively be done by:
C = sparse( m,n )
for i = 1:length( rowidxs_list )
C( rowidxs_list{i} , colidxs_list{i} ) = A( rowidxs_list{i} , : ) * B( : , colidxs_list{i} ) .
end
This seems like it would be the most efficient way to solve this problem if I knew how to vectorize this for loop. Does anyone see a way to vectorize this?
There may be ways to vectorize if certain things hold, e.g. only if rowidxs_list and colidxs_list are matrix arrays instead of cell lists of lists (where each column in an array is an index list, thus replacing use of rowidxs_list{i} with rowidxs_list(i,:) ). I'd prefer to use cell lists here if possible since different lists can have different numbers of elements.
-------------------------------------------------------------------------------------------------------------------------------------------------------
other suggested solution (creating a mex file?)
I first posted this question on the /r/matlab subreddit, see here for the reddit thread. The user "qtac" recommended that a C-MEX function linking to C programming language:
My gut feeling is the only way to really optimize this is with a C-MEX solution; otherwise, you are going to get obliterated by overhead from subsref in these loops. With C you could loop over L until you find a nonzero element, and then do only the row-column dot product needed to populate that specific element. You will miss out on a lot of the BLAS optimizations but the computational savings may make up for it.
Honestly I bet an LLM could write 90%+ of that MEX function for you; it's a well-formulated problem.
I think this could be a good solution to pursue, but I'd like other opinons as well.
8 Commenti
James Tursa
il 17 Feb 2025
We would need to know typical sizes involved to offer any meaningful advice. Is there any pattern to the 1's in the L matrix?
Catalytic
il 18 Feb 2025
Thus, it makes no sense for me to directly perform ( A * B ) .* L , it would actually be faster to loop over each element of A * B
Have you actually tested to see if it is faster? Matlab extensively parallelizes operations like (A*B).*L, so it is not always clear if reducing the number of operations will translate into improved speed.
Cal
il 18 Feb 2025
For all cases I tried, I always found that ( A * B ) .* L is unfortunately very slightly slower than ( A * B ).
That's not what I asked. I asked if you had compared (A*B).* L to the mcoded loop that you claimed would be faster. Below is an example where the loop does poorer than the direct operation -
m=5000;n=m; v=3000;
A=rand(m,v); B=rand(v,m); L=sprand(m,n,0.1/100)>0;
timeit(@()direct(A,B,L))
timeit(@()looping(A,B,L))
function direct(A,B,L)
C=(A*B).*L;
end
function looping(A,B,L)
[I,J]=find(L);
K=numel(I);
C=double(L);
for k=1:K
C(I(k),J(k))=A(I(k),:)*B(:,J(k));
end
end
Cal
il 20 Feb 2025
Cal
il 20 Feb 2025
Risposta accettata
Più risposte (1)
Is v much smaller than m or n? If so, one approach might be with an outer product decomposition:
L=sparse(L); %if L was not previously in sparse form
C=0;
for i=1:v
C = C + A(:,i).*L.*B(i,:);
end
1 Commento
Cal
il 18 Feb 2025
Categorie
Scopri di più su Matrix Indexing in Centro assistenza e File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!