13 views (last 30 days)

Show older comments

I'm trying to optimize a very specific vector operation, namely taking a large (2^20 x 1) vector, reshaping it, permuting the indices, and reshaping once more. To be concrete, an example:

A = rand(2^20,1); % Large vector, with one dimension a power of 2

A = A./norm(A); % Normalize just for convenience

DR = 2^6; % DR,DL,DM are powers of 2 which multiply to form the size of A

DL = 2^6;

DM = 2^8;

tic;

B = reshape(permute(reshape(A,DR,DM,DL),[2,1,3]),DM,DR*DL);

toc; % On my machine this takes ~1.2 ms

The above operation is very simple, and entirely limited in speed by the permute step - as I understand it, permutation in matlab requires the entire array to be copied, losing time for the copy to be created and the transfer to occur. I am wondering if there is any clever way to get past this requirement for this specific use-case.

I have tried putting the operation of a gpu (by calling, for instance),

A = rand(2^20,1,'gpuArray')

Which does improve the runtime by a factor of ~4 but also hurts some other areas of my application. I have not yet tried to mexify the code, but would be interested if this seems a viable way to improve as well.

Edit from the comments: Ultimately this reshaped vector/matrix "B" is then multiplied by a Matrix (DM x DM), and then permuted/reshaped back into it's original form. If there is some fast way to combine all of those operations then that would of course be even more ideal.

Edit 2 for further context: As the answers/comments asked for more clarification of the overall use case, I will provide a toy model of a larger chunk of the code. Essentially this is the type of overall operation we are looking to do:

L = 20;

mid_size = 4;

DM = 2^mid_size;

A = rand(2^L,1);

A = A./norm(A);

Ms = rand(DM,DM,L-mid_size+1);

tic;

for left_size = 0:mid_size:(L-mid_size)

right_size = L - mid_size - left_size;

DR = 2^right_size;

DL = 2^left_size;

B = reshape(permute(reshape(A,DR,DM,DL),[2,1,3]),DM,DR*DL);

B_prime = Ms(:,:,left_size+1) * B;

A = permute(reshape(B_prime,DM,DR,DL),[2,1,3]);

end

A = reshape(A, 2^L, 1);

toc;

This is of course embedded in a larger program, but I think this is essentially an isolated "kernel"

Matt J
on 17 Jun 2021

Edited: Matt J
on 17 Jun 2021

Edit 2 for further context: ...Essentially this is the type of overall operation we are looking to do:

This will be more efficient:

L = 20;

mid_size = 4;

DM = 2^mid_size;

A = rand(2^L,1);

A = A./norm(A);

Ms = rand(DM,DM,L-mid_size+1);

Ms=permute(Ms,[2,1,3]); %<--- pre-permute outside the loop

tic;

for left_size = 0:mid_size:(L-mid_size)

right_size = L - mid_size - left_size;

DR = 2^right_size;

DL = 2^left_size;

A= pagemtimes( reshape(A,DR,DM,DL) , Ms(:,:,left_size+1));

end

A = reshape(A, 2^L, 1);

toc;

James Tursa
on 15 Jun 2021

Edited: James Tursa
on 15 Jun 2021

Don't do the permute( ) operation. Just use pagemtimes( ) downstream in your code with the appropriate 'transpose' option. This will cause the matrix multiply to use code that "virtually" transposes the matrix without actually physically forming it first.

https://www.mathworks.com/help/matlab/ref/pagemtimes.html?searchHighlight=pagemtimes&s_tid=srchtitle

E.g., something like this if I understand your dimensions:

result = reshape(pagemtimes(Matrix,'none',reshape(A,DR,DM,DL),'transpose'),DM,DR*DL);

I think pagemtimes( ) is multi-threaded and uses BLAS in the background so I doubt a mex routine could be written to beat this for speed.

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!