I need to insert multiple 32x32 arrays into a 1024x1024 array at predetermined random locations. How can I do this efficiently?

I need to insert many (10^6) 32x32 arrays into a single 1024x1024 array and sum them. The insertion locations are random, but predetermined. Simple for loops result in a long execution time. How can I optimize this? I do have parallel processing and a GPU available.
Nx = ceil(rand(1e6,1)*(1024-32)); % avoiding partial arrays at the edges
Ny = ceil(rand(1e6,1)*(1024-32));
smallArray = ones(32);
largeArray = zeros(1024);
for ii = 1:1e6
    largeArray(Nx(ii):Nx(ii)+31, Ny(ii):Ny(ii)+31) = ...
        largeArray(Nx(ii):Nx(ii)+31, Ny(ii):Ny(ii)+31) + smallArray;
end

Accepted Answer

Joss Knight on 8 Jun 2016
Use accumarray (and randi):
N = 1024;
Nx = randi([1 (N-32)], 1e5, 1);
Ny = randi([1 (N-32)], 1e5, 1);
linIndStart = sub2ind([N N], Nx, Ny);              % linear index of each top-left corner
smallArray = gpuArray.ones(32);
[I, J] = ndgrid(1:32);
linOffsets = sub2ind([N N], I(:), J(:)) - 1;       % linear offsets of the 32x32 block elements
linInd = bsxfun(@plus, linIndStart, linOffsets');  % every target element of every block
values = repmat(smallArray(:)', numel(Nx), 1);     % Replace with actual vals
largeArray = accumarray(linInd(:), values(:), [N*N 1]);
largeArray = reshape(largeArray, N, N);
If your submatrices are genuinely all just ones, then just replace the values(:) with 1.
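For that all-ones case, a minimal sketch (reusing linInd and N from the code above) reduces to:
largeArray = accumarray(linInd(:), 1, [N*N 1]);   % every index contributes 1
largeArray = reshape(largeArray, N, N);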
2 Comments
Kevin Hughes on 8 Jun 2016 (edited 8 Jun 2016)
Joss, thanks. I actually got slightly better results (8.3 s vs 9.7 s) with:
Nx=1024;
Np=100000;
Ns=32;
smallArray=gpuArray.ones(Ns)+1i*gpuArray.ones(Ns);
smallArrayr=reshape(smallArray,1,Ns^2);
smallArrayrr=gpuArray.ones(Np,1)*smallArrayr;
% [x(:) y(:)] will be a list of the Ns x Ns offset coordinates for
% each box element
[x, y] = meshgrid(1:Ns);
x=gpuArray(x);
y=gpuArray(y);
%random insertion locations
insertX=gpuArray(round(ceil(rand(Np,1)*(Nx-Ns))));
insertY=gpuArray(round(ceil(rand(Np,1)*(Nx-Ns))));
nx=gpuArray(insertX);
ny=gpuArray(insertY);
% Add every offset to every corner coordinate. The result is Np-by-Ns^2,
% that is one row per box, with Ns^2 elements in each box. Now xx and yy
% identify every element of every box.
xx = bsxfun(@plus, nx(:), x(:)');
yy = bsxfun(@plus, ny(:), y(:)');
% Now we have arrays that identify what to add together and where
% to put the results. accumarray can do the accumulation based
% on the list of coordinates [xx(:) yy(:)] and the data going into
% each element, smallArrayrr(:).
largeArray = accumarray([xx(:) yy(:)], smallArrayrr(:), [Nx Nx], @sum);
Code found on this site at the related link (actually your answer to someone else with a similar issue): http://www.mathworks.com/matlabcentral/answers/264457-why-for-gpu-is-slower-than-cpu-for-this-code-is-it-because-of-sparsity-or-because-of-for-loop. It sets up the small arrays a little differently, but uses accumarray.
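As a side note (not from the thread): GPU timings such as the 8.3 s vs 9.7 s above are typically measured with gputimeit, which synchronizes with the GPU before stopping the clock, e.g.
timeAccum = gputimeit(@() accumarray([xx(:) yy(:)], smallArrayrr(:), [Nx Nx], @sum));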
Joss Knight on 9 Jun 2016
Oh well done - I knew I'd answered basically the same question before but I couldn't unearth the old answer!


More Answers (1)

Ahmet Cecen on 29 May 2016
If your small array will always be the same, you might be able to do this lightning fast with a convolution. Instead of using the random indices as corners, you would use them as centers. Do something like:
LargeArray(randindexvector) = 1;
You can separately find repetitions using unique and add them if you happen to get the same center twice. Much smaller for loop.
Then you convolve with a 32x32 ones matrix.
There might be some rough edges in this approach that need some smoothing, but if you can get it to work, it would take milliseconds.
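One way this idea might be realized, keeping the corner convention from the question rather than centers (a sketch, assuming the small array really is a constant block; accumarray handles the repeated locations automatically):
Ns = 32; N = 1024;
Nx = ceil(rand(1e6,1)*(N-Ns));                  % insertion corners, as in the question
Ny = ceil(rand(1e6,1)*(N-Ns));
counts = accumarray([Nx Ny], 1, [N N]);         % how many blocks start at each corner
largeArray = conv2(counts, ones(Ns), 'full');   % spread each corner into a 32x32 block
largeArray = largeArray(1:N, 1:N);              % crop back to 1024x1024
If the constant block is not all ones, the same convolution should work with the block itself as the kernel.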
2 Comments
Kevin Hughes on 29 May 2016
Thanks. The small array is actually complex-valued. As written, I get about 20 seconds of execution time, but 10^6 could be an order of magnitude low, and I'll need to repeat with variations on the insertion locations a few thousand times, which puts the total at 160 hours... I'm not sure the assumption of a constant small array holds, or convolution would be an excellent way to go.
The first related question links to a vectorized answer that also shows promise.
Ahmet Cecen on 29 May 2016
You can cascade on stages of uniqueness and vectorize the fill. Say you created 1000 random values, and say 100 of them are duplicates. You first work on the 900, then the 100, then any duplicates within the 100, and so on. You do something like:
for i = 1:BiggestNumberOfRepetitions
    % WORK HERE
end
Hopefully this is an orders-of-magnitude smaller iteration count. This also allows you to iterate over the elements of the smaller matrix instead of over the number of random locations. It might blow up memory here, but you can work around it. It should look something like:
RandInd = sub2ind([1024 1024], Nx, Ny);
for i = 1:32*32
    offset = floor((i-1)/32)*1024 + mod(i-1, 32);   % column-major offset of element i
    largeArray(RandInd + offset) = largeArray(RandInd + offset) + smallArray(i);
end
There is some nasty indexing there, so double-check it. Now your iteration count is MaximumRepetitionsCount*numel(smallArray), which in most cases should still be orders of magnitude smaller.
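Putting the two pieces together, a hedged sketch of the full cascade might look like the following (reusing RandInd and smallArray from the code above; names such as remaining are illustrative, not from the post):
N = 1024; Ns = 32;
largeArray = zeros(N);                       % accumulate into a fresh large array
remaining = RandInd(:);                      % RandInd as computed above
while ~isempty(remaining)
    [~, ia] = unique(remaining);             % first occurrence of each distinct corner
    R = remaining(ia);                       % unique corners handled in this stage
    for i = 1:Ns*Ns
        off = floor((i-1)/Ns)*N + mod(i-1, Ns);   % column-major offset of element i
        largeArray(R + off) = largeArray(R + off) + smallArray(i);
    end
    remaining(ia) = [];                      % leftovers are the duplicates; repeat
end
The inner loop matches the code above; keeping it per-element means overlapping blocks within one stage still accumulate correctly, since for a fixed offset the target indices are unique.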

