Slow performance of fftn in the gpu when used inside a loop

2 views (last 30 days)
I have just realized that the execution times for fftn operations inside a loop is not proportional to the length of the loop when working inside the GPU.
As an example, if I define a cube in the GPU
a = gpuArray(ones(256,256,256,'single'));
I see that the user time does not scale with the number of visits in a loop. For moderate loops I read:
>> N=100;tic;for i=1:N;g=fftn(a);end;toc
Elapsed time is 0.008618 seconds.
... but for a loop which is 10 times bigger
>> N=1000;tic;for i=1:N;g=fftn(a);end;toc
Elapsed time is 7.299844 seconds.
the total time does not scale by 10 but by 1000!!!! I know tic/toc is not the best way to measure performance, but it is still the time seen by the users of the program... is there some basical principle of handling gpuArrays inside loops that I am missing?
  2 Comments
Arabarra
Arabarra on 15 Oct 2019
The full code are pretty much the three lines above... not certain what you mean

Sign in to comment.

Accepted Answer

Edric Ellis
Edric Ellis on 10 Oct 2019
Various methods on the GPU operate to some extent asynchronously. But there are limits to this - depending on the amount of memory available etc. The best way to time GPU operations is to use gputimeit, like this:
a = gpuArray(ones(256,256,256,'single'));
% Basic case, no looping
t1 = gputimeit(@() fftn(a));
% Looping cases
t100 = gputimeit(@() iLoop(a,100));
t1000 = gputimeit(@() iLoop(a,1000));
% Compare results
disp([t1, t100/100, t1000/1000])
function iLoop(a,N)
for i = 1:N
fftn(a);
end
end
On my machine, I see that the results are consistent - i.e. gputimeit does a good job of getting an accurate time even for a single call to fftn. Running the above script, the result I get is:
>> repro
0.0081 0.0081 0.0081
  3 Comments
Arabarra
Arabarra on 29 Apr 2020
thanks for the answer! The wait command on device gave me the key to discover where my algorithm was creating a hidden bottleneck.

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by