How to execute FFTs on a GPU (CUDA) in parallel (spmd)?

Greg on 12 Jun 2013
I have an I-by-J-by-L data volume of type single. I want to compute I*J separate 1-D FFTs, each of length L (along the third dimension).
data_device = gpuArray(data);
A nested 'for' loop on the GPU runs slower than the same loop on the CPU:
for ii = 1:I
    for jj = 1:J
        data_device(ii,jj,:) = fft(data_device(ii,jj,:));
    end
end
'parfor' is even slower than the plain 'for' loop.
1. Is there a way to use spmd on the GPU the way it is done on the CPU? How?
2. What is the optimal size of data to send to the GPU, given the parameters that the gpuDevice() function returns?

Edric Ellis on 12 Jun 2013
MATLAB's FFT function can operate along any single dimension. So you can simply do:
data_device = fft(data_device, [], 3);
rather than having a loop. See the FFT reference page for more.
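
As a rough sanity check, here is a minimal sketch comparing the single call against the nested loop. The sizes are made up for illustration, and it assumes Parallel Computing Toolbox with a supported GPU is available:

```matlab
% Hypothetical sizes, chosen only for illustration.
I = 64; J = 64; L = 256;
data = rand(I, J, L, 'single');

% Reference result: nested loop on the CPU, one length-L FFT per (ii,jj).
% fft acts on the first non-singleton dimension, which is dim 3 for a 1x1xL slice.
ref = complex(zeros(I, J, L, 'single'));
for ii = 1:I
    for jj = 1:J
        ref(ii,jj,:) = fft(data(ii,jj,:));
    end
end

% Single call on the GPU: FFT along dimension 3 for all (ii,jj) at once.
data_device = gpuArray(data);
out_device  = fft(data_device, [], 3);

% Bring the result back to host memory and compare with the loop version.
out    = gather(out_device);
maxErr = max(abs(out(:) - ref(:)));
relErr = maxErr / max(abs(ref(:)));
fprintf('max relative difference: %g\n', relErr);
```

The two results should agree to within single-precision rounding. To compare speed, something like gputimeit(@() fft(data_device, [], 3)) can be used in releases that provide gputimeit; plain tic/toc around GPU calls can be misleading because GPU operations may run asynchronously.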

