I have IxJxL single type data volume. I want to execute 1-D fft's of L length IxJ times.
data_device = gpuArray(data);
Nested 'for' loop on GPU works slower than on CPU.
for ii = 1:I
for jj = 1:J
data_device(ii,jj,:) = fft(data_device(ii,jj,:));
'Parfor' works even slower than simple 'for' loop.
1 Is there a way to use spmd method on GPU alike its done on CPU? How?
2 What is the optimal size of data to be send to GPU, considering params gpuDevice() function retuns?