GPU memory overhead dependent on fft dimension.

Question

Hello all, I have a question regarding memory management during MATLAB's gpuArray/fft operation. I have a large N-by-M matrix (approximately N = 10E3, M = 20E3) and I wish to take an FFT along the M dimension. For CPU operations I would normally permute the matrix so that the FFT acts along the 1st (column) dimension, for speed.

On the GPU, if I run the fft operation in the 1st dimension, I slam into the memory ceiling of my GPU. However, if I apply it in the row dimension I do not. I assume that this has to do with whether MATLAB is doing N asynchronous FFTs in the row direction vs. a single massive matrix operation in the column dimension.

So, 4 questions:

1. Is my assumption true?
2. Are GPU operations still faster in the column direction? (I sort of answered this myself: I got a 3x speed advantage with the snippet below.)
3. Is there a way to know what the GPU memory need will be for the fft? If so, I can try chunking up the fft based on the GPU memory available.
4. Is there another implementation that will have the speed of the column operation without the memory issues? I am going to try doing this as an arrayfun just to see.

Code snippet:

x = gpuArray.rand(10000,10000);
xp = x.';
gputimeit(@() fft(x,[],1))    % FFT along dimension 1 (down the columns)
gputimeit(@() fft(xp,[],2))   % same FFTs along dimension 2 of the transpose

Thanks all.

1 Comment
D. Plotnick on 2 Jul 2018

As I suspected, arrayfun (at least my way of using it) is way slower.

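g = gpuDevice;                % device handle, needed by wait(g) below
f = @(i) fft(x(:,i),[],1);    % FFT of a single column
tic
y = arrayfun(f,1:size(x,2),'UniformOutput',false);
wait(g);                      % block until the GPU has finished
y = cat(2,y{:});
toc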
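
A middle ground between one FFT per column and one FFT over the whole matrix is to process blocks of columns in a loop. The following is only a sketch of that idea, not something from the thread: `blockCols` is a hypothetical tuning parameter, and the stand-in data is a small CPU array so the snippet runs anywhere (the same code should accept a gpuArray, assuming `fft`, `zeros(...,'like',...)` and `complex` behave the same there).

```matlab
% Sketch: FFT along dimension 1, processed in blocks of columns.
x = rand(1024, 64);                  % stand-in data; replace with a gpuArray
blockCols = 16;                      % hypothetical block width (assumption)

% Preallocate a complex output of the same class as x.
y = complex(zeros(size(x), 'like', x));

for c = 1:blockCols:size(x, 2)
    cols = c:min(c + blockCols - 1, size(x, 2));
    y(:, cols) = fft(x(:, cols), [], 1);   % FFT of one block of columns
end
```

Each call only needs working memory for one block, so a smaller `blockCols` trades speed for a lower peak; whether that helps in a given case would have to be measured.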

Answer 1

Joss Knight on 2 Jul 2018

MATLAB uses cuFFT, so the behaviour is whatever cuFFT's behaviour is. The implication of the batching API as described by the documentation (https://docs.nvidia.com/cuda/cufft/index.html) is that batches that are contiguous result in multiple kernel launches. This will be slower, but more efficient with memory.

Because the amount of memory an FFT needs is so variable and dependent on signal length, it isn't that valuable to know what the size will be for any particular example. If you're curious you can watch the FreeMemory property output from gpuDevice:

gpu = gpuDevice
gpu.FreeMemory
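
To turn that into a before/after measurement, something like the sketch below could be used. It is an illustration under assumptions rather than a recommended procedure: it reads `AvailableMemory` (the property name I believe current gpuDevice documentation uses; the answer above reads `FreeMemory`), the array size is arbitrary, and MATLAB's own GPU memory pooling may make the numbers approximate.

```matlab
% Sketch: observe how much GPU memory one FFT call consumes.
gpu = gpuDevice;
x = rand(4096, 4096, 'gpuArray');    % arbitrary test size (assumption)
wait(gpu);                           % make sure allocation has finished
memBefore = gpu.AvailableMemory;

y = fft(x, [], 1);
wait(gpu);                           % make sure the FFT has finished
memAfter = gpu.AvailableMemory;

outBytes = numel(y) * 16;            % complex double output: 16 bytes/element
fprintf('Consumed in total:  %.1f MB\n', (memBefore - memAfter) / 2^20);
fprintf('Beyond the output:  %.1f MB\n', (memBefore - memAfter - outBytes) / 2^20);
```

Repeating the measurement with `fft(x.', [], 2)` in a fresh MATLAB session would show whether the two directions differ on a particular card.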

After an FFT the FFT plan is retained so you should see how much memory it took up (as long as it's the first FFT you do in the MATLAB session). For working memory you can assume there will be a copy of the input, possibly two because MATLAB itself will often take a copy of the input in order to ensure your data is not corrupted in the event of an error.
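
As a rough worked example of that rule of thumb, the arithmetic for the matrix size in the question can be written out as below. This assumes real double-precision input (8 bytes per element) and leaves out the FFT plan, whose size is not something this estimate can predict.

```matlab
% Sketch: back-of-envelope memory estimate for a 10e3-by-20e3 real double matrix.
N = 10e3;  M = 20e3;
inputBytes  = N * M * 8;             % real double input: 1.6e9 bytes
outputBytes = 2 * inputBytes;        % complex double output: 3.2e9 bytes
workingLow  = 1 * inputBytes;        % one working copy of the input
workingHigh = 2 * inputBytes;        % possibly two copies

fprintf('Input:   %.1f GB\n', inputBytes / 1e9);
fprintf('Output:  %.1f GB\n', outputBytes / 1e9);
fprintf('Working: %.1f to %.1f GB (plan not included)\n', ...
        workingLow / 1e9, workingHigh / 1e9);
```

Input, output and working copies together already come to several gigabytes before the plan is counted, which is consistent with hitting the memory ceiling on a typical card.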

If you can get your signals to be a power of 2 in length (say, 8192) you'll find them much more efficient with memory.
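
One way to get a power-of-2 length, if the application tolerates it, is to let `fft` zero-pad to the next power of two. This is a minimal sketch with small stand-in data on the CPU; the same call should work on a gpuArray. Note that zero-padding changes the frequency grid (bins fall at multiples of 1/n rather than 1/N), so it is not a drop-in replacement in every case.

```matlab
% Sketch: zero-pad the transform length up to the next power of two.
x = rand(1000, 8);                   % stand-in data: 8 signals of length 1000
n = 2^nextpow2(size(x, 1));          % next power of two >= signal length
y = fft(x, n, 1);                    % fft zero-pads each column to length n
```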
