Significant increase in memory usage when moving part of the code to the GPU

Hi all,

I am experimenting a bit with MATLAB (R2014b) and a GPU (Tesla 2075). I am puzzled by a significant increase in memory usage after I "moved" the innermost loop of my code to the GPU. I am by no means an expert, and I'm possibly doing something wrong.

My code is basically a wrapper for a function that integrates a set of coupled differential equations. The innermost loop iterates a Runge-Kutta integration a few hundred times. A fair number of ffts and iffts are involved, so I thought that moving that part to the GPU would speed up my code. I turned all the auxiliary vectors in the four RK steps into gpuArrays. When the innermost loop has finished, I gather only the gpuArray containing the state of my system and leave all the auxiliary arrays on the GPU, ready for the next loop. It turns out that the speed actually increases for sufficiently large systems. However, this apparently comes at the price of a significant increase in memory usage.
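A minimal sketch of the workflow described above, assuming a placeholder equation (a free Schrödinger-type equation on a periodic grid) in place of the actual coupled system. The grid size `N`, the step `dt`, and all variable names are illustrative assumptions; the code falls back to the CPU when no GPU is found.

```matlab
% Sketch only: the equation, grid size and time step are placeholders,
% not the actual system discussed in this thread.
N      = 256;                        % lattice size (assumed)
dt     = 1e-5;                       % RK4 time step (assumed)
nSteps = 200;                        % inner-loop iterations ("a few hundred")

x    = 2*pi*(0:N-1).'/N;             % periodic grid on [0, 2*pi)
k2   = [0:N/2-1, -N/2:-1].'.^2;      % squared wavenumbers in fft ordering
psi0 = exp(1i*x);                    % initial state, created on the host

% Use the GPU when one is available, otherwise stay on the CPU.
try
    useGPU = gpuDeviceCount > 0;
catch
    useGPU = false;                  % toolbox missing or no usable device
end

if useGPU
    psi = gpuArray(psi0);            % state lives on the device
    k2d = gpuArray(k2);              % auxiliary data lives on the device too
else
    psi = psi0;
    k2d = k2;
end

% Right-hand side of d(psi)/dt = 1i * d2(psi)/dx2, evaluated with fft/ifft.
rhs = @(p) -1i * ifft(k2d .* fft(p));

for n = 1:nSteps
    a1  = rhs(psi);                  % the four RK4 stages stay on the device
    a2  = rhs(psi + 0.5*dt*a1);
    a3  = rhs(psi + 0.5*dt*a2);
    a4  = rhs(psi + dt*a3);
    psi = psi + (dt/6)*(a1 + 2*a2 + 2*a3 + a4);
end

% Copy only the state back to the host.
if useGPU
    psiHost = gather(psi);
else
    psiHost = psi;
end
```

Only `psi` is copied back to the host; the stage arrays `a1` to `a4` and `k2d` remain on the device and are reused by the next pass through the loop.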

The machine I'm using is on a cluster managed by HTCondor. I have noticed that the "GPU version" of my code uses far more memory than the "CPU version". The situation according to condor_q and top is the following:

           SIZE (condor_q)   VIRT (top)   RES (top)   SHR (top)
    GPU    73242.2           67.775g      468112      129700
    CPU    3418.0            3277324      186324      77264

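The readings from top should be in KiB (the "g" suffix marks a value in GiB). I assumed those from condor_q were in Kbytes, but the SIZE column is reported in Mbytes, which is roughly consistent with the VIRT values from top.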

Update: in order to check whether this behavior was caused by the queuing system (HTCondor), I submitted one instance of my code directly on the node of our cluster that has the GPU, using nohup. The job is now running in the background, but the figures from "top" are basically the same as above for the GPU version.

Is such a memory increase to be expected? Am I missing something?

Thanks a lot for your help

Francesco

  3 Comments
Mohammad Abouali on 12 Apr 2015
Edited: Mohammad Abouali on 12 Apr 2015
I have also seen strange behavior regarding GPU and memory in MATLAB; check here
No convincing answer yet.
Despite loving MATLAB a lot, I am almost giving up on GPU computing in MATLAB. I don't get enough speedup, it seems to eat through memory, and I am forced to run much smaller programs.
If I write my kernels in CUDA-C and call them from within MATLAB, I get better results though. The built-in functions that accept GPU arrays are not bad either.
pfb on 13 Apr 2015
Mohammad,
thanks for your thoughts. I took a look at the post you mention. If I get it right, the problem affects a GPU that is used for both display and computing. In my case it's a separate GPU, dedicated to computing. I did not do the check you did, but I want to try as soon as I have the chance.
I know nothing of CUDA-C. I wish I had time to learn it. Right now I am experimenting with MATLAB, and changing just a few lines of code gave me a significant speedup. That alone would be terrific, if it weren't for the lurking memory "problem".
In your other post you mention device resetting. I have to say I am not doing that. When should it be done? Could you point me to the documentation you mention in your post?
Thanks a lot
Francesco

Accepted Answer

Edric Ellis on 13 Apr 2015
When you move the code to the GPU, MATLAB loads a suite of supporting CUDA libraries to provide implementations of fft etc. I believe this is the primary cause of the host-side memory increase you're seeing. The CUDA libraries supporting gpuArray are large because they contain specialised variants of many different algorithms, and support many different GPU hardware variants. On my system, I see the large increase in VSZ simply by invoking gpuDevice:
>> !ps -C MATLAB -O vsz,rsz
PID VSZ RSZ S TTY TIME COMMAND
4965 1865924 581852 S pts/4 00:00:31 /local/MATLAB/R2015a/bin/glnxa64/MATLAB
>> gpuDevice;
>> !ps -C MATLAB -O vsz,rsz
PID VSZ RSZ S TTY TIME COMMAND
4965 44297124 764552 S pts/4 00:00:32 /local/MATLAB/R2015a/bin/glnxa64/MATLAB
>> fft2(gpuArray.rand(2048));
>> !ps -C MATLAB -O vsz,rsz
PID VSZ RSZ S TTY TIME COMMAND
4965 44362660 830340 S pts/4 00:00:33 /local/MATLAB/R2015a/bin/glnxa64/MATLAB
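The same measurement can be scripted instead of typed at the prompt. The sketch below assumes Linux with the procps `ps` (as in the transcript above) and that the process is named `MATLAB`; `parsePs` is a hypothetical helper name.

```matlab
% Sketch: log VSZ and RSZ (in KiB) of running MATLAB processes from a script.
% Assumes Linux with procps "ps"; the trailing "=" suppresses column headers.
parsePs = @(txt) sscanf(txt, '%f', [2 Inf]).';   % one row per process: [VSZ RSZ]

[status, out] = system('ps -C MATLAB -o vsz=,rsz=');
if status == 0 && ~isempty(out)
    mem = parsePs(out);
    for i = 1:size(mem, 1)
        % Convert KiB to GiB for readability.
        fprintf('VSZ = %.2f GiB, RSZ = %.2f GiB\n', mem(i, :) / 1024^2);
    end
else
    disp('No process named MATLAB found (or ps -C is not supported here).');
end
```

For the readings above, 44297124 KiB corresponds to about 42.2 GiB of virtual address space, while RSZ stays below 1 GiB.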
  4 Comments
Edric Ellis on 14 Apr 2015
Yes, I think the increase in VSZ should hopefully not actually cause you too many problems in practice. The shared libraries mostly consume host-side memory, although some GPU memory is needed to load the specific device code.
To answer your subsequent questions:
  1. You do not need to add a call to gpuDevice to your code - I used that simply to force MATLAB to load the GPU libraries
  2. Unfortunately you cannot selectively load the GPU libraries - they all get loaded as soon as you use any GPU functionality.
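To see how much memory is used on the device itself, as opposed to host-side VSZ, the `gpuDevice` object can be queried. This is a minimal sketch, assuming a release in which `gpuDevice` exposes the `TotalMemory` and `AvailableMemory` properties (older releases may only offer `FreeMemory`); the variable names are illustrative.

```matlab
% Sketch: inspect device-side memory around a gpuArray allocation.
% Assumes Parallel Computing Toolbox and a supported GPU; otherwise the
% catch branch reports that no device could be selected.
try
    d = gpuDevice;                       % selects the device, loads the GPU libraries
    totalMiB  = d.TotalMemory / 1024^2;
    beforeMiB = d.AvailableMemory / 1024^2;
    A = gpuArray.rand(2048);             % 2048^2 doubles, i.e. 32 MiB of data
    d = gpuDevice;                       % re-query the currently selected device
    afterMiB = d.AvailableMemory / 1024^2;
    fprintf('Total %.0f MiB, available before %.0f MiB, after %.0f MiB\n', ...
            totalMiB, beforeMiB, afterMiB);
    haveGPU = true;
catch err
    fprintf('No usable GPU: %s\n', err.message);
    haveGPU = false;
end
```

If the numbers reported here change by tens of MiB while VSZ on the host grows by tens of GiB, that is consistent with the libraries, not the data, accounting for most of the increase.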
pfb on 14 Apr 2015
Thanks Edric, this was really useful.
I tried my code on a larger lattice (L=256) and the memory requirements remained more or less the same, especially as reported by HTCondor. This should confirm that -- for what I'm doing now -- most of the memory is indeed used for the libraries.
Also, it seems to me that the code is running much faster.
This makes me happy! :)
