Why does the GPU load increase over time?

Here's my code:
% for j=1:4
reset(gpuDevice(1)); clear all; % clean up
format long; % show double precision
R_i_gpu=gpuArray(6); % initial radius
dL_gpu=gpuArray(1.e-5); % delta length
n_gpu=R_i_gpu/dL_gpu; % calculate the number of steps or intervals
theta_i_gpu=dL_gpu/R_i_gpu; % calculate the initial theta
R_final_gpu=gpuArray(2); % final radius
dR_gpu=R_final_gpu-R_i_gpu; % calculate delta radius
d_theta_gpu=gpuArray(3*pi/2); % angle over which the radius varies (i.e. dR/d_theta = (R_final-R_initial)/d_theta)
dR_d_theta_gpu=dR_gpu/d_theta_gpu; % calculate the rate of change of radius with respect to theta
% Ri=R_i-(dR_d_theta*theta_i)
% thetai=dL/Ri
Ri_gpu=R_i_gpu; % initialise radius
thetai_gpu=theta_i_gpu; % initialise theta
for i=1:gather(n_gpu)-1 % loop over the remaining steps, with the bound as a host scalar
Ri_gpu=Ri_gpu+(dR_d_theta_gpu*thetai_gpu); % update radius
thetai_gpu=dL_gpu/Ri_gpu; % update theta
R_gpu(i)=Ri_gpu; % put the radius into a column array
theta_gpu(i)=thetai_gpu; % put the theta into a column array
end
% A_gpu=[R_gpu ;theta_gpu]'; % create the radius/theta array
% R_gpu=[R_i_gpu R_gpu]'; % horizontally concatenate the initial radius with the radius array that's calculated for each of the interval steps
theta_gpu=[theta_i_gpu theta_gpu]'; % hcat the initial theta with the theta array
theta_sum_gpu=sum(theta_gpu); % sum the theta (in radians)
theta_sum_deg_gpu=theta_sum_gpu*360/(2*pi); % convert the theta sum to degrees
% ptime=toc
% end
I'm running this on a 3930K with an NVIDIA GTX 660 Superclocked, and I noticed that as dL_gpu goes from 1.e-4 to 1.e-5, the effective computational rate decreases. So I started using GPU-Z to monitor the memory usage, the GPU load, and the GPU memory controller load, and found that both the GPU load and the GPU memory controller load increase as time goes on. Now I am trying to figure out why.
If you're solving a 1-D integration with a Newton-like stepwise approximation, shouldn't the time required for each step/iteration be the same?
I'm trying to understand how MATLAB builds arrays for A(i)=B. Does it rebuild the entire array at each iteration or does it just add the latest entry to the bottom of the list?
And if that is the case, then why is the memory controller load going up (also as a function of time)?
Any help in understanding what's going on behind the scenes would be greatly appreciated! Thank you!
1 Comment
Matt J on 16 Jun 2014
Ewen Chan commented:
So here's the GPU load graph:
And here's the GPU memory load graph:
And the ONLY thing that I'm changing between the different runs is the dL_gpu parameter. Everything else about the code remains the same.
So, I am trying to learn more about what's going on and what's causing this behaviour.
Any help will be greatly appreciated. Thank you.


Accepted Answer

Joss Knight on 16 Jun 2014
Edited: Joss Knight on 16 Jun 2014
A(i) = B without pre-allocation will add data to the end of your array until it runs out of space, then it will allocate more space, copy the existing array across, and continue. This takes a lot of time. This is where all your little spikes come from. The larger the array, the longer the resize operation takes: when you have smaller deltas your array is getting longer and longer and so everything is running slower and slower.
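To make the pre-allocation point concrete, here is a minimal sketch of the same recurrence with the arrays sized once before the loop. It uses plain doubles rather than gpuArray, and the names R, theta, dRdtheta and n are illustrative, not taken from the original code. The rounding of n is an added assumption to guard against floating-point error in R_i/dL.

```matlab
% Parameters taken from the question (plain doubles, no gpuArray)
R_i = 6;                              % initial radius
dL  = 1e-5;                           % arc-length step
R_f = 2;                              % final radius
dRdtheta = (R_f - R_i) / (3*pi/2);    % rate of change of radius with theta

n = round(R_i / dL);                  % number of steps (rounding is an assumption)

% Preallocate once so the loop never has to resize the arrays
R     = zeros(n, 1);
theta = zeros(n, 1);
R(1)     = R_i;                       % initial radius is the first entry
theta(1) = dL / R_i;                  % initial theta is the first entry

for k = 2:n
    R(k)     = R(k-1) + dRdtheta * theta(k-1);  % update radius
    theta(k) = dL / R(k);                       % update theta
end

theta_sum_deg = sum(theta) * 180 / pi;  % total swept angle in degrees
```

Because every write lands in memory that already exists, the cost per iteration should stay roughly constant instead of growing with the array length.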
Note that what you are doing here is not appropriate for GPU computation. The GPU is useful for operating in parallel on large arrays of data. You are not doing anything in parallel here, so the GPU is mostly idle and you've wasted a lot of time sending data over to it.
Another tip: don't put scalar data onto the GPU (e.g. R_i_gpu=gpuArray(6)). Only put arrays on the GPU. GPU code will automatically bring scalars across to the GPU if and when it is necessary.
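As a rough illustration of the last two points, the sketch below sends one large array to the GPU, leaves the scalar as an ordinary double, and applies an elementwise operation that can run in parallel. It assumes Parallel Computing Toolbox and a supported GPU, and falls back to the CPU when neither is available. The names N, x, a, xg and yg are hypothetical.

```matlab
N = 1e6;
x = linspace(0, 1, N).';      % large array: worth sending to the GPU
a = 2.5;                      % scalar: leave it as an ordinary double

try
    xg = gpuArray(x);         % one transfer for the whole array
catch
    xg = x;                   % no usable GPU: keep working on the CPU
end

yg = a .* xg.^2 + 1;          % elementwise, so elements can be computed in parallel

if isa(yg, 'gpuArray')
    y = gather(yg);           % bring the result back to host memory
else
    y = yg;                   % already on the host
end
```

The loop in the question is the opposite case: each step depends on the previous one, so there is nothing for the GPU to do in parallel.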
1 Comment
Ewen Chan on 16 Jun 2014
This was a study of an open integral. I was interested in comparing CPU and GPU, since there's a huge fascination with GPU-assisted computing, and this was an actual practical application. It helps demonstrate that, like so many things, there's a proper time and place for everything.
I was also looking to see if there was some way to parallelize this type of computation in order to get to the solution quicker.
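One possible shortcut, offered as an assumption rather than something stated in this thread: the loop in the question reads as a forward-Euler discretisation of dR/dL = c/R, where c is dR_d_theta. That equation integrates to R(L)^2 = R_i^2 + 2*c*L, so under that reading the radii and the total angle can be evaluated directly, without a sequential loop. The names c, L_total and R_of_L below are hypothetical.

```matlab
R_i = 6; dL = 1e-5; R_final = 2;      % values from the question
c = (R_final - R_i) / (3*pi/2);       % dR/dtheta
n = round(R_i / dL);                  % number of steps (rounding is an assumption)
L_total = n * dL;                     % total arc length covered by the loop

% Continuous-limit radius as a function of arc length,
% valid while R_i^2 + 2*c*L stays positive
R_of_L = @(L) sqrt(R_i^2 + 2*c*L);

L = (0:n-1).' * dL;                   % arc length at the start of each step
R = R_of_L(L);                        % all radii in one vectorized call

% Since dR = c*dtheta, the total angle is the radius change divided by c
theta_total     = (R_of_L(L_total) - R_i) / c;
theta_total_deg = theta_total * 180 / pi;
```

The stepwise result should approach this value as dL shrinks, which also gives a way to check the loop's output.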
Re: scalars, thank you for the tip!
