Calculating the mean and standard deviation of a struct array (containing 7000 1×2000 arrays): GPU arrayfun computation is much slower than the CPU.

4 views (last 30 days)
I created an array (size 7000 × 2000) and want to calculate the mean and standard deviation of each row of the array (not a single value for the whole array). Hence, my desired output is two arrays that contain the mean and SD of each row.
I first transform it into a gpuArray, and then turn the data array into a struct array. Each array inside OneStream is a 1×2000 array.
function OneStream = DisassembleArray(data)
tic
[numRows, ~] = size(data);
OneStream(numRows).f1 = [];   % preallocate the struct array to avoid growing it in the loop
for count = 1 : numRows
    OneStream(count).f1 = data(count, :);
end
fprintf("Disassemble Array timing: %.5f seconds \n", toc)
The mean and SD are then calculated on the GPU with arrayfun, as below.
clear all
data = normrnd(20, 1, [7000, 2000]);
tic
data = gpuArray(data);
fprintf("Moving array into GPU timing: %.3f seconds \n", toc)
OneStream = DisassembleArray(data);
tic
dataMetrics.mean = arrayfun(@(x) mean(x.f1), OneStream);
fprintf("GPU Arrayfun mean timing: %.5f seconds \n", toc)
tic
dataMetrics.SD = arrayfun(@(x) std(x.f1), OneStream);
fprintf("GPU Arrayfun SD timing: %.5f seconds \n", toc)
I use the tic/toc functions to measure the time needed for the computation:
Moving array into GPU timing: 0.536 seconds
Disassemble Array timing: 0.22666 seconds
GPU Arrayfun mean timing: 1.75950 seconds
GPU Arrayfun SD timing: 3.54295 seconds
I have another CPU version of the code, which produces the same output.
clear all
data = normrnd(20, 1, [7000, 2000]);
tic;
% preallocate the output arrays (7000 rows)
dataMetrics.mean = zeros(7000, 1);
dataMetrics.SD = zeros(7000, 1);
for countEnsembleSize = 1 : 7000
    dataMetrics.mean(countEnsembleSize) = mean(data(countEnsembleSize, :));
    dataMetrics.SD(countEnsembleSize) = std(data(countEnsembleSize, :));
end
fprintf("\n****Finding mean and std****\nFor loop timing: %.5f seconds \n", toc)
This is the time needed for the CPU computation
Disassemble Array timing: 0.05250 seconds
CPU mean timing: 0.01380 seconds
CPU SD timing: 0.03785 seconds
The GPU is supposed to be much faster than the CPU, but in my case it is about 100 times slower (~0.05 s vs >5 s).
I think there are some errors in my code, but I have no clue where.
Would you mind helping me? I am very grateful for your help.

Accepted Answer

Chunru on 21 Jul 2022
I did a test on my machine with an old GPU card (Quadro P5000) and a 44-core CPU. Here is the result:
a = randn(7000, 2000);
mcpu = mean(a); % 0.003094 sec
agpu = gpuArray(a); % 0.027013 sec
mgpu = mean(agpu); % 0.000912 sec
mgpu1 = gather(mgpu); % 0.001308 sec
It can be seen that the GPU is still 0.003094/0.000912 = 3.4 times faster than the CPU, excluding the time to move data to and from the GPU.
I am not sure the statement "GPU is much faster than CPU" is accurate in general; it depends on which GPU and which CPU are compared. In my case here, the GPU is 3.4 times faster.
Note also that there is a big overhead in moving data to and from the GPU. Therefore, if the data can stay on the GPU across many computations, the relative overhead becomes less significant.
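As a side note on measurement: tic/toc can misreport GPU timings because gpuArray operations are launched asynchronously. A minimal sketch using gputimeit (Parallel Computing Toolbox), which synchronizes the device and averages over several runs, might look like this (the variable names are illustrative):
a = gpuArray.rand(7000, 2000);                 % sample data already on the GPU
tMean = gputimeit(@() mean(a, 2));             % time the row-wise mean
tSD   = gputimeit(@() std(a, 0, 2));           % time the row-wise standard deviation
fprintf("GPU mean: %.5f s, GPU SD: %.5f s\n", tMean, tSD);
The CPU side can be timed the same way with the analogous timeit function, which makes the two numbers directly comparable.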
For matrix computations like mean, it is not necessary to break the data into a struct array (which has its own overhead) and then apply arrayfun.
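Concretely, the row-wise statistics can be computed directly on the matrix using the dim argument of mean and std (dim = 2 operates along each row). A minimal sketch, reusing the names from the question:
data = gpuArray(normrnd(20, 1, [7000, 2000]));
dataMetrics.mean = mean(data, 2);      % 7000x1 gpuArray of row means
dataMetrics.SD   = std(data, 0, 2);    % 7000x1 gpuArray of row SDs (default normalization)
result = gather(dataMetrics.mean);     % bring results back to the host only when needed
This keeps all 7000 rows in one GPU kernel launch, instead of one arrayfun call per struct element.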
  4 Comments
Ho Lam YEUNG on 21 Jul 2022
Thank you so much for your help, Chunru. I don't need to do it in a stupid way anymore!
I recoded my program according to your advice, and got the results below.
Moving array into GPU timing: 0.769 seconds
GPU mean timing: 0.01257 seconds
GPU SD timing: 0.04164 seconds
CPU mean timing: 0.01272 seconds
CPU SD timing: 0.04263 seconds
The times taken by the CPU and GPU are quite similar (i.e. the GPU is not faster in this case). Adding the time for moving the array onto the GPU (the data comes from outside, so the array cannot be created directly on the GPU), it seems doing the computation on the GPU isn't worth it.
Would it be because of the weak GPU on my machine (an MX250 on a laptop), or are there other underlying reasons? Thank you so much for your help again.
Joss Knight on 22 Jul 2022
Try rerunning your computation in single precision (a = randn(7000, 2000, 'single');). Your laptop GPU is unlikely to provide any speedup in double precision.
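Following that suggestion, a minimal sketch of the single-precision comparison (timeit/gputimeit are used here instead of tic/toc; the variable names are illustrative). Consumer laptop GPUs like the MX250 have far higher single-precision than double-precision throughput, so any speedup may only appear with 'single' data:
a = randn(7000, 2000, 'single');       % single-precision sample data
agpu = gpuArray(a);                    % copy to the GPU once, outside the timing
tCPU = timeit(@() std(a, 0, 2));       % row-wise SD on the CPU
tGPU = gputimeit(@() std(agpu, 0, 2)); % row-wise SD on the GPU
fprintf("CPU: %.5f s, GPU: %.5f s\n", tCPU, tGPU);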


More Answers (0)

Release

R2022a
