n = 1000;                      % number of loop iterations
m = 500;                       % vector length
v = zeros(1,n);                % preallocated sliced output
parpool('local');
tic;
parfor i=1:n
    k = mod(i,16);
    if k <= 3
        % Iterations with mod(i,16) = 0..3 run on GPU 1..4.
        % gpuDevice is called on every GPU iteration, so a worker can
        % switch between devices from one iteration to the next.
        gpuDevice(k+1);
        A = rand(1,m,'gpuArray');
        B = rand(m,1,'gpuArray');
        v(i) = gather(A*B);    % copy the scalar result back to host memory
    else
        % All other iterations run on the CPU.
        A = rand(1,m);
        B = rand(m,1);
        v(i) = A*B;
    end
end
toc
delete(gcp('nocreate'));
Result: CPU only → 0.39 s, CPU+GPU → 50 s
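To see how much of the 50 s could come from selecting GPUs, one option is to time gpuDevice calls on their own. The sketch below is a hypothetical check run on the client (outside parfor), assuming Parallel Computing Toolbox and at least two GPUs. It compares re-selecting the same device with alternating between devices 1 and 2, which resembles what a worker does when consecutive iterations map to different GPUs. The names nSwitch, tSame and tAlt are illustrative; no particular timing is claimed, the point is to measure it on your machine.

```matlab
% Hypothetical diagnostic (assumes Parallel Computing Toolbox and >= 2 GPUs).
nSwitch = 8;          % number of gpuDevice calls per measurement
tSame = NaN;          % time for re-selecting the same device
tAlt = NaN;           % time for alternating between two devices
if gpuDeviceCount >= 2
    gpuDevice(2);     % warm up: initialize GPU 2 once
    gpuDevice(1);     % warm up: initialize GPU 1 once (now selected)

    tic;
    for k = 1:nSwitch
        gpuDevice(1);                 % same device every time
    end
    tSame = toc;

    tic;
    for k = 1:nSwitch
        gpuDevice(mod(k, 2) + 1);     % alternates 2, 1, 2, 1, ...
    end
    tAlt = toc;

    fprintf('same device: %.3f s, alternating: %.3f s\n', tSame, tAlt);
else
    disp('This check needs at least two GPUs.');
end
```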
GPU usage went up, so combined parallel computation on the multi-core CPU and multiple GPUs does work, but for some reason the CPU+GPU run takes dramatically longer than the multi-core CPU alone.
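A common pattern for using several GPUs from a parallel pool is to bind each worker to one GPU once, before the timed loop, instead of calling gpuDevice inside parfor. Selecting a device invalidates gpuArray data on the previously selected one and can be slow, so doing it once per worker keeps that cost out of the loop. The sketch below assumes Parallel Computing Toolbox, a local pool, and GPUs visible to the workers; gpuDeviceCount, spmd, labindex and gather are standard toolbox functions (newer releases recommend spmdIndex over labindex). The round-robin worker-to-GPU mapping and the variable names nGPU and pool are illustrative choices, not the original code.

```matlab
% Sketch: one GPU per worker, chosen once (assumes Parallel Computing Toolbox).
n = 1000;                 % number of loop iterations
m = 500;                  % vector length
v = zeros(1, n);          % preallocated sliced output
pool = parpool('local');
nGPU = gpuDeviceCount;    % number of GPUs visible to this machine

spmd
    % Round-robin: worker 1 -> GPU 1, worker 2 -> GPU 2, ...
    % With more workers than GPUs, several workers share a device.
    if nGPU > 0
        gpuDevice(mod(labindex - 1, nGPU) + 1);
    end
end

tic;
parfor i = 1:n
    if nGPU > 0 && mod(i, 16) < 4
        % Runs on whichever GPU this worker was bound to above.
        A = rand(1, m, 'gpuArray');
        B = rand(m, 1, 'gpuArray');
        v(i) = gather(A * B);   % copy the scalar back to host memory
    else
        A = rand(1, m);
        B = rand(m, 1);
        v(i) = A * B;
    end
end
toc
delete(pool);
```

Even without the device switches, a 1×500 by 500×1 product is very little work per call, so fixed GPU overheads (kernel launches and the gather of each scalar) may still make the GPU iterations slower than the CPU ones; GPUs tend to pay off only for much larger arrays per call.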