svds performance on GPU is up to 10 times slower than on CPU

Hello everyone,
I am currently trying to get a complicated algorithm to run on my GPU. It involves a lot of fft, ifft and pointwise multiplications, so I thought this would be a good idea. However, it also involves computing the first left singular vector in each iteration. Although svds supports gpuArray inputs, it seems to be extremely slow on the GPU. Maybe this is due to my system, or maybe I made some silly mistake (this is my first time using a GPU). When I run the code
X = rand(1024);
Y = gpuArray(X);
f = @() svds(X,1);
g = @() svds(Y,1);
t = timeit(f,3);
gt = gputimeit(g,3);
disp([t,gt]);
the GPU version is always 5 to 10 times slower than the CPU version. This is annoying, because I would be happy even if it were about the same speed. The rest of the algorithm is currently much faster on the GPU, but svds ruins everything again. Here is the output of gpuDevice, in case it helps:
Name: 'Quadro M2000'
Index: 1
ComputeCapability: '5.2'
SupportsDouble: 1
DriverVersion: 8
ToolkitVersion: 8
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 4.2950e+09
AvailableMemory: 3.4038e+09
MultiprocessorCount: 6
ClockRateKHz: 1162500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
The MATLAB version is R2017b, running on 64-bit Windows 10 with an i7-7700K CPU (4.2 GHz) and 32 GB RAM.

Accepted Answer

Heiko Weichelt on 11 May 2018
Hi Florian
Thanks for asking this question.
GPUs are only faster than CPUs if you can keep lots of threads busy, which requires operating on large arrays. The array size at which an operation becomes faster on the GPU than on the CPU differs from operation to operation and from GPU to GPU. For svds, the nature of the code requires particularly large arrays, with millions of elements. In the attached plot, you can see that the GPU is faster for matrices of size 2000x2000 or more. For your GPU, it looks like the data just isn’t big enough to see a benefit.
I repeated your example for various matrix sizes; the results are shown in the attached plot.
The exact threshold depends on the computer and GPU used. The live script that I used to create the plot is attached as well.
If you have any further questions, feel free to get back to me.
Best, Heiko
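
For readers who do not have the attached live script, a size sweep along the following lines might reproduce the comparison. This is only a minimal sketch and not the script from the answer: it reuses the timeit/gputimeit calls from the question, while the list of sizes and the variable names sizes, tCPU and tGPU are assumptions chosen for illustration.

```matlab
% Hypothetical size sweep (not the live script attached to the answer).
% The sizes below are an assumption; adjust them to fit your GPU memory.
sizes = [256 512 1024 2048 4096];
tCPU = zeros(size(sizes));
tGPU = zeros(size(sizes));

for k = 1:numel(sizes)
    n = sizes(k);
    X = rand(n);        % dense n-by-n test matrix in host memory
    Y = gpuArray(X);    % same data copied to the GPU

    % Time svds with three outputs [U,S,V], as in the question.
    tCPU(k) = timeit(@() svds(X,1), 3);
    tGPU(k) = gputimeit(@() svds(Y,1), 3);
end

% Plot both timing curves; the crossover point, if there is one,
% marks the size above which the GPU is faster on this machine.
loglog(sizes, tCPU, 'o-', sizes, tGPU, 's-');
xlabel('Matrix size n');
ylabel('Time (s)');
legend('CPU', 'GPU', 'Location', 'northwest');
```

If the two curves never cross within the tested range, the GPU offers no benefit for svds at those sizes on that particular machine, and it may be worth gathering the data back to the CPU for this one step.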
