- MATLAB uses lazy evaluation to schedule the operations on your GPU, which introduces some asynchronicity in the GPU's behaviour. The same mechanism is not used on the CPU.
- gputimeit takes lazy evaluation into consideration and also repeats the measure several times, weighing caching effects, overheads, and first-time costs.
- timeit also repeats the measure several times, but it doesn't take lazy evaluation into consideration.
- tic/toc neither repeat the measure nor takes lazy evaluation into consideration.
- the profiler is somewhat similar to tic/toc but it also introduces some overhead in the measurement because it has to trace the whole call stack (which is why is useful for investigating rather than extracting rigorous measurements).
Running Code on GPU Seems much Slower than Doing so on CPU
41 views (last 30 days)
Theron FARRELL on 27 May 2019
I am using a Thinkpad W550, and my GPU is Quadro K620M. As I simply ran the following code, the profile showed that running on the GPU was much slower.
a = [10^8, 18^8];
h = a;
c = conv2(h, a, 'full');
% Running in doube precision got a similar result
aa = single(gpuArray([10^8, 18^8]));
hh = aa;
cc = conv2(hh, aa, 'full');
So I ran the official gpuBench()
The result is astonishing! Running on the GPU IS slower, much much more slower.
The first picture shows the result from GPU, and the second, CPU.
I will be very grateful if anyone could tell me why. Many thanks
Andrea Picciau on 29 May 2019
Edited: Andrea Picciau on 29 May 2019
You don't need to disable JIT acceleration. Rather, you need to measure using timeit and gputimeit like so:
% CPU data
a = ones([100, 100], 'single');
h = a;
% GPU data
aa = gpuArray(a);
hh = gpuArray(h);
% Measuring CONV2 with one output
cpuTime = timeit(@() conv2(h, a, 'full'), 1);
gpuTime = gputimeit(@() conv2(hh, aa, 'full'), 1);
Why you might want to do this:
What results do you get? Let us know.
Given your setup, it woudln't be strange if gpuTime>cpuTime. Laptop GPUs are usually not optimized for computing, and it might be the case that yours is driving the graphics too.
More Answers (2)
Walter Roberson on 27 May 2019
The Quadro 620M was a Maxwell architecture, GM108 chip. That architecture does double precision at 1/32 of single precision.
MTimes operations are delegated to LAPACK by MATLAB for sufficiently large arrays. LAPACK automatically uses all available CPU cores.
My CPU shows up as faster for double precision MTIMES and backslach than my GTX 780M does, but the GPU was much faster for single precision, and is faster for double precision FFT than my CPU measures as.