Solving sparse matrix on GPU and memory problems (even with free memory available)
Hi there,
I am running a backslash computation A\b where A is an array of type sparse and b is a vector, both stored as gpuArrays. It works fine for small matrices, but the following error is given for a matrix A of order 2e5:
Error using \
The GPU failed to allocate memory. To continue, reset the GPU by running 'gpuDevice(1)'. If this problem persists, partition your computations into smaller pieces.
KGR_gpu=gpuArray(KGR);
FR_gpu=gpuArray(FR);
sol=KGR_gpu\FR_gpu;
This same computation works fine when gpuArrays are not used. In fact, A takes only about 50 MB, and the GPU is an Nvidia GTX 1050 with 4 GB of memory. Below is the result of a gpuDevice call after this error:
CUDADevice with properties:
Name: 'GeForce GTX 1050'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 9.2000
ToolkitVersion: 8
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 4.2950e+09
AvailableMemory: 3.3949e+09
MultiprocessorCount: 5
ClockRateKHz: 1493000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
It seems that this is an issue with the backslash operator since both A and b can be stored on the GPU without problems.
Any thoughts on how to solve large sparse systems on the GPU? Thanks!
Regards, Paulo
Answers (2)
Walter Roberson
on 21 May 2018
The result of \ between two sparse arrays is generally a dense array. For example, sprand(1000,1000,.01) \ sprand(1000,1000,.01) gave me a result with a fill fraction of 0.997031.
2 Comments
Paulo Ribeiro
on 21 May 2018
Walter Roberson
on 21 May 2018
>> t = sprand(1000,1000,.01) \ sprand(1000,1,.01); nnz(t)./numel(t)
ans =
1
>> whos t
Name Size Bytes Class Attributes
t 1000x1 16016 double sparse
Notice this is twice the storage that would be required for a full (non-sparse) array with the same number of elements, due to the overhead of storing an index alongside each nonzero value in a sparse array.
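A minimal sketch of the overhead Walter describes (the variable names here are illustrative, not from the thread): store the same fully-populated column vector both as sparse and as full and compare the reported sizes.

```matlab
% A column vector with every entry nonzero, stored two ways.
t_sparse = sparse(rand(1000,1));   % all 1000 entries are nonzero
t_full   = full(t_sparse);         % same values, dense storage
whos t_sparse t_full
% The full vector needs 1000 * 8 bytes = 8000 bytes. The sparse version
% additionally stores a row index per nonzero plus column pointers, which
% is why whos reports roughly double the bytes for t_sparse.
```

This is why a sparse result with fill fraction near 1 is strictly worse than a dense one.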
Joss Knight
on 21 May 2018
Hi Paulo. I must admit, I'm not extremely familiar with the behaviour of the matrix factorization we use to implement the sparse direct solve; however, it wouldn't surprise me if the result is quite dense. It might be interesting to look (on the CPU) at the density of the QR factors of your particular input using the qr function. Certainly, when I did it on a random matrix with 10% fill, the Q factor was nearly completely dense and the R factor was about 50%.
>> A = sprand(1000, 1000, 0.1);
>> [Q,R] = qr(A);
>> nnz(Q)/numel(Q)
ans =
0.9903
>> nnz(R)/numel(R)
ans =
0.5005
For LU, both factors are 50% dense.
Obviously random sparse matrices don't properly reflect the structure of real sparse matrices, so your problem would be different. But it's not unreasonable to surmise that the intermediate factors might be very large.
To circumvent such problems, the usual approach is to use an iterative solver such as gmres, bicg, pcg, cgs, or lsqr. It is not uncommon for these to converge more quickly than a direct solve, especially if you can give them a good preconditioner.
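As a concrete starting point, here is a minimal sketch of an iterative GPU solve using gmres, reusing the KGR and FR names from the question (the tolerance and iteration limit are placeholder values you would tune for your problem):

```matlab
% Solve KGR * sol = FR iteratively on the GPU instead of with backslash.
KGR_gpu = gpuArray(KGR);   % sparse system matrix transferred to the GPU
FR_gpu  = gpuArray(FR);    % right-hand side vector on the GPU

tol   = 1e-6;              % assumed convergence tolerance
maxit = 2000;              % assumed iteration cap

% [] as the third argument means unrestarted GMRES.
[sol, flag, relres, iter] = gmres(KGR_gpu, FR_gpu, [], tol, maxit);
if flag ~= 0
    warning('gmres did not converge: flag %d, relative residual %g', ...
            flag, relres);
end
```

Unlike the direct solve, the iterative methods only form matrix-vector products, so they never materialize dense intermediate factors on the GPU.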
9 Comments
Paulo Ribeiro
on 22 May 2018
Edited: Paulo Ribeiro
on 22 May 2018
Joss Knight
on 23 May 2018
That's a bug! Thanks for finding that. This is hit if the RHS vector is all zeros, which is a pathological case, so with a real RHS you'll avoid this.
Paulo Ribeiro
on 23 May 2018
Edited: Paulo Ribeiro
on 3 Jun 2018
Joss Knight
on 2 Jun 2018
I'm glad you're making progress but it doesn't look as though you're doing anything on the GPU now. This code:
KGR_gpu=gpuArray(sparse(n,n)) % where n is the order of KGR_gpu
KGR_gpu=KGR; % KGR on the CPU is stored on GPU
may allocate a GPU array, but the second line then immediately frees it and overwrites the variable with the original KGR variable from the CPU. After this code, KGR_gpu and KGR are the same CPU array, and the computation will happen on the CPU in both cases.
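For clarity, the transfer Joss describes needs gpuArray applied to KGR itself; a corrected sketch:

```matlab
% Correct transfer: copy the sparse CPU matrix KGR onto the GPU.
KGR_gpu = gpuArray(KGR);

% Sanity check that the variable really lives on the GPU.
isa(KGR_gpu, 'gpuArray')   % returns logical 1 if the transfer succeeded
```

The preallocation with gpuArray(sparse(n,n)) is unnecessary: gpuArray performs the allocation and copy in one step.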
Paulo Ribeiro
on 3 Jun 2018
Paulo Ribeiro
on 3 Jun 2018
Edited: Paulo Ribeiro
on 3 Jun 2018
Joss Knight
on 5 Jun 2018
Good to know this is working out for you. If performance is an issue, you should keep experimenting with the supported solvers: gmres, pcg, bicg, bicgstab, cgs, lsqr. Each has different properties, and one may converge faster for your problem than another.
Another trick is to pass your system matrix directly as the preconditioner:
pcg(KGR_gpu, FR, 1e-5, 1e5, KGR_gpu);
The GPU solvers (currently) use ILU to factorize the matrix passed as the preconditioner and use those factors to precondition the system. Sometimes this makes no difference, but often it can dramatically reduce the number of iterations to convergence.
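A sketch of how one might measure the effect of this trick, assuming the KGR_gpu and FR variables from earlier and relying on the GPU-specific behaviour Joss describes (passing the system matrix as the preconditioner triggers the internal ILU); the tolerance and iteration limit mirror the call above:

```matlab
% Compare iteration counts with and without the system matrix passed as
% the preconditioner. pcg assumes KGR is symmetric positive definite.
[x0, flag0, relres0, iter0] = pcg(KGR_gpu, FR, 1e-5, 1e5);
[x1, flag1, relres1, iter1] = pcg(KGR_gpu, FR, 1e-5, 1e5, KGR_gpu);

fprintf('no preconditioner:  %d iterations (flag %d)\n', iter0, flag0);
fprintf('ILU preconditioner: %d iterations (flag %d)\n', iter1, flag1);
```

If KGR is not symmetric positive definite, the same pattern applies with gmres or bicgstab instead of pcg.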
Paulo Ribeiro
on 12 Jun 2018
Walter Roberson
on 12 Jun 2018
I seem to recall that some memory is required to marshal the data on the GPU, which stores arrays in a different order than MATLAB does. I do not recall the details at the moment, but potentially you might not be able to create an output array larger than half of your GPU memory.