CUDA Unexpected Error for nndata2gpu

LukasR on 30 Apr 2018
Commented: Harley Edwards on 18 Aug 2018
Hi, I am currently trying to train a fitnet on a GPU (NVIDIA Titan Xp). However, whenever I try to format my data using nndata2gpu and gpu2nndata, I run into the following error:
Error using gpuArray/gather An unexpected error occurred during CUDA execution. The CUDA error was: CUDA_ERROR_ILLEGAL_ADDRESS
The code used is:
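tinput=nndata2gpu(input);       % inputs in GPU training format
ttarget=nndata2gpu(target);     % targets in GPU training format
fundnet=configure(fundnet,input,target);  % configure with the ordinary host data
tic
fundnet=train(fundnet,tinput,ttarget,'useGPU','yes','showResources','yes');
toc
ty=fundnet(tinput);             % simulate on the GPU
y=gpu2nndata(ty);               % convert the result back to ordinary data
perf=perform(fundnet,target,y); % store the performance, not the network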
The device is recognized without any problems (gpuDevice returns in under a second) and the drivers are up to date. I am using MATLAB R2018a. Any idea what could be causing this?
Many thanks in advance!
Joss Knight on 2 May 2018
This doesn't look good. Could you provide a standalone example, i.e. generate some data that triggers the error and include it in your code?
LukasR on 2 May 2018
Edited: LukasR on 2 May 2018
Many thanks for the answer. While generating the sample data, I found what I believe is the source of the issue. For
input=rand(20,6000000);
target=rand(1,6000000);
the error occurs, whereas for
input=rand(20,2000000);
target=rand(1,2000000);
it does not. My own dataset is approximately 5,400,000x25 (inputs) and 5,400,000x1 (targets).
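For scale, here is the raw size of the data in each case, assuming the double-precision arrays that rand returns (8 bytes per element) and ignoring any extra buffers the training code allocates:

$$
\begin{aligned}
20 \times 6\,000\,000 \times 8\ \text{B} &= 960\,000\,000\ \text{B} \approx 960\ \text{MB} \quad \text{(error)}\\
20 \times 2\,000\,000 \times 8\ \text{B} &= 320\,000\,000\ \text{B} \approx 320\ \text{MB} \quad \text{(works)}\\
25 \times 5\,400\,000 \times 8\ \text{B} &= 1\,080\,000\,000\ \text{B} \approx 1.08\ \text{GB} \quad \text{(own dataset, inputs)}
\end{aligned}
$$

Even the failing case is far below the 12 GB on a Titan Xp, which is consistent with a limit in the GPU training code rather than plain memory exhaustion. This is an inference from the numbers, not a confirmed diagnosis.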
Here is the complete runnable code that triggers the error:
input=rand(20,6000000);
target=rand(1,6000000);
nneurons=10;
technet=fitnet(nneurons,'trainscg');
technet.trainParam.epochs=10000;
technet.trainParam.goal=0;
technet.trainParam.min_grad=1e-6;
technet.trainParam.max_fail=200;
technet.trainParam.sigma=5.0e-7;
technet.trainParam.lambda=5.0e-7;
technet.trainParam.show=25;
technet.trainParam.showCommandLine=false;
technet.trainParam.showWindow=true;
technet.trainParam.time=inf;
technet.divideParam.trainRatio = 70/100;
technet.divideParam.valRatio = 15/100;
technet.divideParam.testRatio = 15/100;
% Replace tansig layers with elliotsig, n/(1+|n|), which avoids computing exp()
for i=1:technet.numLayers
    if strcmp(technet.layers{i}.transferFcn,'tansig')
        technet.layers{i}.transferFcn = 'elliotsig';
    end
end
tinput=nndata2gpu(input);
ttarget=nndata2gpu(target);
technet=configure(technet,input,target);
tic
technet=train(technet,tinput,ttarget,'useGPU','yes','showResources','yes');
toc
ty=technet(tinput);
technetout=gpu2nndata(ty);
technetperformance=perform(technet,target,technetout);
Another note: GPU training DOES work without the nndata2gpu conversion, although the result is disappointing (only a 1.5x speedup over an i7-7500U CPU on the dataset described above). Furthermore, once the error has occurred, it also occurs for smaller datasets until I restart MATLAB; in fact, I cannot create any gpuArray at all until I restart.
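A minimal sketch of how such a CPU vs. GPU comparison can be made fair, assuming input, target and technet are defined as in the code above. Both runs use plain matrices (no nndata2gpu), the same initial weights and the same random data split. The 100-epoch cap is a hypothetical benchmarking choice so that both runs do roughly the same work; it is not a training recommendation.

```matlab
% Hedged sketch: time identical training runs on the CPU and on the GPU.
% Assumes input, target and technet exist as in the code above.
net0 = configure(technet, input, target);  % same initial weights for both runs
net0.trainParam.epochs = 100;              % hypothetical cap: same work per run,
                                           % unless min_grad stops a run earlier
net0.trainParam.showWindow = false;        % keep GUI updates out of the timing

rng(0);                                    % same random train/val/test split
tic
netCPU = train(net0, input, target, 'useGPU', 'no');
tCPU = toc;

rng(0);                                    % reset so the GPU run gets the same split
tic
netGPU = train(net0, input, target, 'useGPU', 'yes');
tGPU = toc;

fprintf('CPU: %.1f s, GPU: %.1f s, speedup: %.2fx\n', tCPU, tGPU, tCPU / tGPU);
```

The printed ratio tCPU/tGPU is the number to compare against the 1.5x reported above.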

Accepted Answer

Joss Knight on 2 May 2018
It looks like you have found a bug; many thanks. We will investigate. My best guess for now is that this is caused by using more data than the GPU training function can handle. If you can reduce the size of the input without compromising your application, that is the workaround.
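A minimal sketch of that workaround, assuming input, target and technet are defined as in LukasR's code: train on the GPU with a random subset of the samples (columns). The subset size nSub = 2e6 is a hypothetical choice based on the size reported to work earlier in this thread; it is not a documented limit.

```matlab
% Hedged sketch: train on a random subset of samples small enough to avoid
% the error. nSub is hypothetical; adjust it to what works on your GPU.
nSub = 2e6;                          % hypothetical subset size
Q = size(input, 2);                  % total number of samples (columns)
idx = randperm(Q, min(nSub, Q));     % random sample indices, no repeats

xSub = input(:, idx);
tSub = target(:, idx);

technet = configure(technet, xSub, tSub);  % configure with ordinary host data
tx = nndata2gpu(xSub);
tt = nndata2gpu(tSub);
technet = train(technet, tx, tt, 'useGPU', 'yes');

% Evaluate on the full dataset with ordinary matrices, i.e. on the CPU,
% so the large array never goes through the nndata2gpu path.
y = technet(input);
perf = perform(technet, target, y);
```

Whether a random subset is acceptable depends on the application; with several million samples of a smooth fitting problem it often is, but that should be checked against held-out data.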
Harley Edwards on 18 Aug 2018
I think I have found a similar or related error. I have a dataset in which every portion trains well separately, but the whole set will not train together, despite having sufficient memory and having turned off the kernel execution timeout. My inputs are 200x844000 and my targets are 6x844000. I can only train 325,000 samples at a time on a GeForce 1080. Please let me know if you want my code, or how else I can help solve this problem.
