Why do I receive the "CUDA_ERROR_LAUNCH_TIMEOUT" error when trying to run GPU code with Parallel Computing Toolbox?

I am trying to run my computation on the GPU. When I execute my program I receive the following error message:
ERROR: Warning: An unexpected error occurred during CUDA execution. The CUDA error
was: CUDA_ERROR_LAUNCH_TIMEOUT.
Error using arrayfun
The kernel execution failed because the CUDA driver timeout was encountered.

Accepted Answer

MathWorks Support Team on 30 Jan 2024
Edited: MathWorks Support Team on 30 Jan 2024
This is a limitation imposed on the Parallel Computing Toolbox by the underlying operating system. 
This error occurs when a gpuArray operation or a CUDA kernel runs for a long time on a GPU that is used for both graphics rendering and CUDA computations. The error is triggered by the operating system, which limits the time that the GPU can dedicate to computations instead of updating the display. Computations that exceed this time limit trigger the Timeout Detection and Recovery (TDR) mechanism. 
There are several ways to avoid this error: 

1. Use different GPUs for graphics and computation 

The time limit only applies to GPUs that are used for graphics. It is therefore recommended that you run gpuArray operations and CUDA kernels on a GPU that is not providing output for a display. 
On Windows, to ensure that your GPU is never used for display, set your GPU to use the Tesla Compute Cluster (TCC) driver model. To see which driver model your GPU device is using, inspect the DriverModel property returned by the gpuDevice function in MATLAB. Not all NVIDIA GPUs support the TCC driver model. 
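
As a quick check, the snippet below is a minimal sketch of how to inspect the currently selected GPU. It assumes a MATLAB release in which gpuDevice reports the DriverModel property mentioned above, and it also prints KernelExecutionTimeout, which indicates whether the device can time out long-running kernels.

```matlab
% Query the currently selected GPU (selects the default device if none is selected).
d = gpuDevice;

% DriverModel is the property described above; on Windows, a GPU running the
% TCC driver model is not used for display.
fprintf('GPU: %s\n', d.Name);
fprintf('DriverModel: %s\n', d.DriverModel);

% KernelExecutionTimeout is true when long-running kernels on this device
% can be interrupted by a timeout.
fprintf('KernelExecutionTimeout: %d\n', d.KernelExecutionTimeout);
```

If KernelExecutionTimeout is reported as 1, long-running kernels on that device are subject to the timeout described above.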

2. Segment your computation into smaller chunks 

If possible, split your large computation into several smaller computations. Each smaller computation finishes sooner, so it is less likely to exceed the time limit and trigger the TDR mechanism. 
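
The following is a minimal sketch of this idea for an element-wise arrayfun call like the one in the error message. The function fn, the input size, and chunkSize are hypothetical placeholders rather than values from the original question. A suitable chunk size depends on your GPU and on how expensive the function is.

```matlab
% Hypothetical element-wise function standing in for the real computation.
fn = @(v) v.^2 + 1;

x = rand(1e7, 1, 'gpuArray');    % example input; the size is arbitrary
chunkSize = 1e6;                 % assumed chunk size; tune for your GPU
y = zeros(size(x), 'like', x);   % preallocate the output on the GPU

% Process the input one chunk at a time so that each arrayfun call
% works on fewer elements than a single call over the whole array.
for first = 1:chunkSize:numel(x)
    last = min(first + chunkSize - 1, numel(x));
    y(first:last) = arrayfun(fn, x(first:last));
end

wait(gpuDevice);  % block until all queued GPU work has finished
```

Chunking adds some overhead from the extra calls and indexing, so it is a trade-off: smaller chunks lower the risk of a timeout but can reduce throughput.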

3. Modify the TDR mechanism (Windows) 

If your system has GPUs without a display attached, you can manually modify the TDR mechanism. Increase the time limit to allow GPU computations to run for extended periods of time without triggering an error. 
For information on TDR and how to modify the timeout on Windows, see https://learn.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys.  
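
Before changing anything, it can help to see whether a timeout has already been configured. The sketch below only reads the registry and assumes the TdrDelay value and key location described on the Microsoft page linked above. It does not modify the registry. Changing the value requires administrator rights, a restart, and should follow the Microsoft documentation.

```matlab
% Read-only check of the TDR timeout on Windows. The key and value name are
% taken from the Microsoft TDR registry keys page; treat them as assumptions
% and verify them against that page for your Windows version.
if ispc
    key = 'SYSTEM\CurrentControlSet\Control\GraphicsDrivers';
    try
        tdrDelay = winqueryreg('HKEY_LOCAL_MACHINE', key, 'TdrDelay');
        fprintf('TdrDelay is set to %d seconds.\n', tdrDelay);
    catch
        % winqueryreg throws an error if the value does not exist.
        disp('TdrDelay is not set; Windows uses its default timeout.');
    end
else
    disp('The TDR registry keys apply to Windows only.');
end
```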
