CUDA_ERROR_UNKNOWN when using floats instead of double precision

Question

Hello,

I have two otherwise identical .cu files: one declares its variables as float, the other as double.

The double precision kernel works perfectly when called from MATLAB, whereas the float version does not. I get an error when I gather() the output variables:

Error using gpuArray/gather
An unexpected error occurred during CUDA execution. The CUDA error was: CUDA_ERROR_UNKNOWN

According to the documentation, feval() automatically casts my input arrays to the correct type. I have also tried explicitly converting each input/output array to single precision using single(), but I get a similar error.

Here are the two kernel signatures:

__global__ void SegForceNBodyCUDA(double const *SoA,
                    double const a, double const MU, double const NU,
                    int const S,
                    double *f0x, double *f0y, double *f0z,
                    double *f1x, double *f1y, double *f1z);

or

__global__ void SegForceNBodyCUDA(float const *SoA,
                    float const a, float const MU, float const NU,
                    int const S,
                    float *f0x, float *f0y, float *f0z,
                    float *f1x, float *f1y, float *f1z);

Both .cu files compile correctly without errors/warnings.

Please advise.

Thank you,

Francesco

Francesco on 19 Sep 2013

Edited: Francesco on 19 Sep 2013

For the original double precision code I use:

    kern = parallel.gpu.CUDAKernel('SegForceNBodyCUDAOpt2.ptx', 'SegForceNBodyCUDAOpt2.cu');
    kern.ThreadBlockSize = 256;
    kern.GridSize = ceil(S/256);

    [f0x, f0y, f0z, f1x, f1y, f1z] = feval(kern,SoA,...
                                       a,MU,NU,...
                                       S,...
                                       f0x_temp, f0y_temp, f0z_temp,...
                                       f1x_temp, f1y_temp, f1z_temp);

For the single precision I do:

    kern = parallel.gpu.CUDAKernel('SegForceNBodyCUDASingle.ptx', 'SegForceNBodyCUDASingle.cu');
    kern.ThreadBlockSize = 256;
    kern.GridSize = ceil(S/256);
    [f0x, f0y, f0z, f1x, f1y, f1z] = feval(kern,single(SoA),...
                                       single(a),single(MU),single(NU),...
                                       S,...
                                       single(f0x_temp), single(f0y_temp), single(f0z_temp),...
                                       single(f1x_temp), single(f1y_temp), single(f1z_temp));

f0x_temp and the other output buffers are created beforehand using zeros(S,1).

Ben Tordoff on 20 Sep 2013

Casting is unlikely to be the issue here - you see the error even when you have forced the types to be correct. CUDA_ERROR_UNKNOWN usually means that the card has crashed, and I've previously seen this when there is a bad memory access (e.g. off the end of an array, or an invalid pointer etc.).

Unfortunately that means that the problem may be somewhere in your kernel or in the way the output arrays are being handled in CUDAKernel. Without your code it's hard to identify which of those two it might be. If you could supply a minimal example that is sufficient to demonstrate the problem, I might be able to help work out where the problem lies. You can send it to me directly if you can't (or don't want to) post it here.

Finally, about the casting: if the input is plain MATLAB data, it is cast to the correct type and copied to the GPU. If the input is already on the GPU (i.e. it is a gpuArray), the type and complexity must exactly match and a specific error is thrown if they do not. Since you do not see this error, your inputs are being cast for you.
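To make the "off the end of an array" case mentioned above concrete, here is a minimal sketch. It is not the original SegForceNBodyCUDA kernel: the kernel `scaleGuarded`, the helper `blocksFor`, and the launcher `runScaleGuarded` are hypothetical names invented for illustration. It assumes a grid rounded up to whole blocks of 256 threads, as in the MATLAB calls above, so that for a size such as S = 1000 the last block contains threads whose index is past the end of the arrays. It also assumes S > 0.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel (NOT the original SegForceNBodyCUDA): doubles each of
// the S input elements. The guard matters because the grid is rounded up.
__global__ void scaleGuarded(float const *in, int const S, float *out)
{
    int const i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= S) {
        return;  // surplus thread in the last block: touch nothing
    }
    out[i] = 2.0f * in[i];
}

// Number of blocks needed to cover n elements; integer form of ceil(n/blockSize).
inline int blocksFor(int n, int blockSize)
{
    return (n + blockSize - 1) / blockSize;
}

// Runs the kernel on S elements (S > 0 assumed). Returns true only if every
// CUDA call succeeded and every output element has the expected value.
bool runScaleGuarded(int S)
{
    std::vector<float> hostIn(S), hostOut(S, 0.0f);
    for (int i = 0; i < S; ++i) {
        hostIn[i] = static_cast<float>(i);
    }

    size_t const bytes = static_cast<size_t>(S) * sizeof(float);
    float *devIn = nullptr;
    float *devOut = nullptr;
    if (cudaMalloc(&devIn, bytes) != cudaSuccess) {
        return false;
    }
    if (cudaMalloc(&devOut, bytes) != cudaSuccess) {
        cudaFree(devIn);
        return false;
    }

    cudaError_t err = cudaMemcpy(devIn, hostIn.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int const blockSize = 256;
        scaleGuarded<<<blocksFor(S, blockSize), blockSize>>>(devIn, S, devOut);
        err = cudaGetLastError();  // reports launch-configuration errors
    }
    // Kernel launches are asynchronous: a fault inside the kernel is only
    // reported by a later synchronizing call such as this one.
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(hostOut.data(), devOut, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(devIn);
    cudaFree(devOut);

    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }
    for (int i = 0; i < S; ++i) {
        if (hostOut[i] != 2.0f * hostIn[i]) {
            return false;
        }
    }
    return true;
}
```

Compiled with nvcc and called from a `main` as, say, `runScaleGuarded(1000)`, this launches 4 blocks of 256 threads, so 24 threads rely on the `i >= S` guard. Without that guard they would write past the end of `out`. The asynchronous launch is also a plausible reason why a kernel fault is only reported at a later call such as gather() rather than at feval().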

Cheers

Ben

Francesco on 20 Sep 2013

Hi Ben,

Thank you for your reply.

The fact is that I've directly modified the original double precision code by simply declaring variables as "float" instead of "double". I haven't made any changes in the algorithm. I don't allocate any memory within the kernel file, so I am not sure why it crashes.

The file is rather long, about 1000 lines of code, separated into several functions, so it's difficult for me to whittle it down to a minimal working example.

I can send it to you and see if you can reproduce the mistake, and in the meantime also look for the error myself.

Thank you,

Kind Regards,

Francesco

Answer 1

Ben Tordoff on 25 Sep 2013

Thanks for sending the code.

I’ve done some initial investigation and it looks like you have an illegal memory access somewhere. Here is what cuda-memcheck reports:

Running CUDA Single Precision, Optimised...
warning: Cuda API error detected: cuModuleGetGlobal_v2 returned (0x1f4)
warning: Cuda API error detected: cuModuleGetGlobal_v2 returned (0x1f4)
[Launch of CUDA Kernel 102 (SegForceNBodyCUDA<<<(4,1,1),(256,1,1)>>>) on Device 0]
Memcheck detected an illegal access to address (@local)0xfff830
Program received signal CUDA_EXCEPTION_1, Lane Illegal Address.
[Switching focus to CUDA kernel 102, grid 103, block (0,0,0), thread (5,0,0), device 0, sm 12, warp 2, lane 5]
0x0000000010052d98 in SegForceNBodyCUDA(float const*, float, float, float, int, float*, float*, float*, float*, float*, float*) ()

I couldn't see anything obviously wrong in the kernel, but it's quite a lot of code. The most likely culprit is reading/writing past the end of an input/output array. However, the fact that the illegal memory address includes "@local" may indicate a problem with how data is being passed around internally to the kernel (i.e. in thread-local memory). I don't think there is any problem with the way the kernel is being called by MATLAB.
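As an illustration of how a float-only fault in thread-local memory could arise, here is one hypothetical pattern. It is purely an assumption for the sake of example and is not taken from the code in this thread: an index into a per-thread array is computed from a floating-point value. A value just below 1 in double precision can round to exactly 1.0f in single precision, which pushes the index one past the end of the array. The names `NBINS`, `binUnsafe`, `binClamped`, and `countPerThread` are invented for this sketch, and the helpers assume finite inputs.

```cuda
#include <cuda_runtime.h>

#define NBINS 8  // hypothetical size of a per-thread scratch array

// Unsafe: maps x in [0, 1) to a bin index, but returns nbins (one past the
// end) when x has been rounded up to exactly 1.0f.
__host__ __device__ inline int binUnsafe(float x, int nbins)
{
    return static_cast<int>(x * nbins);
}

// Safer: same mapping, with the result clamped into [0, nbins - 1].
__host__ __device__ inline int binClamped(float x, int nbins)
{
    int const b = static_cast<int>(x * nbins);
    if (b < 0) {
        return 0;
    }
    return (b >= nbins) ? (nbins - 1) : b;
}

// Hypothetical kernel: each thread records its own value in a thread-local
// array. Because the array is indexed dynamically, the compiler typically
// places it in local memory, the address space that cuda-memcheck labels
// "@local".
__global__ void countPerThread(float const *x, int const S, float *out)
{
    int const i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= S) {
        return;  // surplus thread in the rounded-up grid
    }

    float scratch[NBINS] = {0.0f};

    // With binUnsafe instead of binClamped, x[i] == 1.0f would write
    // scratch[NBINS], an illegal access to thread-local memory.
    scratch[binClamped(x[i], NBINS)] += 1.0f;

    float total = 0.0f;
    for (int b = 0; b < NBINS; ++b) {
        total += scratch[b];
    }
    out[i] = total;  // always 1.0f when the index stays in range
}
```

Both helpers are callable from host code, so the difference can be checked without a GPU: the literal `0.99999999f` rounds to exactly `1.0f`, so `binUnsafe(0.99999999f, NBINS)` returns 8 while `binClamped(0.99999999f, NBINS)` returns 7. The same computation in double, `static_cast<int>(0.99999999 * 8)`, gives 7. Other float-versus-double differences, such as a tolerance test that takes a different branch, could lead to a similar fault.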

Best of luck debugging this. These types of problems can be tricky to isolate.

Ben

Francesco on 25 Sep 2013

Thanks a lot Ben!

I'll try my best :)
