Is it possible to use cuRAND with feval (Parallel Computing Toolbox)?
Hi,
I am trying to call feval (Parallel Computing Toolbox) with a kernel that uses the cuRAND library (<http://developer.nvidia.com/curand>), and I need to pass feval an argument of type curandState (needed to initialize the random generators in cuRAND).
I have something similar to:
K=parallel.gpu.CUDAKernel('kernel.ptx','kernel.cu');
[arg_out]=feval(K,arg_in, state);
"state" must be a curandState variable.
I tried to trick MATLAB by passing a plain number instead:
[arg_out]=feval(K,arg_in, 1);
But I got the following error message:
Error using iParseToken (line 259)
Unsupported type in argument specification "curandState * state".
Error in C:\Program Files\MATLAB\R2011b\toolbox\distcomp\gpu\+parallel\+internal\+gpu\handleKernelArgs.p>iParseCPrototype (line 181)
Error in C:\Program Files\MATLAB\R2011b\toolbox\distcomp\gpu\+parallel\+internal\+gpu\handleKernelArgs.p>handleKernelArgs (line 70)
I have not found any information on Google. Could anyone please help me?
Thank you in advance.
María.
Accepted Answer
More Answers (2)
Edric Ellis
on 1 Feb 2012
For what it's worth, I have some example CUDA code and MATLAB driving code to show how one might use CURAND. First off, here's the CUDA code:
#include <curand_kernel.h>

// Number of bytes needed to hold one thread's generator state.
const size_t stateSize = sizeof( curandState );

// Byte-wise copy, so the state can live in a plain unsigned char array.
__device__ void copyState( void * out, void const * in ) {
    unsigned char * outc = static_cast< unsigned char * >( out );
    unsigned char const * inc = static_cast< unsigned char const * >( in );
    for ( size_t i = 0; i < stateSize; ++ i ) {
        outc[i] = inc[i];
    }
}

// Report the state size so that MATLAB can allocate the state buffer.
__global__ void returnStateSize( unsigned int * value ) {
    value[0] = stateSize;
}

// Initialize one generator per thread and store it in the byte array.
__global__ void initState( unsigned char * stateArray ) {
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    curandState state;
    curand_init( 1234, idx, 0, &state );
    copyState( stateArray + idx * stateSize, &state );
}

// Load this thread's state, draw one double, and save the advanced state.
__global__ void generate( double * x, unsigned char * stateArray ) {
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    curandState state;
    copyState( &state, stateArray + idx * stateSize );
    x[idx] = curand_uniform_double( &state );
    copyState( stateArray + idx * stateSize, &state );
}
And here's some MATLAB code which uses that:
import parallel.gpu.GPUArray;
% Get the number of bytes per thread of state.
stateSizeK = parallel.gpu.CUDAKernel( 'userand.ptx', 'userand.cu', 'returnStateSize' );
stateSz = double( gather( feval( stateSizeK, zeros( 'uint32' ) ) ) );
% Set up the random state
initK = parallel.gpu.CUDAKernel( 'userand.ptx', 'userand.cu', 'initState' );
initK.ThreadBlockSize = 256;
initK.GridSize = 10;
randState = feval( initK, GPUArray.zeros( stateSz, 256*10, 'uint8' ) );
genK = parallel.gpu.CUDAKernel( 'userand.ptx', 'userand.cu', 'generate' );
genK.ThreadBlockSize = 256;
genK.GridSize = 10;
% Generate some random numbers
[rand1, randState] = feval( genK, GPUArray.zeros(1, 256*10), randState );
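The key idea above is that the kernel prototype only mentions types MATLAB can parse (unsigned char *), while curandState stays hidden inside the kernel. As a rough sketch of a variant, the byte buffer could be reinterpreted as a curandState array instead of being copied byte by byte. Everything named here (initStateCast, generateCast, runCurandDemo, and the extra bounds argument n) is hypothetical and not part of the code above, and the host driver is plain CUDA C++ rather than MATLAB. It assumes the buffer is suitably aligned for curandState, which holds for memory from cudaMalloc; whether the same holds for every GPUArray buffer has not been checked here.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical variant of initState: the prototype still exposes only
// unsigned char *, but the buffer is viewed as an array of curandState.
__global__ void initStateCast( unsigned char * stateArray, int n ) {
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if ( idx >= n ) return;  // guard threads past the end of the buffer
    curandState * states = reinterpret_cast< curandState * >( stateArray );
    curand_init( 1234, idx, 0, &states[idx] );
}

// Hypothetical variant of generate: one uniform double per thread.
__global__ void generateCast( double * x, unsigned char * stateArray, int n ) {
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if ( idx >= n ) return;
    curandState * states = reinterpret_cast< curandState * >( stateArray );
    curandState local = states[idx];           // work on a local copy
    x[idx] = curand_uniform_double( &local );  // value in (0, 1]
    states[idx] = local;                       // store the advanced state
}

// Hypothetical host driver: allocates n states, initializes them,
// draws n numbers, and returns them. Returns an empty vector on error.
std::vector< double > runCurandDemo( int n ) {
    unsigned char * dStates = nullptr;
    double * dX = nullptr;
    std::vector< double > result;

    if ( cudaMalloc( &dStates, n * sizeof( curandState ) ) != cudaSuccess ) {
        return result;
    }
    if ( cudaMalloc( &dX, n * sizeof( double ) ) != cudaSuccess ) {
        cudaFree( dStates );
        return result;
    }

    const int block = 256;
    const int grid = ( n + block - 1 ) / block;
    initStateCast<<< grid, block >>>( dStates, n );
    generateCast<<< grid, block >>>( dX, dStates, n );

    std::vector< double > host( n );
    // cudaMemcpy waits for the preceding kernels on the default stream.
    if ( cudaMemcpy( host.data(), dX, n * sizeof( double ),
                     cudaMemcpyDeviceToHost ) == cudaSuccess ) {
        result.swap( host );
    }

    cudaFree( dX );
    cudaFree( dStates );
    return result;
}
```

Calling runCurandDemo( 2560 ) from a host main would mirror the 256 x 10 launch used in the MATLAB code. The byte-wise copyState approach in the answer makes no alignment assumption at all, which is a reason to prefer it when the origin of the buffer is not under your control.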