Run MEX Functions Containing CUDA Code
Write MEX File Containing CUDA Code
All MEX files, including those containing CUDA® code, have a single entry point known as mexFunction. The MEX function contains the host-side code that interacts with gpuArray objects from MATLAB® and launches the CUDA code. The CUDA code in the MEX file must conform to the CUDA runtime API.
You must call the mxInitGPU function at the entry to your MEX file to ensure that the GPU device is properly initialized and known to MATLAB.
The interface you use to write a MEX file for gpuArray objects is different from the MEX interface for standard MATLAB arrays.
The following is an example of a MEX file containing CUDA code.
The file contains this CUDA device function:
void __global__ TimesTwo(double const * const A,
                         double * const B,
                         int const N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
        B[i] = 2.0 * A[i];
}
The file also contains these lines, which determine the array size and launch a grid of the proper size:
N = (int)(mxGPUGetNumberOfElements(A));
blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
TimesTwo<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, N);
Run Resulting MEX Functions
The MEX function in this example multiplies every element in the input array by 2 to get the values in the output array. To test the function, start with a gpuArray matrix in which every element is 1:
x = ones(4,4,"gpuArray");
y = mexGPUExample(x)
y =

     2     2     2     2
     2     2     2     2
     2     2     2     2
     2     2     2     2
The input and output arrays are gpuArray objects.
Compare to a CUDA Kernel
Parallel Computing Toolbox™ software also supports CUDAKernel objects, which you can use to integrate CUDA code with MATLAB. You can create CUDAKernel objects using CU and PTX files. Generally, using MEX files is more flexible than using CUDAKernel objects because:
MEX files can include calls to host-side libraries, including NVIDIA® libraries such as the NVIDIA Performance Primitives (NPP) or cuFFT libraries. MEX files can also contain calls from the host to functions in the CUDA runtime library.

MEX files can analyze the size of the input and allocate memory of a different size, or launch grids of a different size, from C or C++ code. In contrast, MATLAB code that calls CUDAKernel objects must preallocate output memory and determine the grid size.
Access Complex Data
Complex data on a GPU device is stored in interleaved complex format. That is, for a complex gpuArray A, the real and imaginary parts of each element are stored in consecutive addresses. MATLAB uses CUDA built-in vector types to store complex data on the device. For more information, see the NVIDIA CUDA C++ Programming Guide.
Depending on the needs of your kernel, you can cast the pointer to complex data as the real type or as the built-in vector type. For example, in MATLAB, suppose you create this matrix:
a = complex(ones(4,"gpuArray"),ones(4,"gpuArray"));
If you pass a gpuArray to a MEX function as the first argument prhs[0], then you can get a pointer to the complex data by using these calls:
mxGPUArray const * A = mxGPUCreateFromMxArray(prhs[0]);
mwSize numel_complex = mxGPUGetNumberOfElements(A);
double2 const * d_A = (double2 const *)(mxGPUGetDataReadOnly(A));
To treat the array as a real, double-precision array of twice the length, use these calls:
mxGPUArray const * A = mxGPUCreateFromMxArray(prhs[0]);
mwSize numel_real = 2*mxGPUGetNumberOfElements(A);
double const * d_A = (double const *)(mxGPUGetDataReadOnly(A));
You can convert data between complex and real formats on the GPU using these Parallel Computing Toolbox functions. These operations require a copy to interleave the data.
The mxGPUCreateComplexGPUArray function creates a complex mxGPUArray from two real mxGPUArray objects that specify the real and imaginary components. The mxGPUCopyReal and mxGPUCopyImag functions copy the real or the imaginary elements, respectively, of an mxGPUArray to a single real mxGPUArray.
The mxGetImagData function has no equivalent for mxGPUArray objects.
Compile GPU MEX File
Use the mexcuda function in MATLAB to compile a MEX file containing CUDA code. By default, the mexcuda function compiles the CUDA code using the NVIDIA CUDA compiler (nvcc) installed with MATLAB. The software forwards further compilation steps to a C++ host compiler installed on your system. To check which compilers mexcuda is using, use the -v flag for verbose output in the mexcuda function. For example:
mexcuda mexGPUExample.cu
If mexcuda cannot locate nvcc, it might be installed in a nondefault location. You can specify the location of nvcc on your system by storing it in the MW_NVCC_PATH environment variable. You can set this variable using the setenv command. For example:
setenv("MW_NVCC_PATH","/usr/local/CUDA/bin")
Supported Host Compilers
To compile a MEX file using the mexcuda function, you must have a supported C++ host compiler installed. mexcuda only supports a subset of Visual Studio® compilers. To determine whether your compiler is supported, follow these steps:
1. Determine which version of CUDA your version of MATLAB uses by consulting the table in Install CUDA Toolkit (Optional).
2. Consult the NVIDIA CUDA Toolkit documentation corresponding to the CUDA version determined in step 1. The documentation lists the supported compilers in the installation guide section.
Install CUDA Toolkit (Optional)
The version of the CUDA Toolkit installed with MATLAB does not contain all of the libraries that are available in the full CUDA Toolkit. If you want to use a specific library that is not installed with MATLAB, install the CUDA Toolkit.
Note
You do not need the CUDA Toolkit to run MATLAB functions on a GPU or to generate CUDA-enabled MEX functions.
The CUDA Toolkit contains CUDA libraries and tools for compilation.
Download the appropriate CUDA toolkit version for the version of MATLAB you are using. Check which version of the toolkit is compatible with your version of MATLAB using this table. Recommended best practice is to use the latest version of your supported CUDA Toolkit, including any updates and patches from NVIDIA.
| MATLAB Release | CUDA Toolkit Version |
| --- | --- |
| R2024b | 12.2 |
| R2024a | 12.2 |
| R2023b | 11.8 |
| R2023a | 11.8 |
| R2022b | 11.2 |
| R2022a | 11.2 |
| R2021b | 11.0 |
| R2021a | 11.0 |
| R2020b | 10.2 |
| R2020a | 10.1 |
| R2019b | 10.1 |
| R2019a | 10.0 |
| R2018b | 9.1 |
| R2018a | 9.0 |
| R2017b | 8.0 |
| R2017a | 8.0 |
| R2016b | 7.5 |
| R2016a | 7.5 |
| R2015b | 7.0 |
| R2015a | 6.5 |
| R2014b | 6.0 |
| R2014a | 5.5 |
| R2013b | 5.0 |
| R2013a | 5.0 |
| R2012b | 4.2 |
| R2012a | 4.0 |
| R2011b | 4.0 |
For more information about the CUDA Toolkit and to download your supported version, see CUDA Toolkit Archive (NVIDIA).
See Also
mexcuda | CUDAKernel | mex