Performance

Troubleshoot code generation issues, improve code execution time, and reduce memory usage of generated code

Some of the most common reasons why GPU Coder™ generated code is not performing as expected are:

CUDA® kernels are not created.
Host-to-device and device-to-host memory transfers (`cudaMemcpy`) are throttling performance.
Not enough parallelism or device issues.

These topics explain the common causes of these symptoms and describe how to use the built-in screener to detect them. They also describe how to work around these issues and generate more efficient CUDA code.

Apps

GPU Coder

GPU Coder	Generate CUDA code from MATLAB code
GPU Environment Check	Verify and set up GPU code generation environment

Functions

Code Generation

`codegen`	Generate C/C++ code from MATLAB code
`gpucoder`	Open GPU Coder app
`gpuPerformanceAnalyzer`	Analyze and optimize performance of the generated code (Since R2023a)
`gpuprofile`	Profile execution time for generated CUDA code (Since R2024a)
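
As a rough illustration of how `codegen` and `gpuPerformanceAnalyzer` fit together, the sketch below generates a CUDA MEX function and then profiles the same entry point. The entry-point name `myEntryPoint` and its single 1-by-4096 input are hypothetical placeholders, and the `Config` and `NumIterations` name-value arguments should be checked against the `gpuPerformanceAnalyzer` reference page for your release.

```matlab
% Sketch only: assumes a hypothetical entry-point function myEntryPoint.m
% on the MATLAB path that accepts one 1-by-4096 single vector.

% Configure CUDA MEX generation and request a code generation report.
cfg = coder.gpuConfig('mex');
cfg.GenerateReport = true;

% Example input that also defines the argument type for code generation.
x = ones(1, 4096, 'single');

% Generate CUDA MEX code for the entry point.
codegen -config cfg -args {x} myEntryPoint

% Profile the entry point and open the GPU Performance Analyzer.
% Assumption: Config and NumIterations are accepted as name-value arguments.
gpuPerformanceAnalyzer('myEntryPoint', {x}, Config=cfg, NumIterations=2);
```

Running this requires GPU Coder, a supported NVIDIA GPU, and a configured CUDA toolchain.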

Programming for Code Generation

`coder.gpu.kernel`	Pragma that maps `for`-loops to GPU kernels
`coder.gpu.kernelfun`	Pragma that maps function to GPU kernels
`coder.gpu.nokernel`	Pragma to disable kernel creation for loops
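
A minimal sketch of the loop-level pragma, assuming a hypothetical function named `saxpyLoop`: `coder.gpu.kernel` is placed immediately before the `for`-loop that should become a kernel. The pragma has no effect when the function runs in MATLAB; it only influences code generation.

```matlab
function out = saxpyLoop(a, x, y) %#codegen
% saxpyLoop  Hypothetical example: computes a*x + y element by element.

% Preallocate the output with the same size and class as x.
out = zeros(size(x), 'like', x);

% Request that the loop that follows be mapped to a GPU kernel.
coder.gpu.kernel();
for i = 1:numel(x)
    % Each iteration is independent, so iterations can run in parallel.
    out(i) = a * x(i) + y(i);
end
end
```

Whether a kernel is actually created can be confirmed in the code generation report or the GPU Performance Analyzer.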

Objects

Code configuration

`coder.gpuConfig`	Configuration parameters for CUDA code generation from MATLAB code by using GPU Coder
`coder.CodeConfig`	Configuration parameters for C/C++ code generation from MATLAB code
`coder.EmbeddedCodeConfig`	Configuration parameters for C/C++ code generation from MATLAB code with Embedded Coder
`coder.gpuEnvConfig`	Configuration object for checking the GPU code generation environment
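
Before investigating performance, it can help to confirm that the GPU environment is set up correctly. The sketch below is one possible way to use `coder.gpuEnvConfig` for that check. It relies on `coder.checkGpuInstall`, which is not listed on this page, and the chosen property values are illustrative assumptions rather than recommended settings.

```matlab
% Sketch only: checks the host GPU code generation environment.

% Create an environment configuration object for the host machine.
envCfg = coder.gpuEnvConfig('host');

% Enable the basic code generation and code execution checks.
envCfg.BasicCodegen  = 1;
envCfg.BasicCodeexec = 1;

% Suppress command-line output; inspect the returned structure instead.
envCfg.Quiet = 1;

% Run the checks. results is a structure describing which checks passed.
results = coder.checkGpuInstall(envCfg);
disp(results);
```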

Topics

Code Generation Reports
Create and view reports generated during code generation.
Trace Between Generated CUDA Code and MATLAB Source Code
Highlight sections of MATLAB® code that run on the GPU.
Generating a GPU Code Metrics Report for Code Generated from MATLAB Code
Create and explore GPU static code metrics report.
GPU Performance Analyzer
Visualize code metrics and identify optimization and tuning opportunities in your code.
Analyzing Network Performance Using the Deep Learning Dashboard
Investigate the performance of deep learning networks and layers in generated code using the Deep Learning Dashboard. (Since R2025a)
Kernel Analysis
Recommendations for generating efficient CUDA kernels.
Memory Bottleneck Analysis
Reduce memory bottleneck issues when using GPU Coder.
Optimize Kernels That Contain Loops
Rewrite loops in MATLAB to avoid generated code kernels that contain loops. (Since R2025a)
Prevent Kernel Launches Inside Loops
Parallelize loops that launch kernels to execute them on the GPU. (Since R2025a)
Minimize Memory Copy Events in Generated Code Loops
Rewrite loops to minimize the number of data transfers between the CPU and GPU in generated CUDA code. (Since R2025a)

Featured Examples

Pass GPU Inputs to Entry-Point Functions

Generate code that receives data from the GPU to avoid unnecessary memory copies.

Since R2024a

Profile Generated CUDA MEX Functions Using Performance Analyzer

Visualize code metrics and identify optimization and tuning opportunities in generated CUDA MEX.

Since R2024a

Analyze Performance of Generated CUDA Code

Analyze and optimize the performance of generated CUDA® code by using the `gpuPerformanceAnalyzer` function.

GPU Profiling on NVIDIA Jetson Platforms

Analyze and optimize the performance of the generated CUDA code on the Jetson™ platform.

Analyze Performance of Code Generated for Deep Learning Networks

Analyze the performance of the generated CUDA code for deep learning networks.
