Maximum blocks per kernel
Maximum number of blocks created during a kernel launch
Since R2021a
Description
App Configuration Pane: GPU Code
Configuration Objects: coder.gpuConfig
The Maximum blocks per kernel parameter specifies the maximum number of blocks created during a kernel launch.
Due to limited streaming multiprocessor (SM) resources on GPU devices, limiting the number of blocks for each kernel can avoid performance losses from scheduling, loading and unloading of blocks.
If the number of iterations in a loop is greater than the maximum number of blocks per kernel, the code generator creates CUDA® kernels with striding.
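The arithmetic behind striding can be sketched in plain MATLAB. This is a simplified illustration of the general CUDA grid-stride pattern, not the code that GPU Coder emits. The sizes and the variable names (numIterations, threadsPerBlock, maxBlocks) are assumptions chosen for the example.

```matlab
% Hypothetical sizes, chosen for illustration only.
numIterations   = 10000;  % loop trip count
threadsPerBlock = 256;    % assumed block size
maxBlocks       = 8;      % assumed Maximum blocks per kernel setting

% Blocks needed to give every iteration its own thread.
blocksNeeded = ceil(numIterations / threadsPerBlock);

% The launch is capped at maxBlocks, so there are fewer threads than iterations.
blocksLaunched = min(blocksNeeded, maxBlocks);
totalThreads   = blocksLaunched * threadsPerBlock;

% Each thread starts at its own index and advances by totalThreads (the stride).
threadId = 0;   % zero-based global thread index, as in CUDA
handled  = threadId:totalThreads:(numIterations - 1);
disp(handled)   % iterations covered by thread 0
```

With these values, the launch uses 8 blocks instead of 40, and thread 0 covers iterations 0, 2048, 4096, 6144, and 8192.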
When you specify the maximum number of blocks for each kernel, the code generator creates 1-D kernels. To force the code generator to create 2-D or 3-D kernels, use the coder.gpu.kernel pragma. The coder.gpu.kernel pragma takes precedence over the maximum number of blocks for each kernel.
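As a minimal sketch of the pragma usage described above, the following function requests a 2-D launch for a nested loop. The function name scaleMatrix and the grid and block dimensions are hypothetical. Whether the code generator maps the loop nest to a single 2-D kernel depends on the loop structure and input sizes.

```matlab
function out = scaleMatrix(in) %#codegen
% scaleMatrix is a hypothetical entry-point function used for illustration.
out = coder.nullcopy(zeros(size(in), 'like', in));

% Request B = [Bx,By,1] blocks and T = [Tx,Ty,Tz] threads per block.
% These dimensions are assumed values, not recommendations.
coder.gpu.kernel([16,16,1], [32,32,1]);
for i = 1:size(in,1)
    for j = 1:size(in,2)
        out(i,j) = 2 * in(i,j);
    end
end
end
```

Because the pragma takes precedence, the dimensions given here apply to this loop even when the Maximum blocks per kernel parameter is set to a nonzero value.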
Settings
0 (default) | integer
- 0: GPU Coder™ limits the number of blocks per kernel to the maximum number allowed by CUDA.
- integer: Enter the maximum number of blocks you want to create during a kernel launch.
Programmatic Use
Property: MaximumBlocksPerKernel
Values: 0 | integer
Default: 0
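A minimal sketch of setting the property from the command line follows. It assumes the property is reached through the GpuConfig property of the object that coder.gpuConfig returns, as with other GPU-specific settings. The entry-point name myKernelFcn, the input size, and the limit of 1024 are hypothetical.

```matlab
% Create a GPU code configuration object for MEX generation.
cfg = coder.gpuConfig('mex');

% Cap each kernel launch at 1024 blocks (assumed value; 0 restores the default).
cfg.GpuConfig.MaximumBlocksPerKernel = 1024;

% myKernelFcn is a hypothetical entry-point function taking a 4096-by-1 single vector.
codegen -config cfg -args {ones(4096,1,'single')} myKernelFcn
```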
Version History
Introduced in R2021a