gpucoder.atomicMax

Atomically find the maximum of a value and a variable in global or shared memory

Since R2021b

    Description

    The gpucoder.atomicMax function reads a value from a global or shared GPU memory location, compares it to an operand, and writes the maximum value back to the memory location. In generated GPU code, the operation is atomic, which means that a GPU thread performs the read-modify-write operation without interference from other threads.
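    To illustrate the read-modify-write semantics, here is a hedged host-side C++ sketch (not part of the generated code) that emulates an atomic maximum with a compare-and-swap loop; CUDA's atomicMax() performs the same operation as a single hardware instruction:

    ```cpp
    #include <atomic>
    #include <cassert>

    // Illustrative sketch only: emulate atomic max on a std::atomic<int>.
    // Returns the previous value, mirroring CUDA atomicMax().
    int atomic_max(std::atomic<int>& a, int b) {
        int old = a.load();
        // Retry while another thread may have updated `a` between our
        // read and our write; on CAS failure, `old` is refreshed.
        while (old < b && !a.compare_exchange_weak(old, b)) {
        }
        return old;
    }

    int main() {
        std::atomic<int> a{5};
        int prev = atomic_max(a, 9);   // a becomes 9
        assert(prev == 5 && a.load() == 9);
        prev = atomic_max(a, 3);       // 3 < 9, so a is unchanged
        assert(prev == 9 && a.load() == 9);
        return 0;
    }
    ```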

    A = gpucoder.atomicMax(A,B) compares B to the value of A and writes the value of max(A,B) back into A.

    Call the gpucoder.atomicMax function directly inside a for-loop that you want to execute on the GPU in the generated CUDA® code. Each iteration of the loop must be able to read and write to A.

    example

    [A,oldA] = gpucoder.atomicMax(A,B) returns the previous value of A as oldA. When you use this syntax, use the coder.gpu.kernel pragma before the loop that contains gpucoder.atomicMax.
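    As a hedged sketch of the two-output syntax (the function name, the use of coder.nullcopy for preallocation, and the loop shape here are illustrative, not from a shipped example):

    ```matlab
    function [a,oldVals] = myAtomicMaxOld(a,b)
    % Illustrative sketch: capture the previous values while taking the max.
    oldVals = coder.nullcopy(a);   % preallocate output for previous values
    coder.gpu.kernel;              % pragma required before the loop for this syntax
    for i = 1:numel(a)
        [a(i), oldVals(i)] = gpucoder.atomicMax(a(i), b);
    end
    end
    ```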

    Examples


    Perform a simple atomic maximum operation by using the gpucoder.atomicMax function and generate CUDA code that calls the corresponding CUDA atomicMax() APIs.

    In one file, write an entry-point function myAtomicMax that accepts matrix inputs a and b.

    function a = myAtomicMax(a,b)
    coder.gpu.kernelfun;
    
    for i = 1:numel(a)
        a(i) = gpucoder.atomicMax(a(i), b);
    end
    
    end
    

    To create a type for an int32 matrix for use in code generation, use the coder.newtype function.

    A = coder.newtype('int32', [1 30], [0 1]);
    B = coder.newtype('int32', [1 1], [0 0]);
    inputArgs = {A,B};
    

    To generate a CUDA library, use the codegen function.

    cfg = coder.gpuConfig('lib');
    cfg.GenerateReport = true;
    
    codegen -config cfg -args inputArgs myAtomicMax -d myAtomicMax
    

    The generated CUDA code contains the myAtomicMax_kernel1 kernel with calls to the atomicMax() CUDA APIs.

    //
    // File: myAtomicMax.cu
    //
    ...
    
    static __global__ __launch_bounds__(1024, 1) void myAtomicMax_kernel1(
        const int32_T b, const int32_T i, int32_T a_data[])
    {
      uint64_T loopEnd;
      uint64_T threadId;
    ...
    
      for (uint64_T idx{threadId}; idx <= loopEnd; idx += threadStride) {
        int32_T b_i;
        b_i = static_cast<int32_T>(idx);
        atomicMax(&a_data[b_i], b);
      }
    }
    ...
    
    void myAtomicMax(int32_T a_data[], int32_T a_size[2], int32_T b)
    {
      dim3 block;
      dim3 grid;
    ...
    
        cudaMemcpy(gpu_a_data, a_data, a_size[1] * sizeof(int32_T),
                   cudaMemcpyHostToDevice);
        myAtomicMax_kernel1<<<grid, block>>>(b, i, gpu_a_data);
        cudaMemcpy(a_data, gpu_a_data, a_size[1] * sizeof(int32_T),
                   cudaMemcpyDeviceToHost);
    ...
    
    }
    

    Input Arguments


    A — Reference to a shared or global GPU memory location, specified as a scalar. Use the same variable as both an input and output argument of gpucoder.atomicMax.

    Data Types: int32 | uint32 | uint64

    B — Operand to compare against A, specified as a scalar.

    Data Types: int32 | uint32 | uint64

    Extended Capabilities


    C/C++ Code Generation
    Generate C and C++ code using MATLAB® Coder™.

    GPU Code Generation
    Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.

    Version History

    Introduced in R2021b