Accelerate 5G Simulation Using GPU

Since R2025a

This example shows how to accelerate a simplified 5G NR physical layer simulation by using a graphics processing unit (GPU). The simulation consists of physical downlink shared channel (PDSCH) symbol encoding, OFDM modulation, transmission through a clustered delay line (CDL) channel, OFDM demodulation, and PDSCH symbol decoding.

Introduction

Link-level simulations require a large number of frames to provide statistically valid results, which can make simulations run for a very long time. When your 5G Toolbox™ simulation uses functions that support GPU arrays, you can speed up the simulation by using a GPU. To enable this capability, call the function with at least one gpuArray object as a data input argument. Using gpuArray objects requires Parallel Computing Toolbox™. This example shows how to use GPU arrays in a simplified PDSCH link simulation. For an example of how to use GPU arrays in a full link-level simulation, see NR PDSCH Throughput.
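
As a minimal sketch of this calling pattern, the following code passes the same data to a function first as a numeric array and then as a gpuArray object. The base MATLAB fft function is used here only as a stand-in for a GPU-enabled toolbox function, and the variable names and array sizes are illustrative assumptions that are not part of this example.

```matlab
% Illustrative data (hypothetical sizes, not taken from this example)
x = complex(randn(1024,4,'single'),randn(1024,4,'single'));

% Numeric input: the function runs on the CPU
yCPU = fft(x);

if canUseGPU
    % gpuArray input: the same function call runs on the GPU
    yGPU = fft(gpuArray(x));
    assert(isgpuarray(yGPU))      % the output stays in GPU memory

    % Copy the result back to host memory and compare with the CPU result
    yHost = gather(yGPU);
    fprintf('Maximum CPU/GPU difference: %g\n',max(abs(yHost - yCPU),[],'all'));
end
```

The code that calls the function is identical in both cases. Only the type of the data input changes, which is the same approach this example takes with the PDSCH link function.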

Detect GPU

Use Parallel Computing Toolbox™ functions to verify that a supported GPU is available.

if canUseGPU
    gpuExist = true;
    D = gpuDevice;
    fprintf('Compatible GPU found: %s device, %d multiprocessors, %s compute capability.\n', ...
        D.Name, D.MultiprocessorCount, D.ComputeCapability);
else
    gpuExist = false;
    warning(['Could not find an appropriate GPU. ' ...
        'GPU-based simulation is skipped.']);
end

Compatible GPU found: NVIDIA RTX 6000 Ada Generation device, 142 multiprocessors, 8.9 compute capability.

Set Up Simulation Parameters

Define carrier parameters.

carrier = nrCarrierConfig;
carrier.NSizeGrid = 52;
carrier.SubcarrierSpacing = 15;
waveinfo = nrOFDMInfo(carrier);

Define PDSCH parameters.

pdsch = nrPDSCHConfig;
pdsch.NumLayers = 4;
% Define PDSCH time-frequency resource allocation per slot to be full grid (single full grid BWP)
pdsch.PRBSet = 0:carrier.NSizeGrid-1;                 % PDSCH PRB allocation
pdsch.SymbolAllocation = [0,carrier.SymbolsPerSlot];  % Starting symbol and number of symbols of each PDSCH allocation
pdsch.MappingType = 'A';                              % PDSCH mapping type ('A'(slot-wise),'B'(non slot-wise))

Create a channel object.

channel = nrCDLChannel;
channel.DelayProfile = 'CDL-C';
channel.DelaySpread = 300e-9;
channel.MaximumDopplerShift = 5;
channel.SampleRate = waveinfo.SampleRate;
channel.ChannelResponseOutput = 'ofdm-response';

Configure a cross-polarized 1-by-2 rectangular panel array for the receiver and a cross-polarized 2-by-4 rectangular panel array for the transmitter.

channel.TransmitAntennaArray.Size = [2 4 2 1 1];
channel.ReceiveAntennaArray.Size = [1 2 2 1 1];
nTx = prod(channel.TransmitAntennaArray.Size); % 16
nRx = prod(channel.ReceiveAntennaArray.Size);  % 4

Create a data array based upon the configured PDSCH.

[pdschIndices,pdschIndicesInfo] = nrPDSCHIndices(carrier,pdsch);
data = randi([0 1],pdschIndicesInfo.G,1,'int8');

Define the signal-to-noise ratio (SNR). For an explanation of the SNR definition that this example uses, see SNR Definition Used in Link Simulations.

SNRdB = -5;

Set the number of slots for the simulation.

nSlots = 10;

Compare CPU and GPU Execution Times

Create a function handle for CPU execution.

cpuf = @()pdschLink(carrier,pdsch,pdschIndices,data,channel,nSlots,SNRdB,nTx,nRx);

Create a function handle for the GPU execution. To enable GPU execution, call the simulation loop with a gpuArray object as the data input argument.

if gpuExist
    gpuData = gpuArray(data);
    gpuf = @()pdschLink(carrier,pdsch,pdschIndices,gpuData,channel,nSlots,SNRdB,nTx,nRx);
end

Use the timeit and gputimeit functions to accurately measure the execution time of the simulation.

execTimeCPU = timeit(cpuf);
fprintf('The execution time on the CPU is %.3f seconds.\n',execTimeCPU)

The execution time on the CPU is 1.259 seconds.

Release the channel object to unlock it after the CPU run, and then measure the execution time on the GPU.

if gpuExist
    release(channel);
    execTimeGPU = gputimeit(gpuf);
    fprintf('The execution time on the GPU is %.3f seconds.\n',execTimeGPU)
end

The execution time on the GPU is 0.383 seconds.
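
GPU operations can run asynchronously with respect to MATLAB, so a timer can stop before the GPU has finished its work. The gputimeit function accounts for this by waiting for all GPU operations to complete before it records the time. The following sketch illustrates the idea with a hypothetical matrix-multiply workload (the matrix size and variable names are assumptions and are not part of this example), and shows a manual equivalent that synchronizes explicitly by using wait.

```matlab
% Hypothetical workload: a single-precision matrix multiply
A = randn(1000,'single');
cpuWork = @() A*A;
tCPU = timeit(cpuWork);
fprintf('CPU time: %.4f seconds\n',tCPU);

if canUseGPU
    G = gpuArray(A);
    gpuWork = @() G*G;

    % gputimeit waits for the GPU to finish before it records the time
    tGPU = gputimeit(gpuWork);
    fprintf('GPU time (gputimeit): %.4f seconds\n',tGPU);

    % Manual alternative: synchronize with the device before reading the timer
    D = gpuDevice;
    tic
    C = G*G; %#ok<NASGU>
    wait(D)
    tManual = toc;
    fprintf('GPU time (tic/toc with wait): %.4f seconds\n',tManual);
end
```

A single tic/toc measurement is noisier than gputimeit, which runs the function several times, so treat the manual value only as a rough cross-check.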

The speedup from using a GPU increases as the size of the data increases. This table shows the results of running this example with progressively larger data sizes, obtained by increasing the number of transmit and receive antennas (Tx and Rx) and the number of resource blocks (RBs).

	32 Tx, 4 Rx, 52 RBs	64 Tx, 8 Rx, 137 RBs	128 Tx, 8 Rx, 275 RBs
CPU Execution Time (s)	1.5920	4.5674	20.622
GPU Execution Time (s)	0.64807	1.8858	5.8244
Speedup Factor	1.6917	2.422	3.5406
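
A speedup factor is the ratio of the CPU execution time to the GPU execution time. As a small sketch, the following code computes that ratio for the times printed earlier in this example (16 Tx, 4 Rx, 52 RBs). The two values are copied from the displayed output, so they reflect that particular run and hardware rather than a general result.

```matlab
execTimeCPU = 1.259;   % seconds, from the CPU output shown earlier
execTimeGPU = 0.383;   % seconds, from the GPU output shown earlier

% Speedup factor: CPU execution time divided by GPU execution time
speedup = execTimeCPU/execTimeGPU;
fprintf('Speedup factor: %.2f\n',speedup);   % prints 3.29
```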

Local Functions

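function pdschLink(carrier,pdsch,pdschIndices,codedTrBlock,channel,nSlots,SNRdB,nTx,nRx)
    % Run the simplified PDSCH link for nSlots slots. Pass codedTrBlock as a
    % gpuArray object to run the simulation on the GPU.
    waveinfo = nrOFDMInfo(carrier);
    % Scale the noise for the target SNR. N0 scales the time-domain noise
    % samples, and nVar is the noise variance passed to the equalizer and
    % the PDSCH decoder.
    SNR = 10^(SNRdB/10);
    N0 = 1/sqrt(nRx*double(waveinfo.Nfft)*SNR);
    nVar = N0^2*double(waveinfo.Nfft);
    % Get the maximum channel delay, used to pad the transmitted waveform
    chInfo = info(channel);
    maxChDelay = chInfo.MaximumChannelDelay;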
    for slot = 1:nSlots
        carrier.NSlot = slot;
        
        % Create a random precoding matrix for simplicity
        wtx = randn(pdsch.NumLayers,nTx);

        grid = single(zeros(carrier.NSizeGrid*12,carrier.SymbolsPerSlot,nTx,'like',codedTrBlock));
    
        pdschSymbols = nrPDSCH(carrier,pdsch,codedTrBlock);
        [pdschAntSymbols,pdschAntIndices] = nrPDSCHPrecode(carrier,pdschSymbols,pdschIndices,wtx);
        grid(pdschAntIndices) = pdschAntSymbols;
    
        txWaveform = nrOFDMModulate(carrier,grid);
        txWaveform = [txWaveform; zeros(maxChDelay,size(txWaveform,2))]; %#ok<AGROW>
    
        [rxWaveform,ofdmResponse,timingOffset] = channel(txWaveform,carrier);
        noise = N0*randn(size(rxWaveform),"like",rxWaveform);
        rxWaveform = rxWaveform + noise;
    
        % Perfect timing synchronization using the timing offset from the
        % channel
        rxWaveform = rxWaveform(1+timingOffset:end,:);
    
        rxGrid = nrOFDMDemodulate(carrier,rxWaveform);
    
        % Get PDSCH resource elements from the received grid and
        % channel estimate
        [pdschRx,pdschHest,~,pdschHestIndices] = nrExtractResources(pdschIndices,rxGrid,ofdmResponse);
        pdschHest = nrPDSCHPrecode(carrier,pdschHest,pdschHestIndices,permute(wtx,[2 1 3]));
        pdschEq = nrEqualizeMMSE(pdschRx,pdschHest,nVar);
        % pdschEq = pdschRx;
    
        dlschLLRs = nrPDSCHDecode(carrier,pdsch,pdschEq,nVar);
    end
end
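
As a quick check of the noise scaling in pdschLink, the following sketch evaluates N0 and nVar for the SNR and the number of receive antenna elements used in this example. The FFT size of 1024 is an assumed value for illustration only and is not read from the nrOFDMInfo output. The resulting nVar does not depend on it, because nVar = N0^2*Nfft simplifies to 1/(nRx*SNR).

```matlab
SNRdB = -5;     % SNR used in this example
nRx = 4;        % number of receive antenna elements in this example
Nfft = 1024;    % assumed FFT size, for illustration only

SNR = 10^(SNRdB/10);            % convert the SNR from dB to linear scale
N0 = 1/sqrt(nRx*Nfft*SNR);      % scaling applied to the time-domain noise
nVar = N0^2*Nfft;               % noise variance passed to the receiver

% Both values print as 0.7906
fprintf('nVar = %.4f, 1/(nRx*SNR) = %.4f\n',nVar,1/(nRx*SNR));
```

Changing Nfft in this sketch changes N0 but leaves nVar unchanged, which is consistent with the simplification above.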