Accelerate 5G Simulation Using GPU
This example shows how to accelerate a simplified 5G NR physical layer simulation by using a graphics processing unit (GPU). The simulation consists of physical downlink shared channel (PDSCH) symbol encoding, OFDM modulation, transmission through a clustered delay line (CDL) channel, OFDM demodulation, and PDSCH symbol decoding.
Introduction
Link-level simulations require a large number of frames to produce statistically valid results, which can make them very long-running. When your 5G Toolbox™ simulation uses functions that support GPU arrays, you can speed up the simulation by using a GPU. To enable this capability, call the function with at least one gpuArray (Parallel Computing Toolbox) object as a data input argument. Using gpuArray objects requires Parallel Computing Toolbox™. This example shows how to use GPU arrays in a simplified PDSCH link simulation. For an example of how to use GPU arrays in a full link-level simulation, see NR PDSCH Throughput.
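As a minimal sketch of this calling pattern, the snippet below passes a gpuArray to fft, a base MATLAB function that supports GPU arrays, rather than to a 5G Toolbox function. The variable names x, y, and yHost are illustrative and are not part of this example. The snippet stays on the CPU when no supported GPU is available.

```matlab
% Illustrative only: x, y, and yHost are hypothetical names.
x = single(randn(1024,4));      % input data in host (CPU) memory

if canUseGPU
    x = gpuArray(x);            % move the data input to GPU memory
end

% The same call runs on the GPU when x is a gpuArray and on the CPU otherwise.
y = fft(x);

% Bring the result back to host memory. For a CPU array, gather returns
% the input unchanged.
yHost = gather(y);
```

Because only the data input changes, the rest of the calling code can stay identical for CPU and GPU runs.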
Detect GPU
Use Parallel Computing Toolbox™ functions to verify that a supported GPU is available.
if canUseGPU
    gpuExist = true;
    D = gpuDevice;
    fprintf('Compatible GPU found: %s device, %d multiprocessors, %s compute capability.\n', ...
        D.Name, D.MultiprocessorCount, D.ComputeCapability);
else
    gpuExist = false;
    warning(['Could not find an appropriate GPU. ' ...
        'GPU-based simulation is skipped.']);
end
Compatible GPU found: NVIDIA RTX 6000 Ada Generation device, 142 multiprocessors, 8.9 compute capability.
Set Up Simulation Parameters
Define carrier parameters.
carrier = nrCarrierConfig;
carrier.NSizeGrid = 52;
carrier.SubcarrierSpacing = 15;
waveinfo = nrOFDMInfo(carrier);
Define PDSCH parameters.
pdsch = nrPDSCHConfig;
pdsch.NumLayers = 4;
% Define PDSCH time-frequency resource allocation per slot to be full grid (single full grid BWP)
pdsch.PRBSet = 0:carrier.NSizeGrid-1;                  % PDSCH PRB allocation
pdsch.SymbolAllocation = [0,carrier.SymbolsPerSlot];   % Starting symbol and number of symbols of each PDSCH allocation
pdsch.MappingType = 'A';                               % PDSCH mapping type ('A'(slot-wise),'B'(non slot-wise))
Create a channel object.
channel = nrCDLChannel;
channel.DelayProfile = 'CDL-C';
channel.DelaySpread = 300e-9;
channel.MaximumDopplerShift = 5;
channel.SampleRate = waveinfo.SampleRate;
channel.ChannelResponseOutput = 'ofdm-response';
Configure a cross-polarized 1-by-2 rectangular panel array for the receiver and a cross-polarized 2-by-4 rectangular panel array for the transmitter.
channel.TransmitAntennaArray.Size = [2 4 2 1 1];
channel.ReceiveAntennaArray.Size = [1 2 2 1 1];
nTx = prod(channel.TransmitAntennaArray.Size); % 16
nRx = prod(channel.ReceiveAntennaArray.Size);  % 4
Create a data array based upon the configured PDSCH.
[pdschIndices,pdschIndicesInfo] = nrPDSCHIndices(carrier,pdsch);
data = randi([0 1],pdschIndicesInfo.G,1,'int8');
Define the signal-to-noise ratio (SNR). For an explanation of the SNR definition that this example uses, see SNR Definition Used in Link Simulations.
SNRdB = -5;
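As a rough sketch of how this SNR value translates into noise scaling, the lines below repeat the calculation that the pdschLink local function at the end of this example performs. The FFT size of 1024 is an assumption for illustration only; the example itself reads the value from nrOFDMInfo(carrier).

```matlab
% Sketch only: nfft is an assumed value. The example obtains it from
% waveinfo.Nfft. nRx = 4 matches the receive array configured above.
SNRdB = -5;
nRx   = 4;
nfft  = 1024;                   % assumption, not taken from the example

SNR  = 10^(SNRdB/10);           % convert the SNR from dB to linear scale
N0   = 1/sqrt(nRx*nfft*SNR);    % scaling applied to the randn noise samples
nVar = N0^2*nfft;               % noise variance passed to the equalizer and decoder

fprintf('SNR (linear) = %.4f, N0 = %.5f, nVar = %.5f\n',SNR,N0,nVar);
```

Note that nVar simplifies to 1/(nRx*SNR), so it does not depend on the FFT size.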
Set the number of slots for the simulation.
nSlots = 10;
Compare CPU and GPU Execution Times
Create a function handle for CPU execution.
cpuf = @()pdschLink(carrier,pdsch,pdschIndices,data,channel,nSlots,SNRdB,nTx,nRx);
Create a function handle for GPU execution. To enable GPU execution, call the simulation loop with a gpuArray object as the data input argument.
if gpuExist
    gpuData = gpuArray(data);
    gpuf = @()pdschLink(carrier,pdsch,pdschIndices,gpuData,channel,nSlots,SNRdB,nTx,nRx);
end
Use the timeit and gputimeit functions to accurately measure the execution time of the simulation.
execTimeCPU = timeit(cpuf);
fprintf('The execution time on the CPU is %.3f seconds.',execTimeCPU)
The execution time on the CPU is 1.259 seconds.
Release the channel object to reset it, and then measure the execution time on the GPU.
if gpuExist
    release(channel);
    execTimeGPU = gputimeit(gpuf);
    fprintf('The execution time on the GPU is %.3f seconds.',execTimeGPU)
end
The execution time on the GPU is 0.383 seconds.
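The same timing pattern applies to any zero-argument function handle. A minimal sketch, assuming a single-precision matrix multiplication as a stand-in workload (the names A, Ag, cpuWork, gpuWork, tCPU, and tGPU are hypothetical and not part of this example):

```matlab
% Stand-in workload: not the PDSCH link simulation.
A = rand(500,'single');

cpuWork = @() A*A;              % timeit expects a handle with no input arguments
tCPU = timeit(cpuWork);         % calls the handle repeatedly and returns the median time
fprintf('CPU: %.4f s\n',tCPU);

if canUseGPU
    Ag = gpuArray(A);
    gpuWork = @() Ag*Ag;
    tGPU = gputimeit(gpuWork);  % waits for GPU operations to finish before recording the time
    fprintf('GPU: %.4f s\n',tGPU);
end
```

Using gputimeit rather than timeit for GPU code matters because GPU operations can run asynchronously, so a timer that does not wait for completion can underreport the execution time.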
The speedup from using a GPU increases as the size of the data increases. This table shows the results of running this example with progressively larger data sets, obtained by increasing the number of transmit and receive antennas (Tx and Rx) and the number of resource blocks (RBs).
Configuration | 32 Tx, 4 Rx, 52 RBs | 64 Tx, 8 Rx, 137 RBs | 128 Tx, 8 Rx, 275 RBs |
CPU Execution Time (s) | 1.5920 | 4.5674 | 20.622 |
GPU Execution Time (s) | 0.64807 | 1.8858 | 5.8244 |
Speedup Factor | 1.6917 | 2.422 | 3.5406 |
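A speedup factor can be taken as the CPU execution time divided by the GPU execution time. A minimal sketch, using the times reported earlier in this example for the 16 Tx, 4 Rx, 52 RB configuration as literal values (cpuTime, gpuTime, and speedup are hypothetical names; in a live session, use execTimeCPU and execTimeGPU instead):

```matlab
cpuTime = 1.259;                % seconds, CPU time reported earlier in this example
gpuTime = 0.383;                % seconds, GPU time reported earlier in this example

speedup = cpuTime/gpuTime;      % ratio greater than 1 means the GPU run is faster
fprintf('Speedup factor: %.2f\n',speedup);
```

Measured times depend on the hardware and on the data size, so treat any single ratio as indicative rather than general.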
Local Functions
function pdschLink(carrier,pdsch,pdschIndices,codedTrBlock,channel,nSlots,SNRdB,nTx,nRx)

    waveinfo = nrOFDMInfo(carrier);
    SNR = 10^(SNRdB/10);
    N0 = 1/sqrt(nRx*double(waveinfo.Nfft)*SNR);
    nVar = N0^2*double(waveinfo.Nfft);
    chInfo = info(channel);
    maxChDelay = chInfo.MaximumChannelDelay;

    for slot = 1:nSlots
        carrier.NSlot = slot;

        % Create a random precoding matrix for simplicity
        wtx = randn(pdsch.NumLayers,nTx);

        grid = single(zeros(carrier.NSizeGrid*12,carrier.SymbolsPerSlot,nTx,'like',codedTrBlock));
        pdschSymbols = nrPDSCH(carrier,pdsch,codedTrBlock);
        [pdschAntSymbols,pdschAntIndices] = nrPDSCHPrecode(carrier,pdschSymbols,pdschIndices,wtx);
        grid(pdschAntIndices) = pdschAntSymbols;

        txWaveform = nrOFDMModulate(carrier,grid);
        txWaveform = [txWaveform; zeros(maxChDelay,size(txWaveform,2))]; %#ok<AGROW>

        [rxWaveform,ofdmResponse,timingOffset] = channel(txWaveform,carrier);

        noise = N0*randn(size(rxWaveform),"like",rxWaveform);
        rxWaveform = rxWaveform + noise;

        % Perfect timing synchronization using the timing offset from the
        % channel
        rxWaveform = rxWaveform(1+timingOffset:end,:);
        rxGrid = nrOFDMDemodulate(carrier,rxWaveform);

        % Get PDSCH resource elements from the received grid and
        % channel estimate
        [pdschRx,pdschHest,~,pdschHestIndices] = nrExtractResources(pdschIndices,rxGrid,ofdmResponse);
        pdschHest = nrPDSCHPrecode(carrier,pdschHest,pdschHestIndices,permute(wtx,[2 1 3]));

        pdschEq = nrEqualizeMMSE(pdschRx,pdschHest,nVar);
        % pdschEq = pdschRx;
        dlschLLRs = nrPDSCHDecode(carrier,pdsch,pdschEq,nVar);
    end
end
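The local function creates its resource grid and noise arrays with the 'like' syntax, so those arrays inherit their properties from an existing array and, for gpuArray inputs, are created on the GPU. A small sketch of that behavior using CPU arrays only (the variable names bits, g, rx, and n are illustrative and not part of this example):

```matlab
% Illustrative stand-ins for the coded bits and a received sample.
bits = int8([0;1;1;0]);
rx   = complex(single(1),single(2));

% zeros(...,'like',bits) returns int8 zeros; the outer cast converts them
% to single, mirroring how the local function builds its resource grid.
g = single(zeros(4,2,'like',bits));

% randn(...,'like',rx) returns complex single-precision noise, matching
% the data type and complexity of rx.
n = randn(3,1,'like',rx);

disp(class(g))
disp(isreal(n))
```

With this pattern, the same function body serves both the CPU and GPU function handles without any device-specific branches.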