Generate Random Numbers on a GPU
This example shows how to switch between the different random number generators that are supported on the GPU.
Random numbers form a key part of many simulation or estimation algorithms. You can generate random numbers in MATLAB® using the rand, randi, and randn functions. These functions can use one of several different number generation algorithms.
Inspect GPU Random Number Generators
Use the parallel.gpu.RandStream.list function to see a short description of the available generators.
parallel.gpu.RandStream.list
The following random number generator algorithms are available: MRG32K3A: Combined multiple recursive generator (supports parallel streams) Philox4x32_10: Philox 4x32 generator with 10 rounds (supports parallel streams) Threefry4x64_20: Threefry 4x64 generator with 20 rounds (supports parallel streams)
Each of these generators has been designed with parallel use in mind, providing multiple independent streams of random numbers. However, they each have some advantages and disadvantages:
CombRecursive (also known as
MRG32k3a): This generator was introduced in 1999 and has been widely tested and used.Philox (also known as Philox4x32_10): This generator was introduced in 2011 and is specifically designed for high performance in highly parallel systems such as GPUs.
Threefry (also known as Threefry4x64_20): This generator was introduced in 2011 and is based on the cryptographic ThreeFish algorithm, which is widely tested and used. This generator was designed to give good performance in highly parallel systems such as GPUs. This is the default generator for GPU calculations.
The three generators available on the GPU are also available for use on the CPU in MATLAB. The generators have the same name and produce identical results given the same initial state. This is useful when you want to produce the same sets of random numbers on both the GPU and the CPU. For more information, see Random Number Streams on a GPU.
All of these generators pass the standard TestU01 test suite [1].
Change Default Random Number Generator
Check whether a GPU is available.
gpu = gpuDevice;
disp(gpu.Name + " GPU detected and available.")NVIDIA RTX A5000 GPU detected and available.
The gpurng function can store and reset the generator state for the GPU. You can also use gpurng to switch between the different generators. Before you change the generator, store the existing state so that it can be restored at the end of these tests.
oldState = gpurng;
gpurng(0,"Philox");
disp(gpurng) Type: 'philox'
Seed: 0
State: [7×1 uint32]
Generate Uniformly Distributed Random Numbers
The rand and randi functions generate uniformly distributed random numbers on the GPU. These two functions behave very similarly and only the performance of rand is measured here. Use the gputimeit function to measure the execution time to ensure accurate timing results, as gputimeit calls the function many times to account for initialization overhead and correctly deals with synchronization.
To compare the performance of the different generators, use rand to generate a large number of random numbers on the GPU using each generator. Generate random numbers 100 times for each generator and time each function call using gputimeit.
reset(gpu) generators = ["Philox" "Threefry" "CombRecursive"]; gputimesU = nan(100,3); for g=1:numel(generators) % Set the generator gpurng(0,generators{g}); % Perform calculation 100 times, timing the generator for rep=1:100 gputimesU(rep,g) = gputimeit(@() rand(10000,1000,"gpuArray")); end end
Plot the results.
figure hold on histogram(gputimesU(:,1),BinWidth=1e-5) histogram(gputimesU(:,2),BinWidth=1e-5) histogram(gputimesU(:,3),BinWidth=1e-5) legend(generators) xlabel("Time to generate 10^7 random numbers (sec)") ylabel("Frequency") title("Generating Samples in U(0,1) Using " + gpu.Name) hold off

The newer generators Threefry and Philox have similar performance. Both are faster than CombRecursive.
Generate Normally Distributed Random Numbers
Many simulations rely on perturbations sampled from a normal distribution. Similar to the test above, use randn to compare the performance of the three generators when generating normally distributed random numbers.
reset(gpu) generators = ["Philox" "Threefry" "CombRecursive"]; gputimesN = nan(100,3); for g=1:numel(generators) % Set the generator gpurng(0,generators{g}); % Perform calculation 100 times, timing the generator for rep=1:100 gputimesN(rep,g) = gputimeit(@() randn(10000,1000,"gpuArray")); end end
Plot the results.
figure hold on histogram(gputimesN(:,1),BinWidth=5e-5) histogram(gputimesN(:,2),BinWidth=5e-5) histogram(gputimesN(:,3),BinWidth=5e-5) legend(generators) xlabel("Time to generate 10^7 random numbers (sec)") ylabel("Frequency") title("Generating Samples in N(0,1) Using " + gpu.Name) hold off

The results again indicate that the Threefry and Philox generators perform similarly and are both notably faster than CombRecursive. The extra computation required to produce normally distributed values reduces the rate at which values are produced by each of the generators.
Restore the original generator state.
gpurng(oldState);
Conclusion
Though the exact results will vary depending on your GPU, each generator has different characteristics.
Threefry
Fast
Longest period ( in double precision)
Based on well-known and well-tested Threefish algorithm
Philox
Fast
Intermediate period ( in double precision)
CombRecursive
Slowest
Shortest period ( in double precision)
Long track record in real-world usage
References
[1] L'Ecuyer, P., and R. Simard. "TestU01: A C library for empirical testing of random number generators." ACM Transactions on Mathematical Software. Vol. 33, No. 4, 2007, article 22.
See Also
gpurng | parallel.gpu.RandStream