Accelerated Polyphase Synthesis Filter Bank

This example shows how to use dspunfold to accelerate the simulation of a polyphase synthesis FFT filter bank by generating a multi-threaded MEX file. This example requires MATLAB Coder.

Polyphase Synthesis Filter Bank

dspdemo.PolyphaseFFTMuxer implements a polyphase synthesis filter bank. The synthesis bank or muxer takes multiple narrowband channels and forms a single wideband channel with the narrowband channels side by side.

The implementation consists of a polyphase FIR filter with a certain number of coefficients in each polyphase branch (PolyphaseLength). This filter has a state length equal to the PolyphaseLength. The filter is preceded by an inverse FFT that is used to modulate the filter to the various frequency sub-bands.
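The structure described above (an inverse FFT across channels followed by one short FIR filter per polyphase branch) can be sketched in base MATLAB. This is a minimal illustration under stated assumptions, not the source of dspdemo.PolyphaseFFTMuxer: the small sizes, the placeholder prototype filter h, and the variable names are all hypothetical, and filter states are not carried between frames.

```matlab
% Hypothetical sketch of a polyphase synthesis step (not the dspdemo source).
Nchan           = 4;   % number of narrowband channels (small, for illustration)
PolyphaseLength = 3;   % coefficients per polyphase branch
FrameLength     = 8;   % samples per channel in one frame

% Placeholder prototype filter; a real design would use a proper lowpass.
h = ones(1, Nchan*PolyphaseLength) / PolyphaseLength;
P = reshape(h, Nchan, PolyphaseLength);   % row k holds h(k:Nchan:end)

% One input frame: rows are time samples, columns are channels.
x = complex(randn(FrameLength,Nchan), randn(FrameLength,Nchan));

% Inverse FFT across channels for every time sample (ifft scales by 1/Nchan).
v = ifft(x, [], 2);

% Filter each IFFT output bin with its own polyphase branch.
y = zeros(FrameLength, Nchan);
for k = 1:Nchan
    y(:,k) = filter(P(k,:), 1, v(:,k));
end

% Interleave the branch outputs to form the single wideband signal.
w = reshape(y.', [], 1);   % FrameLength*Nchan samples
```

Each input frame of FrameLength samples per channel produces FrameLength*Nchan wideband samples, which is why the output rate is Nchan times the per-channel rate.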

MATLAB Based Implementation

In this example a filter with a PolyphaseLength of 48 is used. The file HelperPolyphaseSynthesisFilterBank runs the filtering. As an example, 64 channels are used and the frame size for each channel is set to 256.

FrameLength = 256;
Nchan       = 64;

The filtering is implemented in MATLAB code. Because the code must loop over the input frames, the implementation tends to be slow compared to generated code. The ifft is already implemented with compiled code, so the acceleration does not affect this part of the computation significantly. As a baseline, we measure how long it takes to process 250 frames.

tic
Nframes = 250;
for n = 1:Nframes
    x = complex(randn(FrameLength,Nchan),randn(FrameLength,Nchan));
    y = HelperPolyphaseSynthesisFilterBank(x);
end
toc
Elapsed time is 22.839157 seconds.

Generating the Accelerated Single-Threaded and Multi-Threaded MEX File

dspunfold can be used to generate both a single-threaded and a multi-threaded MEX file. The input signal is treated as a frame (-f true), the state length is given by PolyphaseLength (-s 48), and the repetition count is set to 10 (-r 10). We force the number of threads to 2 (-t 2) so that improvements can be seen on machines with at least two cores. This value can be increased on machines with more cores at the expense of higher latency.

dspunfold HelperPolyphaseSynthesisFilterBank -args {x} -f true -s 48 -r 10 -t 2
State length: 48 samples, Repetition: 10, Output latency: 40 frames, Threads: 2
Analyzing: HelperPolyphaseSynthesisFilterBank.m
Creating single-threaded MEX file: HelperPolyphaseSynthesisFilterBank_st.mexmaci64
Creating multi-threaded MEX file: HelperPolyphaseSynthesisFilterBank_mt.mexmaci64
Creating analyzer file: HelperPolyphaseSynthesisFilterBank_analyzer.p
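The reported output latency of 40 frames is consistent with a latency of 2 x Threads x Repetition frames. The short check below assumes that relationship (confirm it against the dspunfold documentation for your release); the variable names are illustrative only.

```matlab
% Assumed relationship: output latency (frames) = 2 * Threads * Repetition.
Threads     = 2;     % value passed with -t
Repetition  = 10;    % value passed with -r
FrameLength = 256;   % samples per channel in each input frame

LatencyFrames  = 2 * Threads * Repetition;      % 40 frames for these settings
LatencySamples = LatencyFrames * FrameLength;   % per-channel input samples

fprintf('Output latency: %d frames (%d samples per channel)\n', ...
    LatencyFrames, LatencySamples);
```

Under the same assumption, raising the thread count to 4 with the same repetition count would double the latency to 80 frames, which is the trade-off mentioned above.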

Benchmarking the Single-Threaded MEX File

We first benchmark the generated single-threaded file. This MEX file is equivalent to the MEX that would be generated using the codegen function.

tic
for n = 1:Nframes
    x   = complex(randn(FrameLength,Nchan),randn(FrameLength,Nchan));
    yst = HelperPolyphaseSynthesisFilterBank_st(x);
end
toc
Elapsed time is 1.255142 seconds.

Benchmarking the Multi-Threaded MEX File

Next we benchmark the generated multi-threaded file. Because the acceleration mostly affects the filtering and not the inverse FFT, the improvement is not directly proportional to the number of cores the machine has. Nevertheless, an improvement of at least 60% over the single-threaded MEX file can be seen in most cases. To make the comparison fair, we process 40 additional frames in the multi-threaded case to account for the latency introduced by unfolding.

tic
for n = 1:Nframes+40
    x   = complex(randn(FrameLength,Nchan),randn(FrameLength,Nchan));
    ymt = HelperPolyphaseSynthesisFilterBank_mt(x);
end
toc
Elapsed time is 0.709653 seconds.
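The three elapsed times reported in this example can be turned into speedup factors with a few lines of arithmetic. The sketch below only reuses the timings printed above; they will differ on other machines, and the variable names are illustrative.

```matlab
% Elapsed times copied from the runs above (machine dependent).
tMatlab = 22.839157;   % MATLAB implementation, 250 frames
tST     = 1.255142;    % single-threaded MEX, 250 frames
tMT     = 0.709653;    % multi-threaded MEX, 250 + 40 frames

speedupST = tMatlab / tST;   % about 18.2x over the MATLAB code
speedupMT = tST / tMT;       % about 1.77x over the single-threaded MEX

fprintf('Single-threaded MEX vs MATLAB: %.1fx\n', speedupST);
fprintf('Multi-threaded vs single-threaded MEX: %.2fx\n', speedupMT);
```

For these particular timings, the multi-threaded MEX file is roughly 77% faster than the single-threaded one even though it processes 40 more frames.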