Computes fast Fourier transform (FFT) and generates optimized HDL code
DSP System Toolbox HDL Support / Transforms
The FFT HDL Optimized block provides two architectures that implement the algorithm for FPGA and ASIC applications. You can select an architecture that optimizes for either throughput or area.
Streaming Radix 2^2
— Use this architecture for high-throughput applications. This architecture supports scalar or vector input data. You can achieve gigasample-per-second (GSPS) throughput using vector input.
Burst Radix 2
— Use this architecture for a minimum-resource implementation, especially with large FFT sizes. Your system must be able to tolerate bursty data and higher latency. This architecture supports only scalar input data.
The FFT HDL Optimized block replaces the HDL Streaming FFT block and the HDL Minimum Resource FFT block. The FFT HDL Optimized block accepts real or complex data, provides hardware-friendly control signals, and offers optional output frame control signals.
data
— Input data
Input data, specified as a scalar or column vector of real or complex values. Only the Streaming Radix 2^2 architecture supports vector input. The vector size must be a power of 2 in the range from 1 to 64, and less than or equal to the FFT length.
The double and single data types are supported for simulation, but not for HDL code generation.
Data Types: int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | fixed point | single | double
Complex Number Support: Yes
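The vector-size constraints above can be checked before building a model. The following is a minimal sketch in Python, not part of the block's API; the function name `is_valid_vector_size` is hypothetical and only encodes the three documented rules (power of 2, from 1 to 64, no larger than the FFT length).

```python
def is_valid_vector_size(vector_size, fft_length):
    """Return True if vector_size satisfies the documented input constraints.

    Hypothetical helper: it mirrors the rules stated for the data port,
    it does not query the block itself.
    """
    # A positive integer is a power of 2 when exactly one bit is set.
    is_power_of_two = vector_size >= 1 and (vector_size & (vector_size - 1)) == 0
    return is_power_of_two and vector_size <= 64 and vector_size <= fft_length


if __name__ == "__main__":
    for size, length in [(1, 1024), (16, 1024), (24, 1024), (128, 1024), (64, 32)]:
        print(size, length, is_valid_vector_size(size, length))
```

A scalar input corresponds to a vector size of 1, which passes the check for any FFT length.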
valid
— Indicates valid input data
This port indicates whether the input data is valid. When the input valid is 1 (true), the block captures the value on the input data port. When the input valid is 0 (false), the block ignores the input data samples.
Data Types: Boolean
reset
— Reset control signal
When reset is 1 (true), the block stops the current calculation and clears all internal states. The block starts a new frame when reset is 0 (false) and the input valid is 1 (true).
To enable this port, select the Enable reset input port parameter.
Data Types: Boolean
data
— Frequency channel output data
When the input is a fixed-point data type and scaling is enabled, the output data type is the same as the input data type. When the input is an integer type and scaling is enabled, the output is a fixed-point type with the same word length as the input integer. The output order is bit-reversed by default. If scaling is disabled, the output word length increases to avoid overflow. Only the Streaming Radix 2^2 architecture supports vector input and output. For more information, see the Divide butterfly outputs by two parameter.
Data Types: fixed point | double | single
Complex Number Support: Yes
valid
— Indicates valid output data
This port indicates that the output data is valid. When valid is 1 (true), the block returns valid data on the output data port. When valid is 0 (false), the values on the output data port are not valid.
Data Types: Boolean
ready
— Indicates block is ready
This port indicates that the block is ready for a new input sample. When ready is 1 (true), the block accepts input data in the next time step. When ready is 0 (false), the block ignores the input data in the next time step.
To enable this port, set the Architecture parameter to Burst Radix 2.
Data Types: Boolean
start
— Indicates first valid cycle of output data
When you enable this port, the block sets the start output to 1 (true) during the first valid cycle of a frame of output data.
To enable this port, select the Enable start output port parameter.
Data Types: Boolean
end
— Indicates last valid cycle of output data
When you enable this port, the block sets the end output to 1 (true) during the last valid cycle of a frame of output data.
To enable this port, select the Enable end output port parameter.
Data Types: Boolean
FFT length
— Number of data points for one FFT calculation
1024 (default)
This parameter specifies the number of data points used for one FFT calculation. For HDL code generation, the FFT length must be a power of 2 between 2^3 and 2^16.
Architecture
— Architecture type
Streaming Radix 2^2 (default) | Burst Radix 2
This parameter specifies the type of architecture.
Streaming Radix 2^2
— Select this value to specify the low-latency architecture. This architecture type supports GSPS throughput when using vector input.
Burst Radix 2
— Select this value to specify the minimum-resource architecture. This architecture type does not support vector input.
For more details about these architectures, see Algorithms.
Complex multiplication
— HDL implementation
Use 4 multipliers and 2 adders (default) | Use 3 multipliers and 5 adders
This parameter specifies the complex multiplier type for HDL implementation. Each multiplication is implemented either with Use 4 multipliers and 2 adders or with Use 3 multipliers and 5 adders. The implementation speed depends on the synthesis tool and target device that you use.
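To illustrate the trade-off between the two options, the sketch below computes the same complex product in both ways. The 4-multiplier form is the textbook definition. The 3-multiplier form shown is one common factoring that uses exactly three multiplications and five additions or subtractions; this page does not state which factoring the generated HDL uses, so treat the second function as an assumption for illustration only.

```python
def cmul_4m2a(a, b, c, d):
    """(a + jb)(c + jd) with 4 multiplications and 2 additions."""
    real = a * c - b * d
    imag = a * d + b * c
    return real, imag


def cmul_3m5a(a, b, c, d):
    """(a + jb)(c + jd) with 3 multiplications and 5 additions.

    Assumed factoring (not confirmed for the generated HDL):
      k1 = c*(a + b), k2 = a*(d - c), k3 = b*(c + d)
      real = k1 - k3, imag = k1 + k2
    """
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2


if __name__ == "__main__":
    print(cmul_4m2a(3, 4, 5, -2))
    print(cmul_3m5a(3, 4, 5, -2))
```

Both functions return the same result for integer inputs. In hardware, the 3-multiplier form trades one multiplier for three extra adders, which is why the better choice depends on the target device and synthesis tool.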
Output in bit-reversed order
— Order of output data
on (default) | off
When you select this parameter, the block returns output elements in bit-reversed order. To return output elements in linear order, clear this parameter.
The FFT algorithm calculates output in the reverse order to the input. If you specify the output to be in the same order as the input, the algorithm performs an extra reversal operation. For more information, see Linear and Bit-Reversed Output Order.
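As an illustration of what bit-reversed order means, the following Python sketch lists the index permutation for a power-of-2 frame length. The function name `bit_reversed_indices` is hypothetical; it models only the ordering, not the block.

```python
def bit_reversed_indices(n):
    """Return the bit-reversed index order for a frame of n samples.

    n must be a power of 2. Index i maps to the integer obtained by
    reversing the log2(n)-bit binary representation of i.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    bits = n.bit_length() - 1
    if bits == 0:
        return [0]
    return [int(format(i, "0{}b".format(bits))[::-1], 2) for i in range(n)]


if __name__ == "__main__":
    print(bit_reversed_indices(8))  # prints [0, 4, 2, 6, 1, 5, 3, 7]
```

Applying the permutation twice restores linear order, which is why the extra reversal step recovers natural ordering.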
Input in bit-reversed order
— Expected order of input data
off (default) | on
When you select this parameter, the block expects input data in bit-reversed order. By default, this parameter is disabled, and the block expects the input in linear order.
The FFT algorithm calculates output in the reverse order to the input. If you specify the output to be in the same order as the input, the algorithm performs an extra reversal operation. For more information, see Linear and Bit-Reversed Output Order.
Divide butterfly outputs by two
— FFT scaling
off (default) | on
When you select this parameter, the FFT implements an overall 1/N scale factor by dividing the output of each butterfly multiplication by two. This adjustment keeps the output of the FFT in the same amplitude range as its input. If you disable scaling, the FFT avoids overflow by increasing the word length by 1 bit after each butterfly multiplication. The bit increase is the same for both architectures.
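The effect of this parameter on word length can be estimated with simple arithmetic. The sketch below assumes log2(N) butterfly steps for an FFT of length N, which is consistent with the overall 1/N scale factor described above; the function names are hypothetical.

```python
from fractions import Fraction


def butterfly_steps(fft_length):
    """Number of divide-by-two (or bit-growth) steps, assumed to be log2(N)."""
    if fft_length < 2 or fft_length & (fft_length - 1):
        raise ValueError("fft_length must be a power of 2")
    return fft_length.bit_length() - 1


def output_word_length(input_wl, fft_length, scaling):
    """Estimated output word length in bits.

    With scaling, the word length stays at input_wl.
    Without scaling, it grows by 1 bit per butterfly step.
    """
    return input_wl if scaling else input_wl + butterfly_steps(fft_length)


def overall_scale_factor(fft_length, scaling):
    """Overall gain applied by scaling: 1/N when enabled, 1 otherwise."""
    return Fraction(1, 2 ** butterfly_steps(fft_length)) if scaling else Fraction(1)


if __name__ == "__main__":
    print(output_word_length(16, 1024, scaling=True))
    print(output_word_length(16, 1024, scaling=False))
    print(overall_scale_factor(1024, scaling=True))
```

Under these assumptions, a 16-bit input and an FFT length of 1024 give a 16-bit output with scaling and a 26-bit output without it.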
Rounding mode
— Rounding mode for internal fixed-point calculations
Floor (default) | Ceiling | Convergent | Nearest | Round | Zero
This parameter specifies the rounding mode for internal fixed-point calculations. For more information about rounding modes, see Rounding Modes. When the input is any integer or fixed-point data type, this block uses fixed-point arithmetic for internal calculations. This parameter does not apply when the input data is single or double. Rounding applies to twiddle-factor multiplication and scaling operations.
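To make the difference between the rounding modes concrete, the sketch below models each one as an integer operation that discards a number of least significant bits. The tie-breaking rules follow the usual fixed-point definitions (Nearest rounds ties toward positive infinity, Round rounds ties away from zero, Convergent rounds ties to even). This is an illustrative model with a hypothetical function name, not the block's implementation.

```python
def round_shift(x, shift, mode):
    """Discard `shift` (>= 1) LSBs of the integer x using the given rounding mode."""
    q, r = divmod(x, 1 << shift)  # q is the floor quotient, 0 <= r < 2**shift
    half = 1 << (shift - 1)
    if mode == "Floor":
        return q
    if mode == "Ceiling":
        return q + (1 if r != 0 else 0)
    if mode == "Zero":
        return q + (1 if (x < 0 and r != 0) else 0)
    if mode == "Nearest":  # ties toward positive infinity
        return q + (1 if r >= half else 0)
    if mode == "Round":  # ties away from zero
        if r > half or (r == half and x >= 0):
            return q + 1
        return q
    if mode == "Convergent":  # ties to the nearest even value
        if r > half or (r == half and q % 2 == 1):
            return q + 1
        return q
    raise ValueError("unknown rounding mode: " + mode)


if __name__ == "__main__":
    modes = ["Floor", "Ceiling", "Convergent", "Nearest", "Round", "Zero"]
    for value in (5, -5, 7):  # value / 2 is 2.5, -2.5, 3.5
        print(value, [round_shift(value, 1, m) for m in modes])
```

Floor is the cheapest mode in hardware because it is a plain truncation, which is consistent with it being the default.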
Enable reset input port
— Optional reset signal
off (default) | on
This parameter enables a reset input port. When you select this parameter, the input reset port appears on the block icon.
Enable start output port
— Optional control signal indicating start of data
off (default) | on
This parameter enables a port that indicates the start of output data. When you select this parameter, the output start port appears on the block icon.
Enable end output port
— Optional control signal indicating end of data
off (default) | on
This parameter enables a port that indicates the end of output data. When you select this parameter, the output end port appears on the block icon.
The streaming Radix 2^2 architecture implements a low-latency architecture. It saves resources compared to a streaming Radix 2 implementation by factoring and grouping the FFT equation. The architecture has log_4(N) stages. Each stage contains two single-path delay feedback (SDF) butterflies with memory controllers. When you use vector input, each stage operates on fewer input samples, so some stages reduce to a simple butterfly, without SDF.
The first SDF stage is a regular butterfly. The second stage multiplies the outputs of the first stage by -j. To avoid a hardware multiplier, the block swaps the real and imaginary parts of the inputs, and again swaps the imaginary parts of the resulting outputs. Each stage rounds the result of the twiddle-factor multiplication to the input word length. The twiddle factors have the same bit width as the input data, WL, with two integer bits and WL-2 fractional bits.
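The two ideas in this paragraph can be sketched numerically. Multiplying by -j needs no multiplier because (a + jb)(-j) = b - ja, which is a swap of the real and imaginary parts plus one negation. The twiddle-factor quantization below assumes a signed format in which the sign bit counts as one of the two integer bits, and round-to-nearest when generating the constants; neither detail is stated on this page, and the function names are hypothetical.

```python
import math


def times_minus_j(re, im):
    """Multiply (re + j*im) by -j using only a swap and a negation."""
    return im, -re


def quantize_twiddle(k, n, wl):
    """Quantize the twiddle factor exp(-2j*pi*k/n) to wl bits.

    Assumed format: wl total bits with wl-2 fractional bits, so the
    stored integers represent values scaled by 2**(wl-2).
    """
    scale = 1 << (wl - 2)
    angle = -2.0 * math.pi * k / n
    return round(math.cos(angle) * scale), round(math.sin(angle) * scale)


if __name__ == "__main__":
    print(times_minus_j(3, 4))
    print(quantize_twiddle(0, 1024, 16))
    print(quantize_twiddle(256, 1024, 16))
```

With two integer bits, the value 1.0 is exactly representable (as 2^(WL-2)), so the trivial twiddle factors 1 and -j incur no quantization error.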
If you enable scaling, the algorithm divides the result of each butterfly stage by 2. Scaling at each stage avoids overflow, keeps the word length the same as the input, and results in an overall scale factor of 1/N. If scaling is disabled, the algorithm avoids overflow by increasing the word length by 1 bit at each stage. The diagram shows the butterflies and internal word lengths of each stage, not including the memory.
The burst Radix 2 architecture implements the FFT by using a single complex butterfly multiplier. The algorithm cannot start until it has stored the entire input frame, and it cannot accept the next frame until computations are complete. The output ready port indicates when the algorithm is ready for new data. The diagram shows the burst architecture, with pipeline registers.
The algorithm processes input data only when the input valid port is 1. Output data is valid only when the output valid port is 1.
When the optional input reset port is 1, the algorithm stops the current calculation and clears all internal states. The algorithm begins new calculations when the reset port is 0 and the input valid port starts a new frame.
This diagram shows the input and output valid port values for contiguous scalar input data, streaming Radix 2^2 architecture, an FFT length of 1024, and a vector size of 16.
The diagram also shows the optional start and end port values that indicate frame boundaries. If you enable the start port, the start port value pulses for one cycle with the first valid output of the frame. If you enable the end port, the end port value pulses for one cycle with the last valid output of the frame.
If you apply continuous input frames, the output will also be continuous after the initial latency.
The input valid port can be noncontiguous. Data accompanied by an input valid port is processed as it arrives, and the resulting data is stored until a frame is filled. Then the algorithm returns contiguous output samples in a frame of N (FFT length) cycles. This diagram shows noncontiguous input and contiguous output for an FFT length of 512 and a vector size of 16.
When you use the burst architecture, you cannot provide the next frame of input data until memory space is available. The ready port indicates when the algorithm can accept new input data.
The latency varies with the FFT length and input vector size. After you update the model, the block icon displays the latency. The displayed latency is the number of cycles between the first valid input and the first valid output, assuming the input is contiguous. To obtain this latency programmatically, see Automatic Delay Matching for the Latency of FFT HDL Optimized Block.
When using the burst architecture with a contiguous input, if your design waits for ready to output 0 before deasserting the input valid, then one extra cycle of data arrives at the input. This data sample is the first sample of the next frame. The algorithm can save one sample while processing the current frame. Due to this one-sample advance, the observed latency of the later frames (from input valid to output valid) is one cycle shorter than the reported latency. The latency is measured from the first cycle when input valid is 1 to the first cycle when output valid is 1. The number of cycles between when the ready port is 0 and the output valid port is 1 is always latency - FFTLength.
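As a worked example of the last relationship, the sketch below plugs in the burst Radix 2 figures reported in the performance section of this page (latency of 5811 cycles for an FFT length of 1024). The function name is hypothetical.

```python
def cycles_from_ready_low_to_output_valid(latency, fft_length):
    """Cycles between ready going to 0 and the first valid output (burst architecture)."""
    return latency - fft_length


if __name__ == "__main__":
    # Burst Radix 2 example from this page: latency 5811, FFT length 1024.
    print(cycles_from_ready_low_to_output_valid(5811, 1024))  # prints 4787
```

For that configuration, the block spends 1024 cycles loading the frame and a further 4787 cycles before the first valid output appears.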
This resource and performance data is the synthesis result from the generated HDL targeted to a Xilinx® Virtex®-6 (XC6VLX75T-1FF484) FPGA. The examples in the tables have this configuration:
1024 FFT length (default)
Complex multiplication using 4 multipliers, 2 adders
Output scaling enabled
Natural-order input, bit-reversed output
16-bit complex input data
Clock enables minimized (HDL Coder™ parameter)
Performance of the synthesized HDL code varies with your target and synthesis options. For instance, reordering for a natural-order output uses more RAM than the default bit-reversed output, and real input uses less RAM than complex input.
For a scalar input Radix 2^2 configuration, the design achieves 326 MHz clock frequency. The latency is 1116 cycles. The design uses these resources.
| Resource | Number Used |
| --- | --- |
| LUT | 4597 |
| FFS | 5353 |
| Xilinx LogiCORE® DSP48 | 12 |
| Block RAM (16K) | 6 |
When you vectorize the same Radix 2^2 implementation to process two 16-bit input samples in parallel, the design achieves 316 MHz clock frequency. The latency is 600 cycles. The design uses these resources.
| Resource | Number Used |
| --- | --- |
| LUT | 7653 |
| FFS | 9322 |
| Xilinx LogiCORE DSP48 | 24 |
| Block RAM (16K) | 8 |
The block supports scalar input data only when implementing burst Radix 2 architecture. The burst design achieves 309 MHz clock frequency. The latency is 5811 cycles. The design uses these resources.
| Resource | Number Used |
| --- | --- |
| LUT | 971 |
| FFS | 1254 |
| Xilinx LogiCORE DSP48 | 3 |
| Block RAM (16K) | 6 |
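The clock frequencies and latencies quoted above can be converted into sample rates and wall-clock latency. The sketch below assumes that the streaming architecture accepts one scalar or one vector per clock cycle when the input is contiguous; it does not estimate burst throughput, because the burst architecture cannot accept a new frame while it is computing. The function names are hypothetical.

```python
def throughput_msps(clock_mhz, vector_size):
    """Peak streaming sample rate in megasamples per second (assumes one vector per cycle)."""
    return clock_mhz * vector_size


def latency_us(latency_cycles, clock_mhz):
    """Latency in microseconds for a given cycle count and clock frequency."""
    return latency_cycles / clock_mhz


if __name__ == "__main__":
    # (label, clock in MHz, vector size, latency in cycles) as reported on this page.
    configs = [
        ("Streaming Radix 2^2, scalar", 326, 1, 1116),
        ("Streaming Radix 2^2, 2-sample vector", 316, 2, 600),
        ("Burst Radix 2, scalar", 309, 1, 5811),
    ]
    for label, clock, vec, cycles in configs:
        line = "{}: latency {:.2f} us".format(label, latency_us(cycles, clock))
        if label.startswith("Streaming"):
            line += ", peak rate {} MSPS".format(throughput_msps(clock, vec))
        print(line)
```

Under that assumption, vectorizing by two roughly doubles the peak sample rate (632 MSPS against 326 MSPS) at the cost of about twice the DSP48 slices.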
[1] Algnabi, Y.S., F.A. Aldaamee, R. Teymourzadeh, M. Othman, and M.S. Islam. “Novel architecture of pipeline Radix 2^2 SDF FFT Based on digit-slicing technique.” 10th IEEE International Conference on Semiconductor Electronics (ICSE). 2012, pp. 470–474.
This block supports C/C++ code generation for Simulink® accelerator and rapid accelerator modes and for DPI component generation.
This block supports HDL code generation using HDL Coder. HDL Coder provides additional configuration options that affect HDL implementation and synthesized logic.
This block has a single, default HDL architecture.
ConstrainedOutputPipeline  Number of registers to place at
the outputs by moving existing delays within your design. Distributed
pipelining does not redistribute these registers. The default is

InputPipeline  Number of input pipeline stages
to insert in the generated code. Distributed pipelining and constrained
output pipelining can move these registers. The default is

OutputPipeline  Number of output pipeline stages
to insert in the generated code. Distributed pipelining and constrained
output pipelining can move these registers. The default is

If you use the FFT HDL Optimized block with the State Control (HDL Coder) block inside an Enabled Subsystem (Simulink), the optional reset port is not supported. If you enable the reset port on the FFT HDL Optimized block in such a subsystem, the model errors on Update Diagram.