Fast Fourier transform—optimized for HDL code generation

Transforms

`dspxfrm3`

The FFT HDL Optimized block implements a pipelined Radix-2 FFT algorithm which provides hardware speed and area optimization for streaming data applications. The block accepts scalar real or complex input, provides hardware-friendly control signals, and has optional output frame control signals. Vector input is supported for simulation but not for HDL code generation.

This block provides an option to synthesize the lookup table to a ROM when using HDL Coder™ with an FPGA target. To enable this feature, right-click the block, select **HDL Code > HDL Block Properties**, and set **LUTRegisterResetType** to `none`.

The FFT HDL Optimized block replaces the demo block HDL Streaming FFT.

The following image illustrates the port signals of the interface for the FFT HDL Optimized block.

The following table provides the descriptions of the port signals.

Port | Direction | Description | Data Type |
---|---|---|---|
`dataIn` | Input | Scalar or column vector (FFT length x 1) of real or complex input data. If `dataIn` is a vector, all other ports must be vectors of matching size. Scalar input is required for HDL code generation. | `fixdt()`, `int64/32/16/8`, `uint64/32/16/8`. `double/single` are allowed for simulation but not for HDL code generation. |
`validIn` | Input | Indicates that the input data is valid. When `validIn` is high, the block captures the value on `dataIn`. This port is optional for simulation, but required for HDL code generation. | `boolean` |
`resetIn` | Input | Optional. Resets internal state. When `resetIn` is high, the block stops the current calculation and clears all internal state. The block begins fresh calculations when `resetIn` is low and `validIn` starts a new frame. | `boolean` |
`dataOut` | Output | Frequency channel output data. The data width and format are the same as those of the input data port. The output order is bit-reversed by default. | Same as `dataIn` |
`validOut` | Output | Indicates that the output data is valid. The block sets `validOut` high when `dataOut` is ready. | `boolean` |
`startOut` | Output | Optional. When enabled, the block sets `startOut` high during the first valid cycle of a frame of output data. | `boolean` |
`endOut` | Output | Optional. When enabled, the block sets `endOut` high during the last valid cycle of a frame of output data. | `boolean` |

**Output in bit-reversed order**

When selected, the output elements are bit-reversed relative to the input order. When cleared, the output elements are in linear order. The default value is selected. The FFT algorithm calculates output in bit-reversed order, and an extra reversal operation is done when providing linear output. For more information, see Linear and Bit-Reversed Output Order.
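As an illustration of what bit-reversed ordering means, the following is a minimal Python sketch, not the block's implementation. The helper names `bit_reverse` and `bit_reversed_order` are hypothetical. For an FFT length of 8, it lists which linear-order index appears at each output position when the output is bit-reversed.

```python
def bit_reverse(index, num_bits):
    """Reverse the lowest num_bits bits of a non-negative integer index."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)  # shift in the current LSB
        index >>= 1
    return result


def bit_reversed_order(fft_length):
    """Return the linear-order index found at each bit-reversed position.

    Assumes fft_length is a power of 2.
    """
    num_bits = fft_length.bit_length() - 1  # log2 of a power of 2
    return [bit_reverse(i, num_bits) for i in range(fft_length)]


print(bit_reversed_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Applying the same permutation twice restores linear order, which is why a single extra reversal step is enough to produce linear output.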

**Enable dividing butterfly outputs by 2**

When selected, the block implements an overall 1/N scale factor by dividing the result of each pipeline stage by 2. This adjustment keeps the output of the FFT in the same amplitude range as its input. Scaling at each stage avoids overflow. The default value is not selected.

**Source of FFT length**

Select the source of the FFT length. When you select `Property`, the FFT length is set by the `FFT length` field in the mask. If you use `Property` with vector input, the input vector width must be less than or equal to the FFT length. When you select `Auto`, the FFT length is inferred from the input vector data width. The `Auto` FFT length option is not supported for scalar input. The default is `Property`.

**FFT length**

Specify the number of data points used for one FFT calculation. This value is used when **Source of FFT length** is set to `Property`. The default value is 1024. The FFT length must be a power of 2 between 2^3 and 2^16 for HDL code generation. If the input is a vector, the width must be less than or equal to the FFT length.

**Simulate using**

Type of simulation to run. This parameter does not affect generated HDL code.

`Code generation` (default): Simulate model using generated C code. The first time you run a simulation, Simulink® generates C code for the block. The C code is reused for subsequent simulations, as long as the model does not change. This option requires additional startup time but provides faster simulation speed than `Interpreted execution`.

`Interpreted execution`: Simulate model using the MATLAB® interpreter. This option shortens startup time but has slower simulation speed than `Code generation`.
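The **FFT length** constraint for HDL code generation (a power of 2 between 2^3 and 2^16) can be checked ahead of time. The following Python sketch shows one way to do so. The function name `is_hdl_fft_length` is hypothetical and is not part of any MathWorks API.

```python
def is_hdl_fft_length(fft_length):
    """Return True if fft_length is a power of 2 in the range 2**3 .. 2**16."""
    in_range = 2**3 <= fft_length <= 2**16
    # A positive power of 2 has exactly one bit set, so n & (n - 1) is 0.
    is_power_of_two = fft_length > 0 and (fft_length & (fft_length - 1)) == 0
    return in_range and is_power_of_two


for n in (4, 8, 1000, 1024, 65536, 131072):
    print(n, is_hdl_fft_length(n))
```

Of the lengths tried here, only 8, 1024, and 65536 satisfy the constraint.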

These options specify how numerical type limitations are handled in fixed-point calculations. The FFT block uses fixed-point arithmetic for internal calculations when the input is any integer or fixed-point data type. These options do not apply when the input is single or double type.

**Rounding Method**

The default rounding method for internal fixed-point calculations is `Floor`.

**Overflow Action**

The default overflow action for internal fixed-point calculations is `Wrap`.
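To make the default `Floor` and `Wrap` behavior concrete, here is a small Python sketch that operates on stored integers of a signed two's-complement fixed-point value. It illustrates the general meaning of these two modes and does not model the block's internal data path. The function names are hypothetical.

```python
def floor_shift(value, shift):
    """Discard `shift` least significant bits, rounding toward negative infinity.

    Python's >> on integers is an arithmetic shift, which is Floor rounding.
    """
    return value >> shift


def wrap_signed(value, word_length):
    """Wrap an integer into the signed two's-complement range of word_length bits."""
    value &= (1 << word_length) - 1          # keep the low word_length bits
    if value >= 1 << (word_length - 1):      # sign bit set: interpret as negative
        value -= 1 << word_length
    return value


print(floor_shift(-3, 1))       # -2
print(wrap_signed(32768, 16))   # -32768
```

With `Wrap`, a value one step above the largest 16-bit signed integer reappears as the most negative value instead of saturating.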

**Enable valid input port**

When selected, the `validIn` port is present on the block icon and input data is qualified by the `validIn` signal. The default value is selected.

**Enable reset input port**

When selected, the `resetIn` port is present on the block icon. When `resetIn` is high, the block stops the current calculation and clears all internal state. The block begins fresh calculations when `resetIn` is low and `validIn` starts a new frame. The default value is not selected.

**Enable start output port**

When selected, the `startOut` port is present on the block icon, and this output signal is asserted for the first cycle of an output frame. The default value is not selected.

**Enable end output port**

When selected, the `endOut` port is present on the block icon, and this output signal is asserted for the last cycle of an output frame. The default value is not selected.

The FFT HDL Optimized block implements a pipelined Radix-2 algorithm with decimation in time. This architecture is efficient for streaming input data. There are log2(N) pipeline stages. Each pipeline stage, or kernel, contains memory, a controller, and a complex Radix-2 butterfly.

Using decimation in time, each kernel multiplies by the twiddle factor and then adds samples in a butterfly. The kernel architecture minimizes the number of multipliers and adders. Within each kernel, data is processed in full precision. The kernel rounds to the output data width after the butterfly sum. If you select **Enable dividing butterfly outputs by 2**, the block divides the result of each pipeline stage by 2. Scaling at each stage avoids overflow, keeps the word length the same as the input, and results in an overall scale factor of 1/N. If scaling is disabled, the block avoids overflow by increasing the word length by 1 bit at each stage.
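The effect of per-stage scaling can be sketched in Python. The model below is a floating-point, recursive Radix-2 decimation-in-time FFT, not the block's pipelined fixed-point architecture, and it does not model rounding. The names `fft_dit` and `output_word_length` are hypothetical. It shows that halving every butterfly output gives an overall 1/N factor, and computes the output word length implied by the two scaling settings described above.

```python
import cmath


def fft_dit(samples, divide_by_two=False):
    """Recursive Radix-2 decimation-in-time FFT in floating point.

    Assumes len(samples) is a power of 2. When divide_by_two is True, each
    butterfly output is halved, giving an overall 1/N scale factor.
    """
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft_dit(samples[0::2], divide_by_two)
    odd = fft_dit(samples[1::2], divide_by_two)
    scale = 0.5 if divide_by_two else 1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle multiply
        out[k] = scale * (even[k] + twiddled)                   # butterfly sum
        out[k + n // 2] = scale * (even[k] - twiddled)          # butterfly difference
    return out


def output_word_length(input_word_length, fft_length, divide_by_two):
    """Word length after log2(N) stages: unchanged if scaled, else +1 bit per stage."""
    stages = fft_length.bit_length() - 1
    return input_word_length if divide_by_two else input_word_length + stages


impulse = [1, 0, 0, 0, 0, 0, 0, 0]
print([round(abs(v), 3) for v in fft_dit(impulse)])
print([round(abs(v), 3) for v in fft_dit(impulse, divide_by_two=True)])
print(output_word_length(16, 1024, divide_by_two=False))  # 26
```

For the 8-point impulse, every unscaled output bin has magnitude 1 and every scaled bin has magnitude 1/8. For a 16-bit input and an FFT length of 1024, disabling scaling grows the word length by one bit over each of the 10 stages.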

The twiddle factors have the same bit width as the input data. They use 2 integer bits, and the remaining bits are fractional. Each butterfly multiplier is therefore WL×WL, where WL is the word length.
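As a rough illustration of the twiddle format, the following Python sketch quantizes a twiddle factor to a word with WL - 2 fractional bits. It assumes a signed representation and round-to-nearest quantization. Neither detail is stated in this page, so treat both as assumptions. The function name `quantize_twiddle` is hypothetical.

```python
import math


def quantize_twiddle(k, fft_length, word_length):
    """Quantize exp(-2j*pi*k/N) to stored integers.

    Assumptions (not confirmed by the block documentation): signed format,
    word_length - 2 fractional bits, round-to-nearest.
    Returns (real_stored, imag_stored, fraction_bits).
    """
    fraction_bits = word_length - 2
    scale = 1 << fraction_bits
    angle = -2.0 * math.pi * k / fft_length
    return round(math.cos(angle) * scale), round(math.sin(angle) * scale), fraction_bits


print(quantize_twiddle(0, 1024, 16))    # (16384, 0, 14)
print(quantize_twiddle(256, 1024, 16))  # (0, -16384, 14)
```

With 14 fractional bits in a 16-bit word, the values +1 and -1 are exactly representable, which is one reason to reserve two integer bits.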

The `validIn` control signal is required for HDL code generation. It is optional for simulation. If you enable the `validIn` port, input data is processed only when `validIn` is high. Output data is valid when `validOut` is high.

The block provides an optional reset port. When `resetIn` is high, the block stops the current calculation and clears all internal state. The block begins fresh calculations when `resetIn` is low and `validIn` starts a new frame.

This diagram illustrates `validIn` and `validOut` signals for an FFT length of 1024.

The `validIn` signal can be noncontiguous. Data accompanied by a `validIn` is stored until a frame is filled, and output in a contiguous frame of N (FFT length) cycles. This diagram illustrates noncontiguous input and contiguous output for an FFT length of 1024.
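The input framing behavior described above can be modeled with a short Python sketch. It only groups samples qualified by `validIn` into full frames. It does not compute the FFT or model latency, and the function name `collect_frames` is hypothetical.

```python
def collect_frames(data_in, valid_in, fft_length):
    """Group samples qualified by valid_in into complete frames.

    Samples arriving while valid_in is low are ignored. A frame is released
    only after fft_length valid samples have been captured.
    Returns (complete_frames, pending_partial_frame).
    """
    frames, current = [], []
    for sample, valid in zip(data_in, valid_in):
        if valid:
            current.append(sample)
            if len(current) == fft_length:
                frames.append(current)
                current = []
    return frames, current


data = list(range(10))
valid = [True, False, True, True, False, True, True, True, False, True]
frames, pending = collect_frames(data, valid, fft_length=4)
print(frames)   # [[0, 2, 3, 5]]
print(pending)  # [6, 7, 9]
```

In this example the gaps in `valid` delay the completion of the first frame, and the last three valid samples wait for a fourth before a second frame can be formed.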

There are optional `startOut` and `endOut` signals to indicate frame boundaries. If you enable `startOut`, it pulses for one cycle with the first `validOut` of the frame. If you enable `endOut`, it pulses for one cycle with the last `validOut` of the frame. This diagram illustrates the output framing signals for an FFT length of 1024.
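The relationship between `validOut`, `startOut`, and `endOut` within a contiguous output frame can be sketched as a Python generator. This is a behavioral illustration under the description above, with a hypothetical function name `frame_control`. It ignores latency and idle cycles between frames.

```python
def frame_control(fft_length, num_frames=1):
    """Yield (validOut, startOut, endOut) for each cycle of contiguous output frames."""
    for _ in range(num_frames):
        for cycle in range(fft_length):
            start = cycle == 0               # first valid cycle of the frame
            end = cycle == fft_length - 1    # last valid cycle of the frame
            yield True, start, end


for signals in frame_control(4):
    print(signals)
# (True, True, False)
# (True, False, False)
# (True, False, False)
# (True, False, True)
```

Each frame contains exactly one `startOut` pulse and one `endOut` pulse, so downstream logic can count frames from either signal.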

The latency varies with the FFT length. If you set **Source of FFT length** to `Property`, the latency is displayed on the block icon. The displayed latency is the number of cycles between the first valid input and the first valid output, assuming the input is contiguous. The icon latency is updated when you change **FFT length**. If you set **Source of FFT length** to `Auto`, the latency is not displayed because the FFT length is not known until you compile the model.

This block supports HDL code generation using HDL Coder. HDL Coder provides additional configuration options that affect HDL implementation and synthesized logic. For more information on implementations, properties, and restrictions for HDL code generation, see FFT HDL Optimized in the HDL Coder documentation.

When generated HDL code for the default configuration (FFT length 1024) with 16-bit input is synthesized into a Xilinx® Virtex®-6 (XC6VLX75T-1FF484) FPGA, the design achieves a clock frequency of 295 MHz. The latency is 1148 cycles. It uses the following resources.

Resource | Uses |
---|---|
LUT | 4060 |
FFS | 5160 |
Xilinx LogiCORE | 16 |
Block RAM (16K) | 6 |

Performance of the synthesized HDL code varies with your target and synthesis options. For instance, natural order output uses more RAM than bit-reversed output, and real input uses less RAM than complex input.
