
FFT HDL Optimized

Fast Fourier transform—optimized for HDL code generation

Library

Transforms

dspxfrm3

Description

The FFT HDL Optimized block implements a pipelined Radix-2 FFT algorithm that provides hardware speed and area optimization for streaming data applications. The block accepts scalar real or complex input, provides hardware-friendly control signals, and has optional output frame control signals. Vector input is supported for simulation but not for HDL code generation.

This block provides an option to synthesize the lookup table to a ROM when using HDL Coder™ with an FPGA target. To enable this feature, right-click the block, select HDL Code > HDL Block Properties and set LUTRegisterResetType to none.

The FFT HDL Optimized block replaces the demo block HDL Streaming FFT.

Signal Attributes

The following image illustrates the port signals of the interface for the FFT HDL Optimized block.

The following table provides the descriptions of the port signals.

Port: dataIn
Direction: Input
Description: Scalar or column vector (FFT Length x 1) real or complex input data. If dataIn is a vector, all other ports must be vectors of matching size. Scalar input is required for HDL code generation.
Data Type:
  • fixdt()
  • int64/32/16/8
  • uint64/32/16/8
  double/single are allowed for simulation but not for HDL code generation.

Port: validIn
Direction: Input
Description: Indicates that the input data is valid. When validIn is high, the block captures the value on dataIn. This port is optional for simulation, but required for HDL code generation.
Data Type: boolean

Port: resetIn
Direction: Input
Description: Optional. Resets internal state. When resetIn is high, the block stops the current calculation and clears all internal state. The block begins fresh calculations when resetIn is low and validIn starts a new frame.
Data Type: boolean

Port: dataOut
Direction: Output
Description: Frequency channel output data. The data width and format are the same as those of the input data port. The output is in bit-reversed order by default.
Data Type: Same as dataIn

Port: validOut
Direction: Output
Description: Indicates that the output data is valid. The block sets validOut high when dataOut is ready.
Data Type: boolean

Port: startOut
Direction: Output
Description: Optional. When enabled, the block sets startOut high during the first valid cycle of a frame of output data.
Data Type: boolean

Port: endOut
Direction: Output
Description: Optional. When enabled, the block sets endOut high during the last valid cycle of a frame of output data.
Data Type: boolean

Dialog Box and Parameters

Main

Output in bit-reversed order

When selected, the output elements are bit-reversed relative to the input order. When cleared, the output elements are in linear order. The default value is selected. The FFT algorithm calculates output in bit-reversed order and an extra reversal operation is done when providing linear output. For more information, see Linear and Bit-Reversed Output Order.
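As an illustration of what bit-reversed order means, the following Python sketch computes the index permutation for a given FFT length. The function names are hypothetical and are not part of the block or of any MathWorks API; the sketch only shows the index mapping, not how the block implements the reordering in hardware.

```python
def bit_reverse(index: int, num_bits: int) -> int:
    """Reverse the lowest num_bits bits of index."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)  # shift in the current LSB
        index >>= 1
    return result


def bit_reversed_order(fft_length: int) -> list:
    """Return the bit-reversed index sequence for a power-of-2 FFT length."""
    num_bits = fft_length.bit_length() - 1  # log2 of a power of 2
    return [bit_reverse(i, num_bits) for i in range(fft_length)]


print(bit_reversed_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Applying the permutation twice restores linear order, which is why a single extra reversal step is enough to produce linear output.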

Enable dividing butterfly outputs by 2

When selected, the block implements an overall 1/N scale factor by dividing the result of each pipeline stage by 2. This adjustment keeps the output of the FFT in the same amplitude range as its input. Scaling at each stage avoids overflow. The default value is not selected.
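The effect of halving every butterfly output can be checked numerically. The sketch below is a floating-point, recursive Radix-2 decimation-in-time FFT written only for illustration; it is not the block's pipelined implementation, and the names fft_scaled and dft are hypothetical. It shows that dividing by 2 at each of the log2(N) stages gives the same result as dividing the full transform by N.

```python
import cmath


def fft_scaled(x, scale=True):
    """Recursive radix-2 DIT FFT; optionally halve every butterfly output."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_scaled(x[0::2], scale)
    odd = fft_scaled(x[1::2], scale)
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle multiply
        a, b = even[k] + t, even[k] - t                 # butterfly
        if scale:
            a, b = a / 2, b / 2                         # per-stage divide by 2
        out[k], out[k + half] = a, b
    return out


def dft(x):
    """Direct O(N^2) DFT used as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


x = [1, 2, 3, 4, 0, -1, -2, -3]
scaled = fft_scaled(x)
reference = [v / len(x) for v in dft(x)]
print(max(abs(s - r) for s, r in zip(scaled, reference)))  # close to zero
```

Because the scaled output is an average of rotated input samples, its magnitude never exceeds the largest input magnitude, which is the reason scaling keeps the output in the input's amplitude range.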

Source of FFT length

Select the source of the FFT length. When you select Property, the FFT length is set by the FFT length field in the mask. If you use Property with vector input, the input vector width must be less than or equal to the FFT length. When you select Auto, the FFT length is inferred from the input vector data width. The Auto FFT length option is not supported for scalar input. The default is Property.

FFT length

Specify the number of data points used for one FFT calculation. This value is used when Source of FFT length is set to Property. The default value is 1024. The FFT length must be a power of 2 between 2^3 and 2^16 for HDL code generation. If the input is a vector, the width must be less than or equal to the FFT Length.
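The length constraint for HDL code generation (a power of 2 from 8 to 65536) can be expressed as a short check. This is a minimal sketch with a hypothetical helper name; the block performs its own validation.

```python
def is_valid_hdl_fft_length(n: int) -> bool:
    """True if n is a power of 2 in the range 2**3 .. 2**16."""
    is_power_of_two = n > 0 and (n & (n - 1)) == 0
    return is_power_of_two and 2**3 <= n <= 2**16


for candidate in (4, 8, 1000, 1024, 65536, 131072):
    print(candidate, is_valid_hdl_fft_length(candidate))
```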

Simulate using

Type of simulation to run. This parameter does not affect generated HDL code.

  • Code generation (default)

    Simulate model using generated C code. The first time you run a simulation, Simulink® generates C code for the block. The C code is reused for subsequent simulations, as long as the model does not change. This option requires additional startup time but provides faster simulation speed than Interpreted execution.

  • Interpreted execution

    Simulate model using the MATLAB® interpreter. This option shortens startup time but has slower simulation speed than Code generation.

Data Types

These options specify how numerical type limitations are handled in fixed-point calculations. The FFT block uses fixed-point arithmetic for internal calculations when the input is any integer or fixed-point data type. These options do not apply when the input is single or double type.

Rounding Method

The default rounding method for internal fixed-point calculations is Floor.

Overflow Action

The default overflow action for internal fixed-point calculations is Wrap.
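For readers unfamiliar with these two settings, the following sketch models Floor rounding and Wrap overflow on signed two's-complement integers. The helper names are hypothetical, and the code describes the general fixed-point behavior of these modes rather than the block's internal logic.

```python
import math


def floor_round(value: float, frac_bits: int) -> int:
    """Quantize value to frac_bits fractional bits, rounding toward -infinity."""
    return math.floor(value * (1 << frac_bits))


def wrap(raw: int, word_length: int) -> int:
    """Wrap a raw integer into a signed word of word_length bits."""
    raw &= (1 << word_length) - 1           # keep only the low word_length bits
    if raw >= 1 << (word_length - 1):       # reinterpret the MSB as the sign bit
        raw -= 1 << word_length
    return raw


print(floor_round(-0.3, 2))  # -2, which represents -0.5
print(wrap(130, 8))          # -126
print(wrap(-129, 8))         # 127
```

Floor and Wrap are the cheapest choices in hardware because they amount to discarding low-order and high-order bits respectively.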

Control Ports

Enable valid input port

When selected, the validIn port is present on the block icon and input data is qualified by the validIn signal. The default value is selected.

Enable reset input port

When selected, the resetIn port is present on the block icon. When resetIn is high, the block stops the current calculation and clears all internal state. The block begins fresh calculations when resetIn is low and validIn starts a new frame. The default value is not selected.

Enable start output port

When selected, the startOut port is present on the block icon, and this output signal is asserted for the first cycle of an output frame. The default value is not selected.

Enable end output port

When selected, the endOut port is present on the block icon, and this output signal is asserted for the last cycle of an output frame. The default value is not selected.

Algorithm

The FFT HDL Optimized block implements a pipelined Radix-2 algorithm with decimation in time. This architecture is efficient for streaming input data. There are log2(N) pipeline stages. Each pipeline stage, or kernel, contains memory, a controller, and a complex Radix-2 butterfly.

Using decimation in time, each kernel multiplies by the twiddle factor and then adds samples in a butterfly. The kernel architecture minimizes the number of multipliers and adders. Within each kernel, data is processed in full precision. The kernel rounds to the output data width after the butterfly sum. If you select Enable dividing butterfly outputs by 2, the block divides the result of each pipeline stage by 2. Scaling at each stage avoids overflow, keeps the word length the same as the input, and results in an overall scale factor of 1/N. If scaling is disabled, the block avoids overflow by increasing the word length by 1 bit at each stage.
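Reading the paragraph above literally, the word length after the final stage follows from the number of stages, log2(N). The sketch below works through that arithmetic; the function name is hypothetical, and the result is an inference from the description rather than a documented formula.

```python
def final_word_length(input_word_length: int, fft_length: int,
                      divide_by_two: bool) -> int:
    """Word length after the last stage, assuming 1 bit of growth per stage
    when scaling is disabled and no growth when it is enabled."""
    stages = fft_length.bit_length() - 1  # log2(N) for a power-of-2 length
    if divide_by_two:
        return input_word_length
    return input_word_length + stages


print(final_word_length(16, 1024, divide_by_two=False))  # 26
print(final_word_length(16, 1024, divide_by_two=True))   # 16
```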

The twiddle factors have the same bit width as the input data. They use 2 integer bits, and the remaining bits are fractional. Each butterfly multiplier is therefore WLxWL, where WL is the input word length.
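The twiddle format can be illustrated by quantizing the factors exp(-j*2*pi*k/N) to a signed word with 2 integer bits. This sketch assumes that the 2 integer bits include the sign bit (so a 16-bit word has 14 fractional bits) and that values are rounded to nearest; neither detail is stated in this page, and the function name is hypothetical.

```python
import cmath


def quantized_twiddles(fft_length: int, word_length: int) -> list:
    """Return (real, imag) raw integers for k = 0 .. N/2 - 1.

    Assumption: fractional bits = word_length - 2, round to nearest.
    """
    frac_bits = word_length - 2
    scale = 1 << frac_bits
    table = []
    for k in range(fft_length // 2):
        w = cmath.exp(-2j * cmath.pi * k / fft_length)
        table.append((round(w.real * scale), round(w.imag * scale)))
    return table


for entry in quantized_twiddles(8, 16):
    print(entry)
```

Under this assumption, the value 1.0 maps to 2^14 = 16384 and is represented exactly, which a format with only a sign bit and fractional bits could not do.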

Control Signals

The validIn control signal is required for HDL code generation. It is optional for simulation. If you enable the validIn port, input data is processed only when validIn is high. Output data is valid when validOut is high.

The block provides an optional reset port. When resetIn is high, the block stops the current calculation and clears all internal state. The block begins fresh calculations when resetIn is low and validIn starts a new frame.

Timing Diagram

This diagram illustrates validIn and validOut signals for an FFT length of 1024.

The validIn signal can be noncontiguous. Data accompanied by a validIn is stored until a frame is filled, and output in a contiguous frame of N (FFT length) cycles. This diagram illustrates noncontiguous input and contiguous output for an FFT length of 1024.
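The gathering of noncontiguous valid samples into contiguous frames can be modeled behaviorally as follows. This is a simplified sketch with a hypothetical function name; it ignores the block's latency and does not compute the FFT, it only shows which input samples end up in which frame.

```python
def collect_frames(samples, valids, fft_length):
    """Group samples whose valid flag is high into frames of fft_length."""
    frames, current = [], []
    for sample, valid in zip(samples, valids):
        if not valid:
            continue                  # sample is ignored when validIn is low
        current.append(sample)
        if len(current) == fft_length:
            frames.append(current)    # a full frame is ready for processing
            current = []
    return frames


samples = list(range(12))
valids = [1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1]
print(collect_frames(samples, valids, fft_length=4))
# [[0, 2, 3, 6], [7, 8, 10, 11]]
```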

There are optional startOut and endOut signals to indicate frame boundaries. If you enable startOut, it pulses for one cycle with the first validOut of the frame. If you enable endOut, it pulses for one cycle with the last validOut of the frame. This diagram illustrates the output framing signals for an FFT length of 1024.
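The relationship between validOut, startOut, and endOut within one output frame can be sketched as a list of per-cycle signal values. The function name is hypothetical, and the model covers only the N valid output cycles of a single frame.

```python
def output_frame_signals(fft_length: int) -> list:
    """Per-cycle control signals for one contiguous output frame."""
    return [
        {
            "validOut": True,                       # high for all N cycles
            "startOut": cycle == 0,                 # one-cycle pulse, first sample
            "endOut": cycle == fft_length - 1,      # one-cycle pulse, last sample
        }
        for cycle in range(fft_length)
    ]


frame = output_frame_signals(1024)
print(frame[0])
print(frame[-1])
```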

Latency

The latency varies with the FFT length. If you set Source of FFT length to Property, the latency is displayed on the block icon. The displayed latency is the number of cycles between the first valid input and the first valid output, assuming the input is contiguous. The icon latency is updated when you change FFT length. If you set Source of FFT length to Auto, the latency is not displayed because the FFT length is not known until you compile the model.

HDL Code Generation

This block supports HDL code generation using HDL Coder. HDL Coder provides additional configuration options that affect HDL implementation and synthesized logic. For more information on implementations, properties, and restrictions for HDL code generation, see FFT HDL Optimized in the HDL Coder documentation.

Performance

When generated HDL code for the default configuration (FFT length 1024) with 16-bit input is synthesized into a Xilinx® Virtex®–6 (XC6VLX75T-1FF484) FPGA, the design achieves 295 MHz clock frequency. The latency is 1148 cycles. It uses the following resources.

Resource                    Uses
LUT                         4060
FFS                         5160
Xilinx LogiCORE® DSP48      16
Block RAM (16K)             6

Performance of the synthesized HDL code varies with your target and synthesis options. For instance, natural order output uses more RAM than bit-reversed output, and real input uses less RAM than complex input.
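To put the reported figures in time units, the following sketch converts cycle counts to seconds at the 295 MHz clock quoted above. It assumes one sample is processed per clock cycle, which matches scalar streaming input; the constant and function names are hypothetical.

```python
def cycles_to_seconds(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to seconds at the given clock frequency."""
    return cycles / clock_hz


CLOCK_HZ = 295e6        # reported clock frequency
LATENCY_CYCLES = 1148   # reported latency for FFT length 1024
FFT_LENGTH = 1024       # one frame, assuming one sample per cycle

latency_s = cycles_to_seconds(LATENCY_CYCLES, CLOCK_HZ)
frame_s = cycles_to_seconds(FFT_LENGTH, CLOCK_HZ)
print(f"latency:    {latency_s * 1e6:.2f} us")
print(f"frame time: {frame_s * 1e6:.2f} us")
```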

Introduced in R2014a
