SqrtLUT

Compute square-root operation using lookup table

Since R2026a

Libraries:
HDL Coder

Description

The SqrtLUT block computes the square root of an input signal by using a lookup table (LUT) and piecewise linear interpolation. Use this block for fixed-point signal processing applications where you need a high-speed square root operation with low latency and minimal resource usage. The block normalizes input values outside the range [0,1], computes the square root by using piecewise linear interpolation based on the values in the lookup table, and then rescales the interpolated result and outputs it. For more information, see Compute Square Root Using Look-up Table Approach.

The block uses the validIn input port and the validOut output port to indicate when the input data is valid and when the output data is ready. Use these ports to simulate the block accurately with latency. The algorithm operates on a fixed-point format internally and maintains a fixed latency regardless of the input data type. For more information, see Data Type and Latency Considerations.

To use this block in your Simulink® model, open the HDLMathLib library by entering this command in the MATLAB® Command Window:

open_system("HDLMathLib")

Examples

Compute Square Root of Fixed-Point Data

This example shows how to compute the square root of fixed-point values and generate HDL code.

Create Data and Open Model

Create a variable that contains the data to compute. For this example, create a variable that contains a linear sweep. You can change these values according to your requirements.

SQRT_input = fi(1/2^15:1/2^15:1,0,16,15)';

Specify the word length for fixed-point data types and the latency for the model. The block maintains a fixed latency regardless of the input data type.

WL = 16;  
latency = 7;

Open the hdlcoderSqrtLUT model and specify a stop time sufficient to process all the input combinations.

stoptime = length(SQRT_input)-1+latency;
open_system("hdlcoderSqrtLUT")
sim("hdlcoderSqrtLUT")
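The stop time arithmetic above can be checked with a short sketch. It assumes one input sample per time step, which this page does not state. The name numSamples is introduced here for illustration only.

```matlab
% Count the samples in the linear sweep used by the example.
step = 1/2^15;
numSamples = numel(step:step:1);        % same sweep as SQRT_input, without the fi cast

% Assumption: one sample per time step, starting at time 0.
latency = 7;                            % fixed block latency from this page
stoptime = numSamples - 1 + latency;    % last input time plus pipeline latency
```

Under that assumption, the extra latency steps give the final input sample enough time to reach the output before the simulation stops.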

This figure shows the simulation waveform for the model. The dataOut output is valid when validOut is 1, which first occurs after 7 cycles.

Validate Simulink Output With Reference Value

To validate the output of the Simulink model, compare the output of the simulation to a reference value. Compute the reference output by using the sqrt function.

ref_SQRT = sqrt(double(SQRT_input));

Use logical indexing to extract the valid output.

implementation_SQRT = simulink_SQRT(valid_output);
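The logical indexing step can be illustrated with synthetic data. In this sketch, dataOut and validOut are hypothetical stand-ins for the logged signals; the names and values are assumptions, not the variables the example model creates.

```matlab
% Synthetic stand-ins for logged signals (illustration only).
latency = 7;                                        % fixed block latency from this page
x = [4; 9; 16];                                     % illustrative inputs
dataOut  = [zeros(latency,1); sqrt(x)];             % output is undefined until valid
validOut = [false(latency,1); true(numel(x),1)];    % valid flag asserts after the latency

% Logical indexing keeps only the samples where the valid flag is true.
validSamples = dataOut(validOut);
```

The first seven entries of dataOut are discarded because the corresponding validOut entries are false.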

Plot the comparison results by using the comparison_plot_sqrtLUT function. The maximum error is small relative to the magnitude of the model output.

comparison_plot_sqrtLUT(ref_SQRT,implementation_SQRT,1,"SQRT linear input");

Maximum Error SQRT linear input 7.906744e-06 
Maximum PctError SQRT linear input 1.358995e-03
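The comparison_plot_sqrtLUT helper is not listed on this page. As a rough sketch of how metrics like these could be computed, the code below assumes that the percentage error is taken relative to the reference value. The vectors ref and impl are synthetic and introduced for illustration only.

```matlab
% Synthetic reference and implementation values (illustration only).
ref  = sqrt([0.25; 0.5; 1]);
impl = ref + [1e-6; -2e-6; 0];

% Absolute error and percentage error relative to the reference.
absErr    = abs(ref - impl);
maxErr    = max(absErr);
maxPctErr = max(100 * absErr ./ ref);   % assumption: percent of the reference value

fprintf("Maximum Error %e\n", maxErr);
fprintf("Maximum PctError %e\n", maxPctErr);
```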

Generate HDL Code for Square Root Implementation

Check the HDL settings of the model by using the hdlsaveparams function.

hdlsaveparams("hdlcoderSqrtLUT")

%% Set Model 'hdlcoderSqrtLUT' HDL parameters
hdlset_param('hdlcoderSqrtLUT', 'Backannotation', 'on');
hdlset_param('hdlcoderSqrtLUT', 'HDLSubsystem', 'hdlcoderSqrtLUT/SqrtLUT');
hdlset_param('hdlcoderSqrtLUT', 'ProjectFolder', 'hdl_prj');
hdlset_param('hdlcoderSqrtLUT', 'ResetType', 'Synchronous');
hdlset_param('hdlcoderSqrtLUT', 'SynthesisTool', 'Xilinx Vivado');
hdlset_param('hdlcoderSqrtLUT', 'SynthesisToolChipFamily', 'Zynq UltraScale+');
hdlset_param('hdlcoderSqrtLUT', 'SynthesisToolDeviceName', 'xazu11eg-ffvf1517-1-i');
hdlset_param('hdlcoderSqrtLUT', 'TargetDirectory', 'hdl_prj\hdlsrc');
hdlset_param('hdlcoderSqrtLUT', 'TargetFrequency', 500);
hdlset_param('hdlcoderSqrtLUT', 'Traceability', 'on');

% Set Delay HDL parameters
hdlset_param('hdlcoderSqrtLUT/SqrtLUT/Linear_Interpolation/Delay12', 'ResetType', 'none');

% Set Delay HDL parameters
hdlset_param('hdlcoderSqrtLUT/SqrtLUT/Linear_Interpolation/Delay13', 'ResetType', 'none');
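To inspect or change a single setting instead of listing all of them, you can call hdlget_param and hdlset_param directly. This sketch assumes that the hdlcoderSqrtLUT model is open. Switching the target language to Verilog is only an illustration; the generation log on this page corresponds to VHDL output.

```matlab
% Query one model-level HDL setting (assumes the model is open).
hdlget_param("hdlcoderSqrtLUT", "TargetFrequency")

% Optionally change the target language before generating code.
% Illustration only; the log shown on this page is for VHDL.
hdlset_param("hdlcoderSqrtLUT", "TargetLanguage", "Verilog");
```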

Generate HDL code for the model by using the makehdl function.

makehdl("hdlcoderSqrtLUT/SqrtLUT")

### Working on the model hdlcoderSqrtLUT
### Generating HDL for hdlcoderSqrtLUT/SqrtLUT
### Using the config set for model hdlcoderSqrtLUT for HDL code generation parameters.
### Running HDL checks on the model 'hdlcoderSqrtLUT'.
### Begin compilation of the model 'hdlcoderSqrtLUT'...
### Begin compilation of the model 'hdlcoderSqrtLUT'...
### Working on the model 'hdlcoderSqrtLUT'...
### Working on... GenerateModel
### Begin model generation 'gm_hdlcoderSqrtLUT'...
### Copying DUT to the generated model....
### Model generation complete.
### Generated model saved at hdl_prj/hdlsrc/hdlcoderSqrtLUT/gm_hdlcoderSqrtLUT.slx
### To highlight lookup tables mapped to RAM, click the following MATLAB script: hdl_prj/hdlsrc/hdlcoderSqrtLUT/highlightLUTPipeliningDiagnostic.m
### To clear highlighting, click the following MATLAB script: hdl_prj/hdlsrc/hdlcoderSqrtLUT/clearhighlighting.m
### Begin VHDL Code Generation for 'hdlcoderSqrtLUT'.
### Working on... Traceability
### Working on hdlcoderSqrtLUT/SqrtLUT/Linear_Interpolation as hdl_prj/hdlsrc/hdlcoderSqrtLUT/Linear_Interpolation.vhd.
### Working on hdlcoderSqrtLUT/SqrtLUT/Normalizer as hdl_prj/hdlsrc/hdlcoderSqrtLUT/Normalizer.vhd.
### Working on hdlcoderSqrtLUT/SqrtLUT/Variable_Right_Shift as hdl_prj/hdlsrc/hdlcoderSqrtLUT/Variable_Right_Shift.vhd.
### Working on hdlcoderSqrtLUT/SqrtLUT as hdl_prj/hdlsrc/hdlcoderSqrtLUT/SqrtLUT.vhd.
### Generating package file hdl_prj/hdlsrc/hdlcoderSqrtLUT/SqrtLUT_pkg.vhd.
### Code Generation for 'hdlcoderSqrtLUT' completed.
### Generating HTML files for code generation report at index.html
### Creating HDL Code Generation Check Report SqrtLUT_report.html
### HDL check for 'hdlcoderSqrtLUT' complete with 0 errors, 0 warnings, and 1 messages.
### HDL code generation complete.

close_system("hdlcoderSqrtLUT")
close all;

Resource Summary of SqrtLUT Block

Use the HDL Workflow Advisor to perform FPGA synthesis of your model. In the Workflow Advisor, select the target device and run the synthesis by navigating to FPGA Synthesis and Analysis > Perform Synthesis and P/R > Run Implementation task. Right-click the task and choose Run to Selected Task. The figure illustrates the synthesis performance of the SqrtLUT block on a Xilinx Zynq UltraScale+ device.

Limitations

You cannot use these data types as input values for the block:

Vectors, matrices, or buses
Complex numbers
Signed integers

Additionally, you cannot use input values that have more than 32 bits.

Ports

Input

dataIn — Input data signal
scalar

Value from which to calculate the square root, specified as a scalar.

Data Types: uint8 | uint16 | uint32 | fixed point

validIn — Whether input control signal is valid
scalar

Whether the input signal is valid, specified as a scalar.

Data Types: Boolean

Output

dataOut — Output data signal
scalar

Square root of the input signal, returned as a scalar.

validOut — Whether output control signal is valid
scalar

Whether output signal is valid, returned as a scalar.

Data Types: Boolean

Parameters

Output data type — Output data type
`Inherit: Inherit via internal rule` (default) | `uint8` | `uint16` | `uint32` | `fixdt(1,16,0)` | `<data type expression>`

Specify the output data type. Set this parameter to Inherit: Inherit via internal rule to inherit the data type or set it to another value to specify the data type.

Programmatic Use

Block Parameter: OutDataTypeStr

Type: character vector

Values:

'Inherit: Inherit via internal rule'

'<data type expression>'

Default:

'Inherit: Inherit via internal rule'
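As a minimal sketch of programmatic use, the following commands set and read back the OutDataTypeStr parameter. The sketch assumes that a SqrtLUT block is currently selected in an open model, so that gcb returns its path. The data type expression fixdt(0,16,15) is an arbitrary illustration.

```matlab
% Assumes a SqrtLUT block is selected in an open model.
blk = gcb;

% Set the output data type by using a data type expression (illustrative value).
set_param(blk, "OutDataTypeStr", "fixdt(0,16,15)");

% Read the setting back.
get_param(blk, "OutDataTypeStr")
```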

More About

Resource and Performance Comparison of Sqrt and SqrtLUT Blocks

The Sqrt block in the HDLMathLib library computes square roots by using an iterative algorithm for fixed-point arithmetic, which often results in higher latency and larger hardware area. In contrast, the SqrtLUT block reduces both latency and resource usage by applying LUT-based interpolation and a single multiplier.

This table compares the performance of the Sqrt block and the SqrtLUT block.

Design Attribute	Sqrt Block	SqrtLUT Block
Computation Method	Iterative algorithm	LUT-based with piecewise linear interpolation
Accuracy	High precision	Low precision
Latency	The latency is higher compared to the SqrtLUT block. The latency depends on the word length of fixed-point data types.	The block provides a fixed latency of seven cycles, independent of the word length of the fixed-point inputs.
Resource Usage	More hardware usage	Less hardware usage
Input Types	Fixed-point	Fixed-point values less than or equal to 32 bits in length
Best For	Precision-critical designs	High-speed, low-latency designs

The tables below summarize FPGA synthesis results for these two blocks. The comparison highlights trade-offs in resource utilization, approximation error, and performance across different hardware targets. The synthesis results are based on the following block configuration:

Input type: ufix_32_18

Output type: ufix_64_35

Synthesis Results — Tool: Xilinx Vivado, Device: Zynq UltraScale+

Metric	SqrtLUT Block Results	Sqrt Block Results
Fmax (MHz)	350	200
LUTs (logic)	296	4038
DSPs	1	0
Approximate Error	1.25E‑3	1.18E‑8

Synthesis Results — Tool: Intel Quartus Pro, Device: Stratix 10

Metric	SqrtLUT Block Results	Sqrt Block Results
Fmax (MHz)	255	273
ALUTs (logic)	204	4287
DSPs	1	0
Approximate Error	1.25E‑3	1.18E‑8

Algorithms

Square Root Computation by Using Lookup Tables

The SqrtLUT block computes the square root by normalizing the values, interpolating the value linearly, and then rescaling it.

Normalization

First, the block normalizes the input by left-shifting it until the most significant bit (MSB) is set. To determine the shift amount:

The block checks the leading bits in groups of 16, 8, 4, and 2, and shifts the value 16, 8, 4, or 2 bits to the left, respectively, whenever the checked bits are all zeros. For example, if the upper 16 bits are all zeros, the block shifts the value left by 16 bits.
The block records the total shift amount.

This step ensures the input falls within range of up to 31 bits.
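The normalization steps can be sketched in MATLAB on a 32-bit unsigned word. This is an illustrative model under stated assumptions, not the block's implementation: it assumes a nonzero 32-bit input and applies the 16, 8, 4, and 2 checks once each, in that order.

```matlab
% Illustrative normalization of a nonzero 32-bit unsigned word.
x = uint32(1000);       % example input word (assumption: nonzero)
shift = 0;              % total left shift applied so far

for step = [16 8 4 2]
    % Right-shifting by (32 - step) isolates the top `step` bits.
    if bitshift(x, step - 32) == 0
        x = bitshift(x, step);      % top bits are all zeros, so shift them out
        shift = shift + step;       % record the total shift amount
    end
end
```

In this sketch the total shift is always even, so the normalized word ends with a 1 in one of its two most significant bits. For the input 1000, which has 22 leading zeros, the loop shifts by 16, 4, and 2 bits.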

Linear Interpolation

The block computes the interpolated value by using the equation:

Value = Initial + Delta × (Final − Initial).

The piecewise linear interpolation:

Uses the high 8 bits of the normalized input to look up an initial value in a LUT.
Retrieves the difference between consecutive LUT entries from another LUT.
Scales this difference by using the lower bits of the input and adds it to the initial value.
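The interpolation steps can be sketched in floating point as follows. The table size of 256 entries is inferred from the 8-bit index, and the table contents are assumptions; this page does not list the actual LUT values. The names lutInitial and lutDiff are hypothetical.

```matlab
% Assumed tables: 256 initial values and the differences to the next entry.
idx        = 0:255;
lutInitial = sqrt(idx / 256);
lutDiff    = sqrt((idx + 1) / 256) - lutInitial;

% Normalized input as a fraction of full scale (illustrative value).
u = 0.6;

% The high 8 bits select the table entry; the remaining bits give Delta.
k     = floor(u * 256);
delta = u * 256 - k;

% Value = Initial + Delta x (Final - Initial)
value = lutInitial(k + 1) + delta * lutDiff(k + 1);
err   = abs(value - sqrt(u));
```

Storing the differences in a second table means that the interpolation needs one multiplication and one addition per sample.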

Rescaling

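The block compensates for the initial left shift by right-shifting the output by half the shift amount, based on this equation:

$\sqrt{x \times 2^{N}} = \sqrt{x} \times 2^{N/2}$

Because the normalization shifts in steps of 16, 8, 4, or 2 bits, the total shift N is always even, so N/2 is an integer. This approach avoids fractional factors such as 2^(1/2).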
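A short numeric check of the rescaling identity, using illustrative values. With x = 1000 and an even shift of N = 22, dividing the square root of the shifted value by 2^(N/2) recovers the square root of the original value.

```matlab
% Illustrative values: an input and an even left-shift amount.
x = 1000;
N = 22;

% Square root of the shifted (normalized) value.
ySqrt = sqrt(x * 2^N);

% Undo the shift: a right shift by N/2 bits is a division by 2^(N/2).
r = ySqrt / 2^(N/2);
```

Because N/2 is an integer here, the correction is a plain bit shift in hardware rather than a multiplication by an irrational factor.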

Data Type and Latency Considerations

The algorithm operates on a fixed-point format with a 32-bit word length and 18 fractional bits. If the input uses a different data type, the block converts it to this format by using a Data Type Conversion (DTC) block before processing. After computing the square root, the block converts the result back to the required data type using another DTC block. For best results, use input and output types that have word lengths less than 32 bits.

The block maintains a fixed latency of 7 cycles regardless of the input value or data type, which simplifies timing analysis in HDL designs.
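To get a feel for the internal format, this sketch quantizes a value to a 32-bit word with 18 fractional bits by using plain double arithmetic. It assumes an unsigned format and floor rounding; the rounding mode that the block uses is not stated on this page.

```matlab
% Internal format stated on this page: 32-bit word, 18 fractional bits.
wordLength     = 32;
fractionLength = 18;

v = 0.1;                                    % illustrative real-world value

% Assumption: unsigned format with floor rounding.
stored     = floor(v * 2^fractionLength);   % stored integer
q          = stored / 2^fractionLength;     % quantized real-world value
resolution = 2^(-fractionLength);           % smallest representable step
maxValue   = (2^wordLength - 1) / 2^fractionLength;   % largest representable value
```

Under these assumptions the format resolves steps of about 3.8e-6 and represents values just below 16384.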

Extended Capabilities

HDL Code Generation
Generate VHDL, Verilog and SystemVerilog code for FPGA and ASIC designs using HDL Coder™.

The block supports HDL code generation using HDL Coder™. HDL Coder provides additional configuration options that affect HDL implementation and synthesized logic.

HDL Architecture

Architecture	Description
`Module` (default)	Generate code for the subsystem and the blocks within the subsystem.
`BlackBox`	Generate a black box interface. The generated HDL code includes only the input/output port definitions for the subsystem. Therefore, you can use a subsystem in your model to generate an interface to existing, manually written HDL code. The black-box interface generation for subsystems is similar to the Model block interface generation without the clock signals.
`No HDL`	Remove the subsystem from the generated code. You can use the subsystem in simulation, but it is treated as a “no-op” in the HDL code.

HDL Block Properties

General
AdaptivePipelining	Automatic pipeline insertion based on the synthesis tool, target frequency, and multiplier word-lengths. The default is `inherit`. See also AdaptivePipelining.
BalanceDelays	Detects introduction of new delays along one path and inserts matching delays on the other paths. The default is `inherit`. See also BalanceDelays.
ClockRatePipelining	Insert pipeline registers at a faster clock rate instead of the slower data rate. The default is `inherit`. See also ClockRatePipelining.
ConstrainedOutputPipeline	Number of registers to place at the outputs by moving existing delays within your design. Distributed pipelining does not redistribute these registers. The default is `0`. For more details, see ConstrainedOutputPipeline.
DistributedPipelining	Pipeline register distribution, or register retiming. The default is `inherit`. See also DistributedPipelining.
FlattenHierarchy	Remove subsystem hierarchy from generated HDL code. The default is `inherit`. See also FlattenHierarchy.
InputPipeline	Number of input pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is `0`. For more details, see InputPipeline.
OutputPipeline	Number of output pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is `0`. For more details, see OutputPipeline.
SharingFactor	Number of functionally equivalent resources to map to a single shared resource. The default is 0. See also Resource Sharing.
StreamingFactor	Number of parallel data paths, or vectors, that are time multiplexed to transform into serial, scalar data paths. The default is 0, which implements fully parallel data paths. See also Streaming.
SynthesisAttributes	Specifies the synthesis attributes for the blocks and block output signals in the model. The generated HDL code contains these attributes. For more information, see SynthesisAttributes.

Target Specification

This block cannot be the DUT, so the block property settings in the Target Specification tab are ignored.

Version History

Introduced in R2026a
