visionhdl.BirdsEyeView

Transform front-facing camera image into top-down view

Description

The visionhdl.BirdsEyeView System object™ warps a front-facing camera image into a top-down view. It uses a hardware-efficient architecture that supports HDL code generation.

You must provide the homography matrix that describes the transform. This matrix can be calculated from physical camera properties, or empirically derived by analyzing an image of a grid pattern taken by the camera. The object uses the matrix to compute the transformed coordinates of each pixel. The transform does not interpolate between pixel locations. Instead it rounds the result to the nearest coordinate.

The object operates on a trapezoidal region of the input image below the vanishing point. These images show the input region selected for transformation and the resulting top-down view.

You can specify the number of lines in the transformed region and the size of the output frame. If the specified homography matrix cannot map from the requested number of lines to the requested output size, the object returns a warning.

Because the object replicates lines from the input region to create the larger output frame, it cannot complete the transform of one frame before the next frame arrives. The object ignores any new input frames while it is still transforming the previous frame. Therefore, depending on the stored lines and output size, the object can drop input frames. This timing also enables the object to maintain the blanking intervals of the input pixel stream.

To transform a front-facing camera image to top-down view:

  1. Create the visionhdl.BirdsEyeView object and set its properties.

  2. Call the object with arguments, as if it were a function.

To learn more about how System objects work, see What Are System Objects?

Creation

Description

birdsEyeXfrm = visionhdl.BirdsEyeView(hM,MaxBufferSize,Name,Value) returns a bird's-eye transform System object, with the homography matrix set to hM, and a buffer size of MaxBufferSize pixels. You can optionally set additional properties using name-value pairs. Enclose each property name in single quotes.
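For example, this sketch constructs the object with the sizing worked out in the MaxSourceLinesBuffered description below; the matrix hM and the property values shown are illustrative, not required settings.

birdsEyeXfrm = visionhdl.BirdsEyeView(hM,2^15, ...
    'MaxSourceLinesBuffered',50, ...
    'BirdsEyeViewPixels',640, ...
    'BirdsEyeViewLines',700);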

Properties

Unless otherwise indicated, properties are nontunable, which means you cannot change their values after calling the object. Objects lock when you call them, and the release function unlocks them.

If a property is tunable, you can change its value at any time.

For more information on changing property values, see System Design in MATLAB Using System Objects.

HomographyMatrix

Transfer function derived from camera parameters, specified as a 3-by-3 matrix.

The homography matrix, h, is derived from four parameters of the physical camera setup: the focal length, pitch, height, and principal point (from a pinhole camera model). The default value is the matrix for the camera setup used in the Lane Detection example.

This matrix can be calculated from physical camera properties, or empirically derived by analyzing an image of a grid test pattern taken by the camera. See estimateGeometricTransform (Computer Vision Toolbox) or Using the Single Camera Calibrator App (Computer Vision Toolbox).
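One way to derive the matrix empirically is to fit a projective transform to points matched between a camera image of the grid and its known top-down layout. In this sketch, movingPoints and fixedPoints are assumed N-by-2 point sets; whether the final transpose is needed depends on the coordinate convention your downstream math uses, so verify the result against a known point.

% Fit a projective transform from matched point pairs (assumed inputs).
tform = estimateGeometricTransform(movingPoints,fixedPoints,'projective');
% projective2d stores its matrix in MATLAB's row-vector convention,
% [x y 1]*T. Transpose if your math applies h*[x;y;1], as in the
% Algorithms section below.
hM = tform.T.';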

MaxBufferSize

Number of input pixels to buffer, specified as an integer. Compute this value as MaxSourceLinesBuffered*ActivePixelsPerLine. The object uses a memory of this size to store the input pixels. If you specify a value that is not a power of two, the object rounds the buffer size up to the next power of two.
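For instance, a quick sizing check for the example setup described under MaxSourceLinesBuffered below (50 stored lines of a 640-pixel-wide frame; both values are illustrative):

rawSize = 50*640;                   % 32000 pixels of storage needed
bufSize = 2^nextpow2(rawSize)       % 32768, the next power of two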

MaxSourceLinesBuffered

Number of lines to transform, specified as an integer. The object stores and transforms this number of lines into the output bird's-eye view image, starting at the vanishing point as determined by the HomographyMatrix.

Storing the full input frame uses too much memory to implement the algorithm without off-chip storage. Therefore, for a hardware implementation, choose a smaller region to store and transform, one that generates an acceptable output frame size.

For example, using the default HomographyMatrix with an input image of 640-by-480 pixels, the full-sized transform results in a 900-by-640 output image. Analysis of the input-to-output y-coordinate mapping shows that around 50 lines of the input image are required to generate the top 700 lines of the bird's-eye view output image. This number of input lines can be stored using on-chip memory. The vanishing point for the default camera setup is around line 200, and lines above that point do not contribute to the resulting bird's-eye view. Therefore, the object needs to store only input lines 200–250 for the transformation.

BirdsEyeViewPixels

Horizontal size of output frame, specified as an integer. This property sets the number of active pixels in each output line.

BirdsEyeViewLines

Vertical size of output frame, specified as an integer. This property sets the number of active lines in each output frame.

Usage

Description

[pixelout,ctrlout] = birdsEyeXfrm(pixelin,ctrlin) returns the bird's-eye view transformation of the input stream. The frame size of the output stream corresponds to the size you configured in the BirdsEyeViewPixels and BirdsEyeViewLines properties.

This object uses a streaming pixel interface with a structure for frame control signals. This interface enables the object to operate independently of image size and format and to connect with other Vision HDL Toolbox™ objects. The object accepts and returns a scalar pixel value and control signals as a structure containing five signals. The control signals indicate the validity of each pixel and its location in the frame. To convert a pixel matrix into a pixel stream and control signals, use the visionhdl.FrameToPixels object. For a description of the interface, see Streaming Pixel Interface.
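For example, this simulation sketch streams one 640-by-480 grayscale frame frm through the birdsEyeXfrm object configured in the Creation section. The frame size and variable names are illustrative.

frm2pix = visionhdl.FrameToPixels( ...
    'VideoFormat','Custom', ...
    'ActivePixelsPerLine',640, ...
    'ActiveVideoLines',480);
[pixIn,ctrlIn] = frm2pix(frm);

% Preallocate the output stream. Each call consumes and produces one
% pixel and its control structure.
pixOut = zeros(size(pixIn),'like',pixIn);
ctrlOut = ctrlIn;
for p = 1:numel(pixIn)
    [pixOut(p),ctrlOut(p)] = birdsEyeXfrm(pixIn(p),ctrlIn(p));
end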

Input Arguments

pixelin

Single image pixel in a pixel stream, specified as a scalar value representing intensity.

The software supports double and single data types for simulation, but not for HDL code generation.

Data Types: uint | int | fi | logical | double | single

ctrlin

Control signals accompanying the input pixel stream, specified as a pixelcontrol structure containing five logical data type signals. The signals describe the validity of the pixel and its location in the frame. For more details, see Pixel Control Structure.

Data Types: struct
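For reference, the five logical fields of the pixelcontrol structure are hStart, hEnd, vStart, vEnd, and valid. A single sample marking the first valid pixel of a frame might look like this sketch:

ctrl = struct('hStart',true,'hEnd',false, ...
    'vStart',true,'vEnd',false,'valid',true);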

Output Arguments

pixelout

Single image pixel in a pixel stream, returned as a scalar value representing intensity.

The software supports double and single data types for simulation, but not for HDL code generation.

Data Types: uint | int | fi | logical | double | single

ctrlout

Control signals accompanying the output pixel stream, returned as a pixelcontrol structure containing five logical data type signals. The signals describe the validity of the pixel and its location in the frame. For more details, see Pixel Control Structure.

Data Types: struct

Object Functions

To use an object function, specify the System object as the first input argument. For example, to release system resources of a System object named obj, use this syntax:

release(obj)

step - Run System object algorithm
release - Release resources and allow changes to System object property values and input characteristics
reset - Reset internal states of System object

Algorithms

The transform from input pixel coordinate (x,y) to the bird's-eye pixel coordinate is derived from the homography matrix, h. The homography matrix is based on physical parameters and therefore is a constant for a particular camera installation.

\[
(\hat{x},\hat{y}) = \operatorname{round}\!\left(\frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}},\; \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}\right)
\]

The implementation of the bird's-eye transform in hardware does not directly perform this calculation. Instead, the object precomputes lookup tables for the horizontal and vertical aspects of the transform.
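For verification in software, the mapping can be evaluated directly (a sketch; h, x, and y are as defined above):

% Direct evaluation of the coordinate mapping (reference model only).
w    = h(3,1)*x + h(3,2)*y + h(3,3);
xHat = round((h(1,1)*x + h(1,2)*y + h(1,3))/w);
yHat = round((h(2,1)*x + h(2,2)*y + h(2,3))/w);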

Figure: Architecture of the bird's-eye algorithm. The pixel stream feeds a line memory; each stored line then passes through a horizontal stretch operation and a vertical mapping operation.

First, the object stores the input lines starting from the precomputed vanishing point. The stored pixels form a trapezoid, with short lines near the vanishing point and wider lines near the camera. This storage uses MaxBufferSize memory locations.

The horizontal lookup table contains interpolation parameters that describe the stretch of each line of the trapezoidal input region to the requested width of the output frame. Lines that fall closer to the vanishing point are stretched more than lines nearer to the camera.

The vertical lookup table contains the y-coordinate mapping, and how many times each line is repeated to fill the requested height of the output frame. Near the vanishing point, one input line maps to many output lines, while each line nearer the camera maps to a diminishing number of output lines.

The lookup tables use 3*MaxSourceLinesBuffered memory locations.
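One plausible way to precompute such a vertical table offline is to invert the homography and evaluate the mapping once per output line; this sketch illustrates the idea under that assumption and is not the object's actual implementation. The names hInv, nOutLines, and xc are illustrative.

% Map each output line back to a source line via the inverse homography.
hInv = inv(h);
nOutLines = 700;       % BirdsEyeViewLines (illustrative)
xc = 640/2;            % evaluate along the output frame's center column
srcLine = zeros(nOutLines,1);
for yOut = 1:nOutLines
    w = hInv(3,1)*xc + hInv(3,2)*yOut + hInv(3,3);
    srcLine(yOut) = round((hInv(2,1)*xc + hInv(2,2)*yOut + hInv(2,3))/w);
end
% The repeat count for stored source line k is then nnz(srcLine == k).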

Version History

Introduced in R2017b
