Elements of Ground Truth Objects

The groundTruth object contains information about data sources, label definitions, and marked label annotations. You can export or import ground truth objects from the Image Labeler and the Video Labeler apps. You can also create ground truth objects programmatically. The objects also provide functions to select labels, change file paths of data, and merge ground truth objects.

Flow diagram illustrating the Labeler app accepting ground truth source video or images and exporting the ground truth object and, if there are pixel labels, the pixel data.

Exported Data

Ground truth object MAT file showing its three inputs, datasource, labelDefs, and labelData, and a pixel label data folder showing a series of sequential PNG files.

Ground truth objects exported from labeling apps contain three types of data: the data used for labeling (such as images, an image sequence, or a video), the data that names the labels (such as car, bridge, or overcast), and the data that defines each label (such as a rectangle ROI), including its pixel location in the image. If the ground truth data contains a pixel ROI label, the labeling app saves the pixel information as PNG files in a folder named pixelLabelData. For more details about exported pixel data, see How Labeler Apps Store Exported Pixel Labels. You can programmatically create a ground truth object using this syntax:

gTruth = groundTruth(dataSource,labelDefs,labelData)

Data Source

You can load images or video into a labeling app in one of these formats:

  • Image datastore

  • Image filenames

  • Video filename

  • Image sequence folder

  • Custom image sequence

To create a groundTruth object programmatically, you must specify the data source as a groundTruthDataSource object and pass it as the dataSource input.
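As a minimal sketch of the image-filenames case, the code below wraps a list of image files in a groundTruthDataSource object. It assumes Computer Vision Toolbox is installed. The folder and the file names highway1.png and highway2.png are hypothetical placeholders that the code writes to a temporary location so that it does not depend on existing data; in practice you would point to your own images.

```matlab
% Write two blank placeholder images to a temporary folder so that the
% sketch does not depend on existing files (file names are hypothetical).
imgDir = tempname;
mkdir(imgDir);
imageFiles = {fullfile(imgDir,'highway1.png'), ...
              fullfile(imgDir,'highway2.png')};
for k = 1:numel(imageFiles)
    imwrite(zeros(480,640,'uint8'),imageFiles{k});
end

% Wrap the image file names in a groundTruthDataSource object.
dataSource = groundTruthDataSource(imageFiles);

% A video file name can be wrapped in the same way, for example:
% dataSource = groundTruthDataSource('myVideo.mp4');  % hypothetical file
```

The resulting dataSource variable is what the groundTruth syntax expects as its first input.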

Label Definitions

Label definitions describe the ROI and Scene labels. For example, in a highway scene, you might want to create a rectangle ROI named car and a polygon ROI named bridge. ROI labels require you to select a color and, optionally, add a description. You can also group labels (for example, putting car in a group named vehicles).

Programmatically, you must store label definitions in a table and pass that table as the label definitions input (labelDefs in the syntax above) to create a groundTruth object. The illustration shows which elements of the table correspond to which fields in the labeling app. Each row of the table specifies information for a single label. Each column contains one definition field for that label.

The labelDefinitions table contains the LabelType column, which corresponds to the ROI Label Definition type in the app, and the Name, LabelColor, Group, and Description columns, which correspond to the Label Name, Color, Group, and Label Description (Optional) fields in the app interface, respectively.

You can also create the labelDefinitions table programmatically by using the labelDefinitionCreator object. If you save the table that the object generates to a MAT-file, you can then load the file into a labeling app session.
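The following sketch shows one way to build such a table with labelDefinitionCreator, reusing the car, bridge, and overcast names from the highway example. It assumes a release in which polygon labels are supported by labelDefinitionCreator. The description text and the MAT-file name highwayLabelDefs.mat are illustrative choices, not values from any app.

```matlab
% Create an empty label definition creator.
ldc = labelDefinitionCreator();

% Rectangle ROI named car, placed in a group named vehicles.
addLabel(ldc,'car',labelType.Rectangle, ...
    'Group','vehicles','Description','Passenger car');

% Polygon ROI named bridge.
addLabel(ldc,'bridge',labelType.Polygon);

% Scene label named overcast.
addLabel(ldc,'overcast',labelType.Scene);

% Generate the label definitions table, one row per label.
labelDefs = create(ldc);
disp(labelDefs)

% Save the table to a MAT-file (hypothetical file name) so that it can
% be loaded into a labeling app session later.
save(fullfile(tempdir,'highwayLabelDefs.mat'),'labelDefs');
```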

Label Data

Label data describes the defined labels and the pixel location of each ROI label in the image. For example, if you labeled a car in an image with a rectangle ROI named car, the label data stores the rectangle as a vector of the form [x, y, w, h], where [x, y] indicates the pixel location of the upper-left corner of the rectangle that encloses the car, and [w, h] indicates its width and height.

Programmatically, the label data is stored in a table and is specified as the labelData input to create a groundTruth object. Each row of the labelData table corresponds to a single image or timestamp. Each column represents a label definition, identified by its name as defined in the labelDefinitions table (for example, car). The illustration shows a labelData table for an image that contains three cars and one bridge. The pixel locations for the three cars are stored as a matrix of type double, and the pixel location for the bridge is stored in gTruth.LabelData.bridge as a one-element cell array.

Table for labelData with three columns. The first column contains the image name, or the timestamp for video. The remaining columns contain the data for each label name, here car and bridge.
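As a hedged sketch of the case described above (one image with three cars and one bridge), the code below builds a labelData table and passes it to groundTruth. It assumes Computer Vision Toolbox and a release that supports polygon labels. The image file name highway1.png and all coordinate values are made up for illustration. The polygon is assumed to be stored as a matrix of [x y] vertex rows inside a one-element cell array; see the LabelData property of the groundTruth object for the authoritative format.

```matlab
% Write one blank placeholder image (hypothetical file name) so that the
% sketch runs without external data.
imgDir = tempname;
mkdir(imgDir);
imageFile = fullfile(imgDir,'highway1.png');
imwrite(zeros(480,640,'uint8'),imageFile);
dataSource = groundTruthDataSource({imageFile});

% Define a rectangle label named car and a polygon label named bridge.
ldc = labelDefinitionCreator();
addLabel(ldc,'car',labelType.Rectangle);
addLabel(ldc,'bridge',labelType.Polygon);
labelDefs = create(ldc);

% Three cars: one [x y w h] row per rectangle (illustrative values).
carBoxes = [ 50 300 80 40
            200 310 90 45
            400 305 85 42];

% One bridge: polygon vertices as [x y] rows (illustrative values).
bridgeVertices = [20 100; 620 100; 620 160; 20 160];

% One table row per image and one column per label name. The column
% names come from the variable names car and bridge.
car = {carBoxes};
bridge = {{bridgeVertices}};
labelData = table(car,bridge);

% Assemble the ground truth object from the three parts.
gTruth = groundTruth(dataSource,labelDefs,labelData);
```

Keeping the labelData column names identical to the names in the label definitions table is what ties each column of location data to its label definition.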

The location data for the ROI labels is derived from the labelType enumeration as one of these options:

  • labelType.Rectangle: (x,y,w,h)

  • labelType.RotatedRectangle: (xctr,yctr,w,h,yaw)

  • labelType.Cuboid: (xctr,yctr,zctr,xlen,ylen,zlen,xrot,yrot,zrot)

  • labelType.ProjectedCuboid: (x1,y1,w1,h1,x2,y2,w2,h2)

  • labelType.Line: (x1,y1,x2,y2, ... ,xN,yN)

  • labelType.PixelLabel: M-by-1 PixelLabelData column

  • labelType.Polygon: (x1,y1,x2,y2, ... ,xN,yN)

  • labelType.Custom: as specified

  • labelType.Scene: logical

For details on how to specify each of the supported enumerations, see the LabelData property of the groundTruth object. For details on how to specify pixel locations in an image, see the Coordinate Systems topic.

In general, the label data table contains the pixel locations for all of the labels and sublabels. It can also contain attribute information. If any image contains sublabels or attributes, then the app stores the additional data together with the label data in a nested structure.
