Postprocess Exported Labels for Instance Segmentation Training
Computer Vision Toolbox™ offers these functionalities for training an instance segmentation network, based on the type of network:
| Instance Segmentation Network Type | Functionality |
|---|---|
| SOLOv2 | trainSOLOV2 |
| Mask R-CNN | trainMaskRCNN |
To perform transfer learning using a SOLOv2 or Mask R-CNN network, train the network on a custom ground truth data set. Before training, you must first postprocess the ground truth labels exported from the Image Labeler or Video Labeler app to ensure the annotations meet the network requirements, such as correct label and mask formatting. Then, convert the postprocessed ground truth data into a datastore of the format required by the training data argument of the training function for your desired instance segmentation network. For an example that shows this process, see Create Instance Segmentation Training Data From Ground Truth.
To learn more, see Get Started with Instance Segmentation Using Deep Learning.
Postprocess Exported groundTruth Labels to Extract Training Data
In this tutorial, you postprocess the labeled ground truth data exported from the Image Labeler or
Video Labeler app,
stored in a groundTruth object. To get started with
labeling ground truth, see Label Objects Using Polygons for Instance Segmentation. This image shows the
ground truth data referenced in this tutorial.

Load and Display Exported Ground Truth Data
First, load ground truth data into the MATLAB workspace as a
groundTruth object, and then display the properties of the object.
The LabelDefinitions
and the LabelData
properties of the groundTruth object contain the information for each
label and label information for each object, respectively.
Enter the exported groundTruth object, gTruth, at
the MATLAB® command line.
>> gTruth
gTruth =
groundTruth with properties:
DataSource: [1×1 groundTruthDataSource]
LabelDefinitions: [3×5 table]
LabelData: [1×3 table]

Extract the LabelData property from the
groundTruth object. This property groups the data by label
name.
>> gTruth.LabelData
ans =
1×3 table
Sailboat Tanker Airplane
__________ __________ __________
{3×1 cell} {1×1 cell} {1×1 cell}

Create Stacked Instance Mask Data
To preserve the relative ordering of pixels when objects overlap, stack the polygon
label masks rather than flattening them into a single label matrix. Use the
gatherLabelData object function to group the data by label type, which
produces one table containing the polygon data for all five labeled objects.
>> out = gatherLabelData(gTruth,labelType.Polygon,GroupLabelData="LabelType")
out =
1×1 cell array
{1×1 table}

Display both the polygon coordinates and the associated label names of each labeled
object, stored in the first and second columns of the PolygonData
table, respectively. The rows appear in the order in which you labeled the
objects.
>> out{1}.PolygonData{1}
ans =
5×2 cell array
{12×2 double} {'Airplane'}
{ 6×2 double} {'Sailboat'}
{ 7×2 double} {'Sailboat'}
{13×2 double} {'Sailboat'}
{ 9×2 double} {'Tanker'}

If you were to flatten these labels into a single label matrix instead, objects
labeled later would overwrite earlier objects where their polygons overlap, and the
occluded pixels could not be recovered. Stacking the masks preserves every object in
full, regardless of labeling order.
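To see concretely why stacking matters, you can compare a stacked representation against a flattened label matrix. This is a minimal sketch using two hypothetical overlapping masks (standing in for the tanker and a sailboat); it is not part of the exported label data.

```matlab
% Two overlapping 4-by-4 masks: a "tanker" and a "sailboat" that covers part of it
tanker   = false(4); tanker(2:3,1:3)   = true;   % 6 pixels
sailboat = false(4); sailboat(2:3,3:4) = true;   % 4 pixels, overlaps 2 tanker pixels

% Stacked representation: both objects remain fully recoverable
maskStack = cat(3,tanker,sailboat);
nnz(maskStack(:,:,1))   % 6 -- the full tanker survives in its own layer

% Flattened label matrix: later objects overwrite earlier ones
flat = zeros(4);
flat(tanker)   = 1;
flat(sailboat) = 2;     % overwrites the tanker where they overlap
nnz(flat == 1)          % 4 -- two tanker pixels are lost
```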
Create a cell array, polygons, where each element defines the
(x, y) coordinates of a labeled polygon in the
image. Then calculate the number of polygons in the cell array,
numPolygons.
polygons = out{1}.PolygonData{1}(:,1);
numPolygons = size(polygons,1);
Define the size of the image, imageSize. Then, create a 3-D
logical array, maskStack, of the same size, with an individual layer
for each polygon mask.
imageSize = [645 916];
maskStack = false([imageSize(1:2) numPolygons]);
Create a binary mask for each polygon by converting its coordinates into a mask the
size of the image, and store each mask in a separate layer of the mask stack. Convert
the coordinates of each polygon into a binary mask using the poly2mask function.
for i = 1:numPolygons
maskStack(:,:,i) = poly2mask(polygons{i}(:,1), ...
polygons{i}(:,2),imageSize(1),imageSize(2));
end
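Before saving, it can help to confirm that every polygon produced a nonempty mask and to inspect the result visually. This sketch assumes the source image is loaded in a variable I (a name not defined in this tutorial); insertObjectMask is a Computer Vision Toolbox function that overlays a mask stack on an image.

```matlab
% Confirm every polygon produced at least one true pixel in its layer
assert(all(squeeze(any(any(maskStack,1),2))), ...
    "One or more polygon masks are empty.")

% Overlay all masks on the image for visual inspection
% (assumes the source image is loaded in the variable I)
overlaid = insertObjectMask(I,maskStack);
figure
imshow(overlaid)
```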
Save the mask stack to the workspace as a MAT file.
save("maskData","maskStack")

Create Training Datastore
After you postprocess your ground truth data, you must configure your labeled ground
truth training data into a datastore that meets the requirements of the
trainingData input argument of your selected training function. To
select a pretrained network and the corresponding training function, see Choose Instance Segmentation Model.
Set up your training data so that calling the read and readall functions on the datastore returns
a cell array with four columns that contain, in order, the image data, bounding boxes,
object class labels, and binary masks. You can create a datastore in the required format
using these steps:
1. Create an ImageDatastore that returns RGB or grayscale image data. To train a Mask R-CNN network, your image data must be RGB.

   imds = imageDatastore(imageFolderPath);

2. Create a boxLabelDatastore that returns bounding box data and instance labels as a two-column cell array.

   labelDatastore = boxLabelDatastore(labelFolderPath);

3. Create an ImageDatastore for the masks, specifying a custom read function that returns mask data as a binary matrix. For example, given stacked mask data stored in individual MAT files as the variable maskStack, you can define the read function in this way.

   function mask = customReadMaskFcn(filename)
       loadedData = load(filename);
       mask = loadedData.maskStack;
   end

   Create the binary mask ImageDatastore by specifying a handle to the custom read function and the file extension of the mask files.

   maskDatastore = imageDatastore(maskFolderPath,ReadFcn=@customReadMaskFcn,FileExtensions=".mat");

4. Combine the three datastores using the combine function.

   trainingDatastore = combine(imds,labelDatastore,maskDatastore);
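Before training, it can help to read one observation from the combined datastore and confirm it has the four-column format the training functions expect. This is a minimal sanity check, assuming trainingDatastore was built as described above.

```matlab
% Read one observation: {image, boxes, labels, masks}
data = read(trainingDatastore);
reset(trainingDatastore)   % rewind so training starts from the first observation

img    = data{1};
boxes  = data{2};
labels = data{3};
masks  = data{4};

% The number of boxes, labels, and mask layers must match,
% and the masks must share the image's spatial dimensions
assert(size(boxes,1) == numel(labels) && numel(labels) == size(masks,3))
assert(isequal(size(masks,1,2),size(img,1,2)))
```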
For more information, see Datastores for Deep Learning (Deep Learning Toolbox).
Once you have created a training datastore in this format, train the instance segmentation network. For training examples, see Perform Instance Segmentation Using SOLOv2 and Perform Instance Segmentation Using Mask R-CNN.
See Also
Apps
- Image Labeler | Video Labeler | Ground Truth Labeler (Automated Driving Toolbox)
Functions
Objects
- groundTruth | solov2 | maskrcnn | groundTruthMultisignal (Automated Driving Toolbox)
Topics
- Get Started with Instance Segmentation Using Deep Learning
- Create Instance Segmentation Training Data From Ground Truth
- Perform Instance Segmentation Using SOLOv2
- Get Started with SOLOv2 for Instance Segmentation
- Perform Instance Segmentation Using Mask R-CNN
- Get Started with the Image Labeler
- Get Started with the Video Labeler
- Get Started with Ground Truth Labeling (Automated Driving Toolbox)