Postprocess Exported Labels for Instance Segmentation Training

Computer Vision Toolbox™ offers these functionalities for training an instance segmentation network, based on the type of network:

Instance Segmentation Network Type    Functionality

SOLOv2                                trainSOLOV2

Mask R-CNN                            trainMaskRCNN

To perform transfer learning using a SOLOv2 or Mask R-CNN network, train the network on a custom ground truth data set. Before training, you must first postprocess the ground truth labels exported from the Image Labeler or Video Labeler app to ensure the annotations meet the network requirements, such as correct label and mask formatting. Then, convert the postprocessed ground truth data into a datastore of the format required by the training data argument of the training function for your desired instance segmentation network. For an example that shows this process, see Create Instance Segmentation Training Data From Ground Truth.

To learn more, see Get Started with Instance Segmentation Using Deep Learning.

Postprocess Exported groundTruth Labels to Extract Training Data

In this tutorial, you postprocess the labeled ground truth data exported from the Image Labeler or Video Labeler app, stored in a groundTruth object. To get started with labeling ground truth, see Label Objects Using Polygons for Instance Segmentation. This image shows the ground truth data referenced in this tutorial.

Ground truth pixel labels ready for postprocessing and training datastore creation.

Load and Display Exported Ground Truth Data

First, load ground truth data into the MATLAB workspace as a groundTruth object, and then display the properties of the object. The LabelDefinitions and the LabelData properties of the groundTruth object contain the information for each label and label information for each object, respectively.
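If you saved the exported groundTruth object to a MAT file from the app, you can load it back into the workspace first. This is a minimal sketch; the file name groundTruthLabels.mat is an assumption, as the app lets you choose the export destination.

```matlab
% Load the exported labels (file name assumed)
ld = load("groundTruthLabels.mat");
gTruth = ld.gTruth;   % groundTruth object created by the labeling app
```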

Enter the exported groundTruth object, gTruth, at the MATLAB® command line.

>> gTruth

gTruth = 

  groundTruth with properties:

          DataSource: [1×1 groundTruthDataSource]
    LabelDefinitions: [3×5 table]
           LabelData: [1×3 table]

Extract the LabelData property from the groundTruth object. This property groups the data by label name.

>> gTruth.LabelData

ans =

  1×3 table

     Sailboat       Tanker       Airplane 
    __________    __________    __________

    {3×1 cell}    {1×1 cell}    {1×1 cell}
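Each table entry is a cell array of polygon vertex lists, one list per labeled object of that class. For example, a sketch of indexing into the Sailboat column (names taken from the table above):

```matlab
% LabelData stores one cell per image (one image in this example)
sailboats = gTruth.LabelData.Sailboat{1};   % 3x1 cell, one entry per sailboat
firstSailboat = sailboats{1};               % N-by-2 matrix of [x y] vertices
```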

Create Stacked Instance Mask Data

To prevent losing the relative ordering of pixels where objects overlap, stack the polygon label masks rather than flattening them into a single label matrix. Use the gatherLabelData object function to group the data by label type, producing one table that contains the polygon data for all five labeled objects.

>> out = gatherLabelData(gTruth,labelType.Polygon,GroupLabelData="LabelType")

out =

  1×1 cell array

    {1×1 table}

Display both the polygon coordinates and the associated label names of each labeled object, stored in the first and second columns of the PolygonData table, respectively. The rows appear in the order in which the objects were labeled, which is also the stacking order of the polygons.

>> out{1}.PolygonData{1}

ans =

  5×2 cell array

    {12×2 double}    {'Airplane'}
    { 6×2 double}    {'Sailboat'}
    { 7×2 double}    {'Sailboat'}
    {13×2 double}    {'Sailboat'}
    { 9×2 double}    {'Tanker'}

If the labels were flattened into a single label matrix, the sailboat would form the base layer and the tanker would overwrite it where their polygons overlap, because the tanker appears after the sailboat in the table. Stacking the masks instead preserves the full mask of every object.
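To see why flattening loses information, consider this small sketch (synthetic 5-by-5 masks, not the tutorial data): writing two overlapping instance masks into a single 2-D label matrix discards the overlap, while stacking keeps it.

```matlab
% Two overlapping synthetic instance masks
maskA = false(5); maskA(1:3,1:3) = true;   % instance 1
maskB = false(5); maskB(2:4,2:4) = true;   % instance 2
% Flatten: later instances overwrite earlier ones
flat = zeros(5);
flat(maskA) = 1;
flat(maskB) = 2;   % overlap pixels now record only instance 2
% Stack: both memberships survive
stack = cat(3,maskA,maskB);
nnz(stack(:,:,1) & stack(:,:,2))   % ans = 4, the overlap is still recoverable
```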

Create a cell array, polygons, where each element defines the (x, y) coordinates of a labeled polygon in the image. Then calculate the number of polygons in the cell array, numPolygons.

polygons = out{1}.PolygonData{1}(:,1);
numPolygons = size(polygons,1);

Define the size of the image, imageSize. Then, preallocate a 3-D logical array, maskStack, with the same height and width as the image and one layer for each polygon mask.

imageSize = [645 916];
maskStack = false([imageSize(1:2) numPolygons]);

Create a binary mask for each polygon by converting its coordinates into a mask the size of the image, and store each mask in a separate layer of the mask stack. Convert the coordinates of each polygon into a binary mask using the poly2mask function.

for i = 1:numPolygons
    maskStack(:,:,i) = poly2mask(polygons{i}(:,1), ...
                       polygons{i}(:,2),imageSize(1),imageSize(2));
end
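As a quick sanity check (a sketch, not part of the original tutorial), you can view the union of all layers and count the pixels in each instance mask; every layer should be nonzero.

```matlab
% Union of all instance masks in one image
figure
imshow(any(maskStack,3))
% Pixel count per instance layer
pixelsPerInstance = squeeze(sum(maskStack,[1 2]))
```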

Save the mask stack to a MAT file, maskData.mat.

save("maskData","maskStack")
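The training datastore described in the next section also requires bounding boxes for each instance. If you have only masks, one possible approach (a sketch using regionprops; these boxes are not part of the exported ground truth) is to derive axis-aligned boxes from the mask stack:

```matlab
% Derive an M-by-4 matrix of [x y width height] boxes, one per mask layer
boxes = zeros(numPolygons,4);
for i = 1:numPolygons
    props = regionprops(maskStack(:,:,i),"BoundingBox");
    boxes(i,:) = props(1).BoundingBox;   % assumes one region per layer
end
```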

Create Training Datastore

After you postprocess your ground truth data, you must configure your labeled ground truth training data into a datastore that meets the requirements of the trainingData input argument of your selected training function. To select a pretrained network and the corresponding training function, see Choose Instance Segmentation Model.

Set up your training data so that calling the read and readall functions on the datastore returns a cell array with four columns that contain, in order, the image data, bounding boxes, object class labels, and binary masks. You can create a datastore in the required format using these steps:

  1. Create an ImageDatastore that returns RGB or grayscale image data. To train a Mask R-CNN network, your image data must be RGB.

    imds = imageDatastore(imageFolderPath);
  2. Create a boxLabelDatastore that returns bounding box data and instance labels as a two-column cell array. The boxLabelDatastore function accepts a table of bounding boxes and labels rather than a folder path; here, labelData is such a table.

    labelDatastore = boxLabelDatastore(labelData);
  3. Create an ImageDatastore and specify a custom read function that returns the mask data as a logical array. For example, given stacked mask data stored in individual MAT files as the variable maskStack, as saved in the previous section, you can define the read function in this way.

    function mask = customReadMaskFcn(filename)
        loadedData = load(filename);
        mask = loadedData.maskStack;
    end

    Create a binary mask ImageDatastore by specifying the custom read function and the file extension of the mask files.

    maskDatastore = imageDatastore(maskFolderPath,ReadFcn=customReadMaskFcn,FileExtensions=".mat");
  4. Combine the three datastores using the combine function.

    trainingDatastore = combine(imds,labelDatastore,maskDatastore);
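After combining, you can verify that the datastore returns observations in the required four-column layout (a sketch; variable names follow the steps above):

```matlab
% Read one observation and inspect its layout
data = read(trainingDatastore);
% data{1}: H-by-W-by-3 image, data{2}: M-by-4 boxes,
% data{3}: M-by-1 categorical labels, data{4}: H-by-W-by-M logical masks
reset(trainingDatastore)   % rewind so training starts from the first file
```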

For more information, see Datastores for Deep Learning (Deep Learning Toolbox).

Once you have created a training datastore in this format, you can train the instance segmentation network using the training function for your chosen model.
