Semantic Segmentation in Point Clouds Using Deep Learning

Segmentation is a fundamental step in processing 3-D point clouds. Point cloud semantic segmentation, or classification, is the process of associating each point in a point cloud with a semantic label such as tree, person, road, vehicle, ocean, or building.

Segmentation clusters points with similar characteristics into homogeneous regions. These regions correspond to specific structures or objects in a point cloud scene.

These are the major approaches for point cloud semantic segmentation.

Classify each point or a point cluster based on individual features by using feature extraction and neighborhood selection.
Extract point statistics and spatial information from the point cloud to classify the points using statistical and contextual modeling.

Applications for segmentation include urban planning, oceanography, forestry, autonomous driving, robotics, and navigation.

Deep Learning-Based Segmentation

Deep learning is an efficient approach to point cloud semantic segmentation in which you train a network to segment and classify points by extracting features from the input data. To get started on deep learning with point clouds, see Deep Learning with Point Clouds.

You cannot apply the standard convolutional neural networks (CNNs) used for image segmentation to raw point clouds, because point cloud data is unordered, sparse, and unstructured. In most cases, you must transform raw point clouds before feeding them as input to a segmentation network.

These are the categories of deep learning segmentation methods, divided by how you format input to the network.

Multiview-based methods — Reduce 3-D point clouds to 2-D projected images and process them directly using 2-D CNNs. After classification you must postprocess the results to restore the 3-D structure.
Voxel-based methods — Input point cloud data as voxels and use the standard 3-D CNNs. This addresses the unordered and unstructured nature of raw point cloud data.
Point-based methods — Directly apply the deep learning network on individual points.
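To make the voxel-based idea concrete, here is a minimal NumPy sketch that bins points into a dense binary occupancy grid. It is an illustration only, not Lidar Toolbox code: the function name `voxelize`, the voxel size, and the grid shape are all assumptions chosen for the example.

```python
import numpy as np

def voxelize(points, voxel_size, grid_shape):
    """Convert an (N, 3) point array into a dense binary occupancy grid."""
    origin = points.min(axis=0)
    # Integer voxel index of every point along x, y, and z
    idx = np.floor((points - origin) / voxel_size).astype(int)
    # Clip so that points beyond the grid extent land in the last voxel
    idx = np.clip(idx, 0, np.array(grid_shape) - 1)
    grid = np.zeros(grid_shape, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid, idx

points = np.array([[0.0, 0.0, 0.0],
                   [0.1, 0.1, 0.1],   # falls in the same voxel as the first point
                   [1.5, 0.2, 0.0],
                   [0.2, 1.9, 1.9]])
grid, idx = voxelize(points, voxel_size=1.0, grid_shape=(2, 2, 2))
print(int(grid.sum()), "occupied voxels")
```

Because the grid depends only on where the points fall, reordering the input points produces the same grid, which is why voxelization sidesteps the unordered nature of raw point clouds.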

Among the point-based methods, PointNet is the most popular deep learning framework. This network uses multilayer perceptrons (MLPs) to extract local features from individual points, then aggregates all local features into global features using max pooling. PointNet concatenates the aggregated global features and the local features into combined point features, and then extracts new features from the combined point features by using MLPs. The network predicts semantic labels based on the new features.
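The data flow described above can be sketched in a few lines of NumPy. This is a simplified illustration with random, untrained weights, not the actual PointNet architecture or any Lidar Toolbox function; the layer sizes, the names `shared_mlp` and `pointnet_like_forward`, and the three-class output are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_mlp(x, weights, bias):
    """Apply the same dense layer and ReLU to every point (row) of x."""
    return np.maximum(x @ weights + bias, 0.0)

def pointnet_like_forward(points, params):
    local = shared_mlp(points, params["w1"], params["b1"])      # (N, 16) per-point features
    global_feat = local.max(axis=0)                             # (16,) max pooling over all points
    tiled = np.broadcast_to(global_feat, local.shape)           # repeat global features per point
    combined = np.concatenate([local, tiled], axis=1)           # (N, 32) combined point features
    hidden = shared_mlp(combined, params["w2"], params["b2"])   # (N, 16) new features
    scores = hidden @ params["w3"] + params["b3"]               # (N, 3) per-point class scores
    return scores.argmax(axis=1), global_feat

params = {
    "w1": rng.normal(size=(3, 16)),  "b1": np.zeros(16),
    "w2": rng.normal(size=(32, 16)), "b2": np.zeros(16),
    "w3": rng.normal(size=(16, 3)),  "b3": np.zeros(3),
}
points = rng.normal(size=(8, 3))
labels, global_feat = pointnet_like_forward(points, params)

# Shuffling the points leaves the pooled global features unchanged
perm = rng.permutation(len(points))
labels_perm, global_perm = pointnet_like_forward(points[perm], params)
```

Max pooling is a symmetric function, so the global features do not depend on point order, and each point keeps its label when the input is shuffled.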

PointNet++ improves on the basic PointNet model by additionally capturing local features hierarchically. For more information on PointNet++, see Get Started with PointNet++.

A semantic segmentation network segments an input point cloud into Ground, Buildings, and Vegetation.

Create Training Data for Semantic Segmentation

Lidar Toolbox™ provides functions to import and read raw point cloud data from several file formats. For more information, see I/O.

The toolbox enables you to divide this data into training and test data sets, and store them as datastore objects. For example, you can store point cloud files by using the fileDatastore object. For more information on datastore objects, see Datastores for Deep Learning (Deep Learning Toolbox).
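As a language-neutral illustration of the splitting step, the following Python sketch divides a list of point cloud files into training and test sets. It does not use datastore objects; the file names, the 80/20 ratio, and the helper name `split_files` are hypothetical.

```python
import random

def split_files(file_names, test_fraction=0.2, seed=0):
    """Shuffle file names reproducibly and split them into train and test lists."""
    files = sorted(file_names)
    random.Random(seed).shuffle(files)
    num_test = max(1, round(len(files) * test_fraction))
    return files[num_test:], files[:num_test]

# Hypothetical file names, for illustration only
all_files = [f"scene_{i:02d}.las" for i in range(10)]
train_files, test_files = split_files(all_files)
print(len(train_files), "training files,", len(test_files), "test files")
```

Fixing the seed keeps the split reproducible, so evaluation results stay comparable between training runs.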

The Import Point Cloud Data For Deep Learning example shows you how to import a large point cloud data set, and then configure and load a datastore.

Label Data

You need a large, labeled data set to train a deep learning network. If you have an unlabeled data set, you can use the Lidar Labeler app to label your training data. For information on how to use the app, see Get Started with the Lidar Labeler.

Preprocess Data

You can preprocess your data before training the network. Lidar Toolbox provides functions to perform these preprocessing tasks.

Denoise, downsample, and filter point clouds.
Convert unorganized data into the organized format.
Divide aerial point cloud data into blocks to perform block-by-block processing.
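The block-division step can be sketched as follows in NumPy. This is an assumption-laden illustration rather than the toolbox implementation: the helper `split_into_blocks` and the 50-unit block size are made up for the example, and the blocks are simple non-overlapping square tiles in the XY plane.

```python
import numpy as np

def split_into_blocks(points, block_size):
    """Group an (N, 3) point array into square XY tiles of side block_size."""
    xy_min = points[:, :2].min(axis=0)
    # Integer tile coordinates of every point in the XY plane
    tile = np.floor((points[:, :2] - xy_min) / block_size).astype(int)
    blocks = {}
    for key in np.unique(tile, axis=0):
        mask = np.all(tile == key, axis=1)
        blocks[tuple(key.tolist())] = points[mask]
    return blocks

points = np.array([[0.0, 0.0, 1.0],
                   [10.0, 10.0, 2.0],
                   [60.0, 5.0, 3.0],
                   [70.0, 80.0, 4.0]])
blocks = split_into_blocks(points, block_size=50.0)
for key, block in blocks.items():
    print(key, block.shape[0], "points")
```

Processing one tile at a time keeps memory use bounded for large aerial scans, at the cost of losing context at tile borders.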

For more information on preprocessing, see Lidar Processing Applications (Deep Learning Toolbox).

To interactively visualize, analyze, and preprocess point cloud data, use the Lidar Viewer app. For more information on how to use the app, see Get Started with Lidar Viewer.

Augment Data

Data augmentation adds variety to the existing training data. Training a network on a more varied data set makes it more robust to data transformations.

Augmentation techniques reduce overfitting and enable the network to better learn and infer features.
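Two common point cloud augmentations, a random rotation about the vertical axis and small Gaussian jitter, can be sketched in NumPy as shown here. The function name `augment`, the angle range, and the jitter scale are assumptions for illustration and are not taken from the toolbox.

```python
import numpy as np

def augment(points, rng, max_angle_deg=45.0, jitter_std=0.01):
    """Rotate an (N, 3) point array about the z-axis and add Gaussian jitter."""
    angle = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
    c, s = np.cos(angle), np.sin(angle)
    rot_z = np.array([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]])
    rotated = points @ rot_z.T
    return rotated + rng.normal(0.0, jitter_std, size=points.shape)

rng = np.random.default_rng(0)
points = rng.uniform(-10.0, 10.0, size=(100, 3))
augmented = augment(points, rng)
```

The transformation keeps the point order, so the per-point labels of the original point cloud still apply to the augmented copy.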

For more information on how to perform data augmentation on point clouds, see Data Augmentations for Lidar Object Detection Using Deep Learning.

Create Semantic Segmentation Network

Define your network based on the network input and the layers.

Lidar Toolbox provides these functions to create segmentation networks.

pointnetplusNetwork — Create PointNet++ segmentation network
squeezesegv2Network — Create SqueezeSegV2 segmentation network

You can create a custom network layer-by-layer programmatically. For a list of supported layers and how to create them, see the List of Deep Learning Layers (Deep Learning Toolbox). To visualize the network architecture, use the analyzeNetwork (Deep Learning Toolbox) function.

You can also design a custom network interactively by using the Deep Network Designer (Deep Learning Toolbox).

Train Network

To specify the training options, use the trainingOptions (Deep Learning Toolbox) function. You can train the network by using the trainnet (Deep Learning Toolbox) function.

Segment Point Clouds and Evaluate Results

Segment Point Cloud

Use the pcsemanticseg and semanticseg functions to obtain the segmentation results.

To segment buildings and vegetation from aerial lidar data, use the segmentAerialLidarBuildings and segmentAerialLidarVegetation functions, respectively.

Evaluate Segmentation Results

Evaluate the segmentation results by using the evaluateSemanticSegmentation function.
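Metrics commonly reported for semantic segmentation, such as global accuracy and per-class intersection over union (IoU), derive from a confusion matrix. The NumPy sketch below shows the arithmetic on a tiny made-up example. It does not reproduce the output of evaluateSemanticSegmentation; the helper names and the label arrays are assumptions.

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, num_classes):
    """Count points per (true class, predicted class) pair."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (true_labels, pred_labels), 1)
    return cm

def segmentation_metrics(cm):
    """Compute global accuracy, per-class IoU, and mean IoU from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    # Classes absent from both truth and prediction get NaN and are skipped in the mean
    iou = np.divide(tp, union, out=np.full_like(tp, np.nan), where=union > 0)
    return {"global_accuracy": tp.sum() / cm.sum(),
            "iou": iou,
            "mean_iou": np.nanmean(iou)}

true_labels = np.array([0, 0, 1, 1, 2, 2])
pred_labels = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(true_labels, pred_labels, num_classes=3)
metrics = segmentation_metrics(cm)
print(metrics)
```

In this example four of the six points are labeled correctly, and the per-class IoU values are 1/3, 2/3, and 1/2.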

Pretrained Segmentation Models in Lidar Toolbox

You can use these pretrained semantic segmentation models to perform point cloud segmentation and classification.

Pretrained Model	Description	Load Pretrained Model	Example
PointNet++	PointNet++ is a hierarchical neural network that captures local geometric features to improve on the basic PointNet model. For more information, see Get Started with PointNet++.	Load the pretrained model trained on the DALES data set: load("pointnetplusTrained","net")	Aerial Lidar Semantic Segmentation Using PointNet++ Deep Learning
SqueezeSegV2	SqueezeSegV2 is a CNN for performing end-to-end semantic segmentation of an organized lidar point cloud. Training this network requires 2-D spherical projected images as inputs to the network.	Download the pretrained model trained on the PandaSet data set from Hesai and Scale: https://ssd.mathworks.com/supportfiles/lidar/data/trainedSqueezeSegV2PandasetNet.zip	Lidar Point Cloud Semantic Segmentation Using SqueezeSegV2 Deep Learning Network
PointSeg	PointSeg is a CNN for performing end-to-end semantic segmentation of road objects based on an organized lidar point cloud. Training this network requires 2-D spherical projected images as inputs to the network.	Download the pretrained model trained on a highway scene data set from an Ouster OS1 sensor: https://www.mathworks.com/supportfiles/lidar/data/trainedPointSegNet.mat	Lidar Point Cloud Semantic Segmentation Using PointSeg Deep Learning Network
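The 2-D spherical projected images that SqueezeSegV2 and PointSeg take as input can be pictured with the following NumPy sketch, which maps each point to a pixel by azimuth and elevation and stores its range. It is a simplified illustration, not the toolbox projection: the function name, the 4-by-8 image size, and the vertical field of view of ±15 degrees are assumptions, and the sketch assumes no point lies exactly at the sensor origin.

```python
import numpy as np

def spherical_projection(points, height, width, fov_up_deg, fov_down_deg):
    """Project an (N, 3) point array onto a (height, width) range image."""
    ranges = np.linalg.norm(points, axis=1)
    azimuth = np.arctan2(points[:, 1], points[:, 0])   # horizontal angle in [-pi, pi]
    elevation = np.arcsin(points[:, 2] / ranges)       # vertical angle
    fov_up, fov_down = np.deg2rad(fov_up_deg), np.deg2rad(fov_down_deg)
    col = (azimuth + np.pi) / (2.0 * np.pi) * width
    row = (fov_up - elevation) / (fov_up - fov_down) * height
    col = np.clip(np.floor(col), 0, width - 1).astype(int)
    row = np.clip(np.floor(row), 0, height - 1).astype(int)
    image = np.zeros((height, width), dtype=np.float32)
    # If several points share a pixel, the last one written wins in this sketch
    image[row, col] = ranges
    return image, row, col

points = np.array([[2.0, 0.5, 0.1],
                   [-1.0, -2.0, -0.4]])
image, row, col = spherical_projection(points, height=4, width=8,
                                       fov_up_deg=15.0, fov_down_deg=-15.0)
```

A production pipeline would typically keep the nearest point per pixel and add further channels such as intensity, which this sketch omits.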

Code Generation

To learn how to generate CUDA® code for a segmentation workflow, see these examples.

Note

This functionality requires a Deep Learning Toolbox™ license.