
Object Detection

Perform classification, object detection, and transfer learning using convolutional neural networks (CNNs, or ConvNets), and create customized detectors

Object detection is a computer vision technique for locating instances of objects in images or videos. Object detection algorithms typically leverage machine learning or deep learning to produce meaningful results. When looking at images or video, humans can recognize and locate objects of interest in a matter of moments. The goal of object detection is to replicate this intelligence using a computer. The best approach for object detection depends on your application and the problem you are trying to solve.

Deep learning techniques require a large number of labeled training images, so training on a GPU is recommended to reduce training time. Deep learning-based approaches to object detection use convolutional neural networks (CNNs or ConvNets), such as R-CNN and YOLO, or use single-shot detection (SSD). You can train a custom object detector, or use a pretrained object detector by leveraging transfer learning, an approach that enables you to start with a pretrained network and then fine-tune it for your application. Convolutional neural networks require Deep Learning Toolbox™. Training and prediction are supported on a CUDA®-capable GPU, which requires Parallel Computing Toolbox™. For more information, see Computer Vision Toolbox Preferences and Parallel Computing Support in MathWorks Products (Parallel Computing Toolbox).

Machine learning techniques for object detection include aggregate channel features (ACF), support vector machine (SVM) classification using histogram of oriented gradients (HOG) features, and the Viola-Jones algorithm for human face or upper-body detection. You can start with a pretrained object detector or create a custom object detector to suit your application.

Labeled boats, neural network, and person detector


Image Labeler - Label images for computer vision applications
Video Labeler - Label video for computer vision applications



Deep Learning Detectors

rcnnObjectDetector - Detect objects using R-CNN deep learning detector
fastRCNNObjectDetector - Detect objects using Fast R-CNN deep learning detector
fasterRCNNObjectDetector - Detect objects using Faster R-CNN deep learning detector
ssdObjectDetector - Detect objects using SSD deep learning detector (Since R2020a)
yolov2ObjectDetector - Detect objects using YOLO v2 object detector
yolov3ObjectDetector - Detect objects using YOLO v3 object detector (Since R2021a)
yolov4ObjectDetector - Detect objects using YOLO v4 object detector (Since R2022a)

Feature-based Detectors

readAprilTag - Detect and estimate pose for AprilTag in image (Since R2020b)
readArucoMarker - Detect and estimate pose for ArUco marker in image (Since R2024a)
generateArucoMarker - Generate ArUco marker images (Since R2024a)
readBarcode - Detect and decode 1-D or 2-D barcode in image (Since R2020a)
acfObjectDetector - Detect objects using aggregate channel features
peopleDetectorACF - Detect people using aggregate channel features
vision.CascadeObjectDetector - Detect objects using the Viola-Jones algorithm
vision.ForegroundDetector - Foreground detection using Gaussian mixture models
vision.PeopleDetector - (To be removed) Detect upright people using HOG features
vision.BlobAnalysis - Properties of connected regions

Detect Objects Using Point Features

detectBRISKFeatures - Detect BRISK features
detectFASTFeatures - Detect corners using FAST algorithm
detectHarrisFeatures - Detect corners using Harris–Stephens algorithm
detectKAZEFeatures - Detect KAZE features
detectMinEigenFeatures - Detect corners using minimum eigenvalue algorithm
detectMSERFeatures - Detect MSER features
detectORBFeatures - Detect ORB keypoints
detectSIFTFeatures - Detect scale invariant feature transform (SIFT) features (Since R2021b)
detectSURFFeatures - Detect SURF features
extractFeatures - Extract interest point descriptors
matchFeatures - Find matching features

Select Detected Objects

selectStrongestBbox - Select strongest bounding boxes from overlapping clusters using nonmaximal suppression (NMS)
selectStrongestBboxMulticlass - Select strongest multiclass bounding boxes from overlapping clusters using nonmaximal suppression (NMS)
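
To make the idea of nonmaximal suppression concrete, the following is a minimal plain-Python sketch of greedy NMS. It is an illustration of the general technique, not the implementation behind the functions listed above. The names `iou` and `greedy_nms`, the `[x, y, width, height]` box layout, the union-based overlap ratio, and the 0.5 threshold are all assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping rectangle (zero if disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, overlap_threshold=0.5):
    """Return indices of the boxes kept, strongest first.

    A box is kept only if its overlap with every already-kept
    (higher-scoring) box does not exceed overlap_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= overlap_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes cover nearly the same region.
boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [200, 200, 50, 50]]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # → [0, 2]
```

A multiclass variant would typically run the same loop separately for each class label, so that boxes of different classes do not suppress one another.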

Load Training Data

boxLabelDatastore - Datastore for bounding box label data (Since R2019b)
groundTruth - Ground truth label data
imageDatastore - Datastore for image data
objectDetectorTrainingData - Create training data for an object detector
combine - Combine data from multiple datastores

Train Feature-Based Object Detectors

trainACFObjectDetector - Train ACF object detector
trainCascadeObjectDetector - Train cascade object detector model
trainImageCategoryClassifier - Train an image category classifier

Train Deep Learning Based Object Detectors

trainRCNNObjectDetector - Train R-CNN deep learning object detector
trainFastRCNNObjectDetector - Train Fast R-CNN deep learning object detector
trainFasterRCNNObjectDetector - Train Faster R-CNN deep learning object detector
trainSSDObjectDetector - Train an SSD deep learning object detector (Since R2020a)
trainYOLOv2ObjectDetector - Train YOLO v2 object detector
trainYOLOv3ObjectDetector - Train YOLO v3 object detector (Since R2024a)
trainYOLOv4ObjectDetector - Train YOLO v4 object detector (Since R2022a)

Augment and Preprocess Training Data for Deep Learning

balanceBoxLabels - Balance bounding box labels for object detection (Since R2020a)
bboxcrop - Crop bounding boxes (Since R2019b)
bboxerase - Remove bounding boxes (Since R2021a)
bboxresize - Resize bounding boxes (Since R2019b)
bboxwarp - Apply geometric transformation to bounding boxes (Since R2019b)
bbox2points - Convert rectangle to corner points list
imwarp - Apply geometric transformation to image
imcrop - Crop image
imresize - Resize image
randomAffine2d - Create randomized 2-D affine transformation (Since R2019b)
centerCropWindow2d - Create rectangular center cropping window (Since R2019b)
randomWindow2d - Randomly select rectangular region in image (Since R2021a)
integralImage - Calculate 2-D integral image

R-CNN (Regions With Convolutional Neural Networks)

rcnnBoxRegressionLayer - Box regression layer for Fast and Faster R-CNN
fasterRCNNLayers - Create a Faster R-CNN object detection network (Since R2019b)
rpnSoftmaxLayer - Softmax layer for region proposal network (RPN)
rpnClassificationLayer - Classification layer for region proposal networks (RPNs)
regionProposalLayer - Region proposal layer for Faster R-CNN
roiAlignLayer - Non-quantized ROI pooling layer for Mask R-CNN (Since R2020b)
roiInputLayer - ROI input layer for Fast R-CNN
roiMaxPooling2dLayer - Neural network layer used to output fixed-size feature maps for rectangular ROIs
roialign - Non-quantized ROI pooling of dlarray data (Since R2021b)

YOLO v2 (You Only Look Once version 2)

yolov2Layers - Create YOLO v2 object detection network
yolov2TransformLayer - Create transform layer for YOLO v2 object detection network
yolov2OutputLayer - Create output layer for YOLO v2 object detection network
spaceToDepthLayer - Space to depth layer (Since R2020b)

Focal Loss

focalCrossEntropy - Compute focal cross-entropy loss (Since R2020b)

SSD (Single Shot Detector)

ssdMergeLayer - Create SSD merge layer for object detection (Since R2020a)

Anchor Boxes

estimateAnchorBoxes - Estimate anchor boxes for deep learning object detectors (Since R2019b)
cuboid2img - Project cuboids from 3-D world coordinates to 2-D image coordinates (Since R2022b)
insertObjectAnnotation - Annotate truecolor or grayscale image or video
insertObjectMask - Insert masks in image or video stream (Since R2020b)
insertShape - Insert shapes in image or video
showShape - Display shapes on image, video, or point cloud (Since R2020b)
evaluateObjectDetection - Evaluate object detection data set against ground truth (Since R2023b)
objectDetectionMetrics - Object detection quality metrics (Since R2023b)
mAPObjectDetectionMetric - Mean average precision (mAP) metric for object detection (Since R2024a)
bboxOverlapRatio - Compute bounding box overlap ratio
bboxPrecisionRecall - Compute bounding box precision and recall against ground truth
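
As a rough illustration of what an overlap ratio and a precision/recall pair measure, here is a small plain-Python sketch. It is not the algorithm used by the evaluation functions listed above. The helper names `overlap_ratio` and `precision_recall`, the `[x, y, width, height]` box layout, the greedy one-to-one matching, and the 0.5 threshold are assumptions made for the example.

```python
def overlap_ratio(a, b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, threshold=0.5):
    """Precision and recall at a single overlap threshold.

    detections is assumed to be sorted by descending confidence.
    Each ground-truth box can be matched by at most one detection.
    """
    matched = set()
    true_positives = 0
    for det in detections:
        best_j, best_ratio = None, threshold
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            ratio = overlap_ratio(det, gt)
            if ratio >= best_ratio:
                best_j, best_ratio = j, ratio
        if best_j is not None:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical data: two labeled objects, three detections (one spurious).
ground_truth = [[0, 0, 10, 10], [20, 20, 10, 10]]
detections = [[0, 0, 10, 10], [50, 50, 10, 10], [21, 20, 10, 10]]
precision, recall = precision_recall(detections, ground_truth)
print(precision, recall)
```

In this example two of the three detections match a labeled box, so precision is 2/3 while recall is 1. Metrics such as mean average precision are built by sweeping the confidence cutoff and summarizing the resulting precision/recall curve.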


Deep Learning Object Detector - Detect objects using trained deep learning object detector (Since R2021b)


Get Started

Training Data for Object Detection and Instance Segmentation

Get Started With Deep Learning

  • Deep Learning in MATLAB (Deep Learning Toolbox)
    Discover deep learning capabilities in MATLAB® using convolutional neural networks for classification and regression, including pretrained networks and transfer learning, and training on GPUs, CPUs, clusters, and clouds.
  • Pretrained Deep Neural Networks (Deep Learning Toolbox)
    Learn how to download and use pretrained convolutional neural networks for classification, transfer learning, and feature extraction.