drise

Explain object detection network predictions using D-RISE

Since R2024a

Syntax

scoreMap = drise(detector,I)

scoreMap = drise(customDetection,I)

scoreMap = drise(___,bboxIn,labelIn)

[scoreMap,bboxOut,scores,labelOut] = drise(detector,I)

___ = drise(___,Name=Value)

Description

scoreMap = drise(detector,I) returns a saliency map for the specified image I and object detection network detector. The function calculates the saliency map by using the detector randomized input sampling for explanation (D-RISE) algorithm. This function requires Deep Learning Toolbox™ Verification Library and Computer Vision Toolbox™.

scoreMap = drise(customDetection,I) returns a saliency map by using the custom detection function customDetection instead of a detector object.

scoreMap = drise(___,bboxIn,labelIn) also specifies the bounding boxes bboxIn and labels labelIn corresponding to the detections you want to explain.

[scoreMap,bboxOut,scores,labelOut] = drise(detector,I) also returns the bounding boxes bboxOut, detection scores scores, and labels labelOut of the detections made by the object detection network.

___ = drise(___,Name=Value) specifies options using one or more name-value arguments in addition to any combination of input and output arguments from the previous syntaxes.
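The D-RISE method described in [1] scores each masked image by how well the detections on that image match the target detection. The pairwise similarity multiplies a localization term (intersection over union of the boxes), a classification term (cosine similarity of the class probability vectors), and the objectness score of the proposed detection. The following base-MATLAB sketch computes that similarity for one pair of detections with made-up values. It illustrates the formula from [1]; it is not the internal implementation of drise, and all variable names are hypothetical.

```matlab
% Target detection: box in [x y width height] form and class probabilities.
targetBox   = [125 64 116 85];
targetProbs = [0.9 0.1];

% Hypothetical detection proposed by the network on one masked image.
propBox        = [130 70 110 80];
propProbs      = [0.8 0.2];
propObjectness = 0.7;

% Intersection over union, treating each box as the continuous rectangle
% from (x, y) to (x + width, y + height).
overlapW = max(0, min(targetBox(1)+targetBox(3), propBox(1)+propBox(3)) ...
    - max(targetBox(1), propBox(1)));
overlapH = max(0, min(targetBox(2)+targetBox(4), propBox(2)+propBox(4)) ...
    - max(targetBox(2), propBox(2)));
overlapArea = overlapW * overlapH;
iou = overlapArea / (targetBox(3)*targetBox(4) + propBox(3)*propBox(4) - overlapArea);

% Cosine similarity between the class probability vectors.
cosSim = dot(targetProbs, propProbs) / (norm(targetProbs) * norm(propProbs));

% Pairwise similarity from [1]: localization x classification x objectness.
similarity = iou * cosSim * propObjectness;
```

In [1], the weight of a mask is the largest such similarity over all detections on the masked image, and the saliency map for the target is the weighted sum of the masks.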

Examples

Understand Which Parts of Image Are Important for Detection

Load a YOLO v2 object detector trained to detect vehicles.

s = load("yolov2VehicleDetector.mat");
detector = s.detector;

Read in a test image. This image comes from the Caltech Cars 1999 and 2001 data sets, created by Pietro Perona. The image is used with permission.

img = imread("testCar.png");
img = im2single(img);

Detect vehicles in the test image by using the trained YOLO v2 detector. Pass the detector and the test image as input to the detect function. The detect function returns the bounding boxes, the detection scores, and the labels.

[bboxes,scores,labels] = detect(detector,img);
figure
annotatedImage = insertObjectAnnotation(img,"rectangle",bboxes,scores);
imshow(annotatedImage)

Use the drise function to create saliency maps explaining the detections made by the YOLO v2 object detector.

scoreMap = drise(detector,img);

Plot the saliency map over the image. Areas highlighted in red are more significant in the detection than areas highlighted in blue.

numDetections = size(scoreMap,3);
tiledlayout(1,numDetections,TileSpacing="tight")

for i = 1:numDetections
    nexttile
    annotatedImage = insertObjectAnnotation(img,"rectangle",bboxes(i,:),scores(i));
    imshow(annotatedImage)
    hold on
    imagesc(scoreMap(:,:,i),AlphaData=0.5)
    title("DRISE Map: Detection " + i)
    hold off
end

colormap jet

Figure: saliency maps overlaid on the two detections, with panels titled "DRISE Map: Detection 1" and "DRISE Map: Detection 2".

Specify Bounding Box and Label to Understand

Load a YOLO v2 object detector pretrained to detect vehicles.

s = load("yolov2VehicleDetector.mat");
detector = s.detector;

Read in a test image. This image comes from the Caltech Cars 1999 and 2001 data sets, created by Pietro Perona. The image is used with permission.

img = imread("testCar.png");
img = im2single(img);

Specify the target detection you want to explain.

targetBbox = [125 64 116 85];
targetLabel = 1;

Pass the target bounding box and label to the drise function to create a saliency map explaining that detection by the YOLO v2 object detector.

scoreMap = drise(detector,img,targetBbox,targetLabel);

Plot the saliency map over the image. Areas highlighted in red are more significant in the detection than areas highlighted in blue.

figure
annotatedImage = insertObjectAnnotation(img,"rectangle",targetBbox,"vehicle");
imshow(annotatedImage)
hold on
imagesc(scoreMap,AlphaData=0.5)
title("DRISE Map")
hold off
colormap jet

Figure: saliency map overlaid on the target detection, titled "DRISE Map".

Specify Additional Options

Load a YOLO v2 object detector pretrained to detect vehicles.

s = load("yolov2VehicleDetector.mat");
detector = s.detector;

Read in a test image. This image comes from the Caltech Cars 1999 and 2001 data sets, created by Pietro Perona. The image is used with permission.

img = imread("testCar.png");
img = im2single(img);

[bboxes,scores,labels] = detect(detector,img);
figure
annotatedImage = insertObjectAnnotation(img,"rectangle",bboxes,scores);
imshow(annotatedImage)

Use the drise function to create saliency maps explaining the detections made by the YOLO v2 object detector. To increase the number of mask images that the function uses to generate the saliency maps, set the number of samples to 16,384. Use an 8-by-8 mask resolution and a mask probability of 0.85. With more samples, the drise function takes longer to run, so pass 256 masked images to the detector per mini-batch and enable verbose output to track the progress.

scoreMap = drise(detector,img, ...
    NumSamples=16384, ...
    MaskResolution=[8 8], ...
    MaskProbability=0.85, ...
    MiniBatchSize=256, ...
    Verbose=true);

Computing target detections...Explaining 2 detections.
Number of mini-batches to process: 64
..........   ..........   ..........   ..........   .......... (50 mini-batches)
..........   ....                                              (64 mini-batches)
Total time = 56.7secs.

Plot the saliency map over the image. Areas highlighted in red are more significant in the detection than areas highlighted in blue.

numDetections = size(scoreMap,3);
tiledlayout(1,numDetections,TileSpacing="tight")

for i = 1:numDetections
    nexttile
    annotatedImage = insertObjectAnnotation(img,"rectangle",bboxes(i,:),scores(i));
    imshow(annotatedImage)
    hold on
    imagesc(scoreMap(:,:,i),AlphaData=0.5)
    title("DRISE Map: Detection " + i)
    hold off
end
colormap jet

Figure: saliency maps computed with the additional options, with panels titled "DRISE Map: Detection 1" and "DRISE Map: Detection 2".

Specify Custom Detector

Load a YOLO v2 object detector pretrained to detect vehicles.

s = load("yolov2VehicleDetector.mat");
detector = s.detector;

Read in a test image. This image comes from the Caltech Cars 1999 and 2001 data sets, created by Pietro Perona. The image is used with permission.

img = imread("testCar.png");
img = im2single(img);

You can create saliency maps for an object detector that you call using a function handle. The function handle must take exactly one input argument, which is the image, and return exactly three output arguments: the bounding boxes, the class probabilities, and the objectness score.

Wrap the YOLO v2 detector in a custom detection function. Through the function handle, you can pass additional name-value arguments to the detect method. For example, return all detected bounding boxes by setting SelectStrongest to false.

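function [bboxes,classProbs,objectness] = customDetector(detector,img)
% Run the YOLO v2 detector without selecting the strongest boxes and return
% the class probabilities and objectness scores that drise needs.
[bboxes,~,~,intermediates] = detect(detector,img,SelectStrongest=false);

if iscell(intermediates)
    % detect returned a cell array (for example, for a batch of images):
    % extract the fields from each element separately.
    classProbs = cellfun(@(x)getFields(x,"ClassProbabilities"), ...
        intermediates,UniformOutput=false);
    objectness = cellfun(@(x)getFields(x,"ObjectnessScores"), ...
        intermediates,UniformOutput=false);
else
    % detect returned a single struct.
    classProbs = intermediates.ClassProbabilities;
    objectness = intermediates.ObjectnessScores;
end
end

function z = getFields(x,fieldName)
% Return the requested field, or [] when the element is empty.
if ~isempty(x)
    z = x.(fieldName);
else
    z = [];
end
end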

Specify target detections to understand. For a function handle input, you must specify a numeric value corresponding to the index of the class label.

targetBbox = [125 64 116 85];
targetLabel = 1;

Generate the saliency map.

scoreMap = drise(@(img)customDetector(detector,img),img,targetBbox,targetLabel);

Plot the results.

figure
annotatedImage = insertObjectAnnotation(img,"rectangle",targetBbox,"vehicle");
imshow(annotatedImage)
hold on
imagesc(scoreMap,AlphaData=0.5)
title("DRISE Map: Custom Detector")
hold off
colormap jet

Figure: saliency map from the custom detector, titled "DRISE Map: Custom Detector".

Input Arguments

`detector` — Object detection network
`yolov2ObjectDetector` object | `yolov3ObjectDetector` object | `yolov4ObjectDetector` object | `yoloxObjectDetector` object

Object detection network, specified as a yolov2ObjectDetector (Computer Vision Toolbox), yolov3ObjectDetector (Computer Vision Toolbox), yolov4ObjectDetector (Computer Vision Toolbox), or yoloxObjectDetector (Computer Vision Toolbox) object.

`I` — Input image
real-valued H-by-W-by-C array

Input image, specified as a real-valued H-by-W-by-C array, where H, W, and C are the height, width, and channel size of the image, respectively.

The image must be a real, nonsparse grayscale or RGB image.

The channel size in each image must be equal to the network input channel size. For example, C must be 1 for a grayscale image and 3 for an RGB image.
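If you have a grayscale image but the detector expects three channels, one common workaround is to replicate the single channel. The following is a minimal sketch under that assumption: the image is random placeholder data, networkChannels is a hypothetical stand-in for the detector's input channel size, and whether replicated grayscale input gives meaningful detections depends on how the network was trained.

```matlab
% Hypothetical H-by-W grayscale image in the range [0, 1].
grayImage = rand(240, 320, 'single');

% Assumed channel size of the detector's input layer (3 for an RGB network).
networkChannels = 3;

% Replicate the single channel so that C matches the network input.
if size(grayImage, 3) == 1 && networkChannels == 3
    I = repmat(grayImage, [1 1 3]);
else
    I = grayImage;
end
```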

`bboxIn` — Input bounding boxes
real-valued matrix

Input bounding boxes, specified as a real-valued M-by-4 matrix, where M is the number of detections. Specify each bounding box as a four-element row vector in the form [x y width height], where:

x is the x-coordinate of the upper-left corner of the bounding box.
y is the y-coordinate of the upper-left corner of the bounding box.
width is the width of the bounding box.
height is the height of the bounding box.

`labelIn` — Input labels
integer-valued column vector | categorical column vector | string array

Input labels, specified as an integer-valued column vector, a categorical array, or a string array. This input must have a size of M-by-1, where M is the number of detections. When you specify a function handle input, you must specify this input as an integer-valued column vector.
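bboxIn and labelIn describe the same M detections, so they must have the same number of rows. Before calling drise, you might sanity-check them as in the following sketch. The image size is hypothetical, and the sketch assumes that a box starting at pixel x with width w covers pixel columns x through x+w-1 (and likewise for rows); drise itself may not require boxes to lie inside the image.

```matlab
% Two candidate target detections, one row per detection (M = 2).
bboxIn  = [125 64 116 85; 300 200 50 60];   % [x y width height]
labelIn = [1; 1];                           % class indices, M-by-1
imageSize = [240 320];                      % hypothetical [height width]

% Last pixel column and row covered by each box (assumed convention).
xLast = bboxIn(:,1) + bboxIn(:,3) - 1;
yLast = bboxIn(:,2) + bboxIn(:,4) - 1;

% true for boxes that lie entirely inside the image
insideImage = bboxIn(:,1) >= 1 & bboxIn(:,2) >= 1 & ...
    xLast <= imageSize(2) & yLast <= imageSize(1);

% Boxes and labels must pair up row by row.
sameCount = size(bboxIn, 1) == size(labelIn, 1);
```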

`customDetection` — Custom detection options
function handle

Custom detection options, specified as a function handle. The custom detection function must take a single input image and return three outputs:

Bounding boxes, returned as an M-by-4 array of positive, real numbers where M is the number of detections.
Class probabilities, returned as an M-by-C array of positive, real numbers corresponding to the class probabilities for each detection, where C is the number of classes. If your detector does not return class probabilities, then return an empty array, [ ], instead.
Objectness score, returned as an M-by-1 vector corresponding to the probability that an object is present in each detection. If your detector does not return objectness scores, then return an empty array, [ ], instead.

Use this input to specify additional options for the detect function, to use other built-in detectors such as a fasterRCNNObjectDetector, or to use detectors from other frameworks.

If the function takes a batch of images as input, then each output must be an N-by-1 cell array, where N is the number of images. Each cell contains the bounding boxes, class probabilities, or objectness scores for the corresponding image.

Note

For YOLO v2, v3, v4, and X detectors, the class probabilities and objectness are returned by the fourth output argument from the detect function. For more information, see yolov2ObjectDetector (Computer Vision Toolbox), yolov3ObjectDetector (Computer Vision Toolbox), yolov4ObjectDetector (Computer Vision Toolbox), and yoloxObjectDetector (Computer Vision Toolbox).

Data Types: function_handle
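Before passing a function handle as customDetection, it can help to check that it follows the one-input, three-output contract. The sketch below uses a hypothetical stub, stubDetector, that ignores its input and returns fixed values for a two-class problem; substitute your own detector. It only checks output shapes: a detector that ignores the image would produce a meaningless saliency map.

```matlab
% Fixed outputs of a hypothetical detector (M = 2 detections, C = 2 classes).
stubBoxes      = [125 64 116 85; 20 30 50 40];  % M-by-4, [x y width height]
stubClassProbs = [0.9 0.1; 0.3 0.7];            % M-by-C
stubObjectness = [0.8; 0.6];                    % M-by-1

% One input (the image), three outputs, as customDetection requires.
stubDetector = @(img) deal(stubBoxes, stubClassProbs, stubObjectness);

% Call the handle the way the custom detection contract describes.
img = zeros(240, 320, 3, 'single');
[bboxes, classProbs, objectness] = stubDetector(img);
numDetections = size(bboxes, 1);
```

With a real detector in place of the stub, you would pass the handle to drise together with integer class indices for labelIn, as in the Specify Custom Detector example.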

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: Threshold=0.75,Verbose=true sets the detection threshold to 0.75 and enables verbose output.

`Threshold` — Detection threshold
scalar in the range [0, 1]

Detection threshold, specified as a scalar in the range [0, 1]. The software removes detections whose scores are lower than this value. The default value is 0.5 when you specify detector as a yolov2ObjectDetector (Computer Vision Toolbox), yolov3ObjectDetector (Computer Vision Toolbox), or yolov4ObjectDetector (Computer Vision Toolbox) object. The default value is 0.25 when you specify detector as a yoloxObjectDetector (Computer Vision Toolbox) object.

This argument applies only if your function syntax does not include the customDetection input.
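The effect of the threshold can be pictured as a simple filter on the detection scores. The following sketch uses hypothetical boxes and scores; whether drise keeps a score exactly equal to the threshold is an assumption here (the sketch keeps it).

```matlab
threshold = 0.5;                          % default for YOLO v2, v3, and v4 detectors
scores = [0.92; 0.41; 0.67; 0.50];        % hypothetical detection scores
bboxes = [10 10 20 20; 40 40 15 15; 70 20 30 25; 5 60 10 10];

% Remove detections whose scores are lower than the threshold.
keep = scores >= threshold;
bboxesKept = bboxes(keep,:);
scoresKept = scores(keep);
```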

`NumSamples` — Number of samples
`2048` (default) | positive integer

Number of samples, specified as a positive integer. This value specifies the number of mask images that the function uses to generate the saliency map. A larger number of samples yields better results but requires more computation time.
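To see why more samples help, consider the core of the algorithm: each random mask is weighted by a score and the weighted masks are averaged, so a larger NumSamples averages over more masks and the random fluctuations in the map shrink. The following toy sketch is not the drise implementation. It replaces the detector with a hypothetical weight (the fraction of a fixed target region that a mask keeps) and upsamples masks with nearest-neighbor replication (kron) rather than bilinear interpolation, to keep it short.

```matlab
rng(0);                              % reproducible random masks
numSamples = 2048;                   % plays the role of NumSamples
maskProbability = 0.75;              % plays the role of MaskProbability
cellSize = 8;                        % each mask cell covers 8-by-8 pixels

% Hypothetical target region on a 64-by-64 image (rows and columns 17 to 32).
region = false(64, 64);
region(17:32, 17:32) = true;

% Hypothetical weight: the fraction of the target region that a mask keeps.
% In D-RISE the weight is instead the best similarity between the target
% detection and the detections the network makes on the masked image.
weightFcn = @(mask) mean(mask(region));

saliency = zeros(64, 64);
for k = 1:numSamples
    cellGrid = double(rand(8, 8) < maskProbability);   % 1 = keep, 0 = occlude
    mask = kron(cellGrid, ones(cellSize));             % nearest-neighbor upsampling
    saliency = saliency + weightFcn(mask) * mask;      % accumulate weighted masks
end
saliency = saliency / numSamples;    % average over all samples
```

Pixels inside the target region end up with higher values than pixels outside it, because masks that keep the region receive larger weights.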

`MaskResolution` — Mask resolution
`[16 16]` (default) | positive integer | two-element row vector of positive integers

Mask resolution, specified as a positive integer or a two-element row vector of positive integers. If you specify a single positive integer k, then the function uses a mask with resolution [k k].

The function uses bilinear interpolation to upscale the mask to the size of the image. A small mask resolution returns a masked image with fewer but larger occluded regions. A large mask resolution returns a masked image with more but smaller occluded regions.
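The following sketch shows how a low-resolution binary mask can be upscaled bilinearly to the image size with interp2. It is an illustration under stated assumptions, not the drise implementation: the image size is hypothetical, and the random offset that the published RISE and D-RISE methods apply when cropping the upsampled mask is omitted.

```matlab
rng(1);                          % reproducible example
imageSize = [120 160];           % hypothetical [H W] of the input image
maskResolution = [8 8];          % plays the role of MaskResolution
maskProbability = 0.75;          % plays the role of MaskProbability

% Low-resolution binary mask: 1 keeps a cell, 0 occludes it.
cellGrid = double(rand(maskResolution) < maskProbability);

% Query points that stretch the grid over the full image. interp2 blends
% neighboring cells bilinearly, so occluded regions have soft edges.
[xq, yq] = meshgrid(linspace(1, maskResolution(2), imageSize(2)), ...
                    linspace(1, maskResolution(1), imageSize(1)));
mask = interp2(cellGrid, xq, yq, 'linear');
```

Setting maskResolution to [4 4] in this sketch spreads each grid cell over more pixels, which produces fewer but larger occluded regions, matching the description above.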

`MaskProbability` — Mask probability
`0.75` (default) | scalar in the range [0, 1]

Mask probability, specified as a scalar in the range [0, 1].

Each pixel in the mask is randomly set to either 0 or 1, where the probability of 1 equals the mask probability. A mask value of 1 means that the pixel is kept (not occluded), so a mask probability of 1 leaves the entire image unoccluded.

`MiniBatchSize` — Size of mini-batch
`8` (default) | positive integer

Size of the mini-batch, specified as a positive integer.

The mini-batch size specifies the number of masked images that are passed to the detector at a time. Larger mini-batch sizes lead to faster computation, at the cost of more memory.
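A quick calculation shows the trade-off. The sketch below reproduces the mini-batch count reported by the verbose output in the Specify Additional Options example and estimates the memory for one mini-batch of masked input images. Rounding up with ceil for a partial last mini-batch is an assumption, the image size is hypothetical, and the estimate excludes network activations, which can be much larger.

```matlab
numSamples = 16384;        % NumSamples in the Specify Additional Options example
miniBatchSize = 256;       % MiniBatchSize in the same example

% Assumed: a partial last mini-batch counts as one more batch, hence ceil.
numMiniBatches = ceil(numSamples / miniBatchSize);   % 64

% With the defaults, NumSamples = 2048 and MiniBatchSize = 8.
defaultMiniBatches = ceil(2048 / 8);                 % 256

% Rough memory for one mini-batch of single-precision masked images only.
imageSize = [240 320 3];                             % hypothetical H-by-W-by-C
bytesPerImage = prod(imageSize) * 4;                 % 4 bytes per single value
inputMiB = bytesPerImage * miniBatchSize / 2^20;     % 225 MiB
```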

`Verbose` — Option to enable verbose output
`false` or `0` (default) | `true` or `1`

Option to enable verbose output, specified as a numeric or logical 1 (true) or 0 (false). When you set this input to 1 (true), the function displays the progress of the D-RISE algorithm, including the total number of mini-batches and which mini-batch it is processing, followed by the total computation time.

Output Arguments

`scoreMap` — Saliency map
numeric matrix | numeric array

Saliency map, returned as a numeric matrix or numeric array. Areas in the map with higher positive values correspond to regions of input data that contribute positively to the detection.

If the image has multiple detections, scoreMap is returned as a 3-D array, and the ith page, scoreMap(:,:,i), corresponds to the saliency map for the ith detection.

Data Types: double
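Each page of scoreMap can have a different range. imagesc already scales the colors of each axes to its own data, so an explicit rescale is mainly useful when you want to threshold the maps or display them with shared color limits. The following minimal sketch rescales each page to [0, 1] using hypothetical 2-by-2 maps in place of real drise output.

```matlab
% Hypothetical saliency maps for two detections (H-by-W-by-M, here 2-by-2-by-2).
scoreMap = cat(3, [0 2; 4 8], [-1 1; 3 5]);

numDetections = size(scoreMap, 3);
normalizedMap = zeros(size(scoreMap));
for i = 1:numDetections
    page = scoreMap(:,:,i);                 % saliency map for the ith detection
    spread = max(page(:)) - min(page(:));
    if spread > 0                           % leave constant maps at zero
        normalizedMap(:,:,i) = (page - min(page(:))) / spread;
    end
end
```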

`bboxOut` — Location of objects detected
M-by-4 matrix

Location of objects detected within the input image or images, returned as an M-by-4 matrix. M is the number of bounding boxes in an image.

Each row of bboxOut contains a four-element vector of the form [x y width height]. This vector specifies the upper-left corner and size of that corresponding bounding box in pixels.

`scores` — Detection scores
M-by-1 vector

Detection confidence scores, returned as an M-by-1 vector. M is the number of bounding boxes in an image. A higher score indicates higher confidence in the detection.

`labelOut` — Labels for bounding boxes
M-by-1 categorical array

Labels for bounding boxes, returned as an M-by-1 categorical array. M is the number of labels in an image.

References

[1] Petsiuk, Vitali, Rajiv Jain, Varun Manjunatha, Vlad I. Morariu, Ashutosh Mehra, Vicente Ordonez, and Kate Saenko. “Black-Box Explanation of Object Detectors via Saliency Maps.” Preprint, submitted June 10, 2021. https://arxiv.org/abs/2006.03204.

Version History

Introduced in R2024a
