
r2plus1dVideoClassifier

R(2+1)D video classifier. Requires Computer Vision Toolbox Model for R(2+1)D Video Classification

Since R2021b

Description

The r2plus1dVideoClassifier object returns an R(2+1)D video classifier pretrained on the Kinetics-400 data set. You can use the pretrained video classifier to classify 400 human actions, such as running, walking, and shaking hands.

Creation

Description

rd = r2plus1dVideoClassifier returns an R(2+1)D video classifier pretrained on the Kinetics-400 data set.


rd = r2plus1dVideoClassifier("resnet-3d-18",classes) configures the pretrained R(2+1)D video classifier for transfer learning on a new set of classes, classes. The video classifier is pretrained on the Kinetics-400 dataset with a ResNet3D convolutional neural network(CNN) with 18 spatio-temporal layers.

rd = r2plus1dVideoClassifier(___,Name=Value) sets properties using name-value arguments in addition to the input arguments from the previous syntax. For example, rd = r2plus1dVideoClassifier("resnet-3d-18",classes,InputSize=[112,112,3,32]) sets the input size of the network. You can specify multiple name-value arguments.
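For example, this is a minimal sketch of configuring the classifier for transfer learning on five hypothetical gesture classes with a 16-frame temporal window (the class names and frame count here are illustrative assumptions, not values from a real data set):

% Configure the pretrained classifier for a new set of classes.
classes = ["wave","clap","point","nod","shrug"];
rd = r2plus1dVideoClassifier("resnet-3d-18",classes,InputSize=[112,112,3,16]);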

Note

This function requires the Computer Vision Toolbox™ Model for R(2+1)D Video Classification. You can install the Computer Vision Toolbox Model for R(2+1)D Video Classification from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons. To use this object, you must have a license for the Deep Learning Toolbox™.

Properties


Configure Classifier Properties

InputSize — Size of the video classifier network

This property is read-only.

Size of the video classifier network, specified as a four-element row vector of the form [H,W,C,T], where H and W represent the height and width, respectively, C represents the number of channels, and T represents the number of frames for the video subnetwork.

Typical values for the number of frames are 8, 16, 32, or 64. Increase the number of frames to capture the temporal nature of activities when training the classifier.
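For example, you can query the input size of the pretrained classifier to determine how many frames each sequence must contain. This is a minimal sketch; the pretrained network uses an input size of [112,112,3,32], as shown in the syntax example above.

rd = r2plus1dVideoClassifier;
T = rd.InputSize(4)  % number of frames per sequence, 32 for the pretrained network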

InputNormalizationStatistics — Normalization statistics for the video data

This property is read-only.

Normalization statistics for the video data, specified as a structure with fields Min, Max, Mean, and StandardDeviation. The Min and Max field values define the minimum and maximum values for rescaling the video data. The Mean and StandardDeviation values define the mean and standard deviation for input normalization. Each field value must be a row vector with length equal to the number of channels of the video input data.

The default structure contains the fields Min, Max, Mean, and StandardDeviation with values [0,0,0], [255,255,255], [0.45,0.45,0.45], and [0.225,0.225,0.225], respectively. Calculate the statistics from the data set on which you train the video classifier. To rescale the data using minimum and maximum values precomputed from your data set, specify both Min and Max. Otherwise, the object calculates the minimum and maximum values from each input sequence when you use updateSequence or classifyVideoFile.

Note

The object normalizes the data by rescaling it to the range [0, 1], and then standardizes the rescaled data by subtracting the mean and dividing by the standard deviation. The object standardizes the rescaled data only if the Mean and StandardDeviation fields are nonempty. The input is automatically normalized when you use the updateSequence or classifyVideoFile object functions. You must manually normalize the data when you use the forward or predict object functions.
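For example, this is a minimal sketch of manually normalizing a video sequence before calling predict, assuming video is an H-by-W-by-C-by-T numeric array and rd is a classifier with nonempty normalization statistics:

stats = rd.InputNormalizationStatistics;

% Rescale the data to [0, 1] using the per-channel minimum and maximum.
video = rescale(single(video),0,1, ...
    InputMin=reshape(stats.Min,1,1,[]), ...
    InputMax=reshape(stats.Max,1,1,[]));

% Standardize by subtracting the mean and dividing by the standard deviation.
video = (video - reshape(stats.Mean,1,1,[])) ./ ...
    reshape(stats.StandardDeviation,1,1,[]);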

ModelName — Name of the trained video classifier

Name of the trained video classifier, specified as a string scalar.

Classes — Classes of the video classifier

This property is read-only.

Classes that the video classifier is configured to train or classify, specified as a vector of strings or a cell array of character vectors. For example:

classes = ["kiss","laugh","pick","pour","pushup"];

Training Properties

Learnables — Learnable parameters for the video classifier

Learnable parameters for the R(2+1)D video classifier, specified as a table with three columns.

  • Layer — Layer name, specified as a string scalar.

  • Parameter — Parameter name, specified as a string scalar.

  • Value — Parameter value, specified as a dlarray (Deep Learning Toolbox) object.

The network learnable parameters contain the features learned by the network, such as the weights of the convolution and fully connected layers.
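For example, a quick way to inspect the first few learnable parameters of an existing classifier rd (a minimal sketch):

head(rd.Learnables)  % displays the Layer, Parameter, and Value columns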

State — State of the nonlearnable parameters

State of the nonlearnable parameters for the R(2+1)D video classifier, specified as a table with three columns.

  • Layer — Layer name, specified as a string scalar.

  • Parameter — Parameter name, specified as a string scalar.

  • Value — Parameter value, specified as a dlarray (Deep Learning Toolbox) object.

The network state contains information remembered by the network between iterations, such as the state of long short-term memory (LSTM) networks and batch normalization layers. During training or inference, you can update the network state using the output of the forward and predict object functions.
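For example, a sketch of one forward pass in a custom training loop that carries the updated state back into the classifier. This assumes X is an H-by-W-by-C-by-T-by-B batch of manually normalized video data; the "SSCTB" format string is an assumption based on that layout.

dlX = dlarray(X,"SSCTB");  % spatial, spatial, channel, time, and batch dimensions
[dlY,state] = forward(rd,dlX);
rd.State = state;  % carry the updated batch normalization state forward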

Streaming Video Classification Properties

VideoSequence — Video sequence for streaming classification

This property is read-only.

Video sequence used to update and classify sequences for streaming classification, specified as a 4-D numeric array of size [H,W,C,T], where H and W represent the height and width, respectively, C represents the number of channels, and T represents the number of frames for the video subnetwork. The updateSequence and classifySequence object functions use the video sequence specified by the VideoSequence property.
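For example, after updating the classifier with a new frame you can inspect the buffered sequence (a minimal sketch, assuming frame is an image read from a video source):

rd = updateSequence(rd,frame);
sz = size(rd.VideoSequence)  % H-by-W-by-C-by-T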

Object Functions


classifyVideoFile — Classify a video file
classifySequence — Classify video sequence
resetSequence — Reset video sequence properties for streaming video classification
updateSequence — Update video sequence for classification
forward — Compute video classifier outputs for training
predict — Compute video classifier predictions
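For example, a minimal sketch of classifying an entire video file in a single call, using the same video file as the streaming example below:

rd = r2plus1dVideoClassifier;
label = classifyVideoFile(rd,"visiontraffic.avi");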

Examples


Classify Streaming Video Using R(2+1)D Video Classifier

This example shows how to classify a video stream using a pretrained R(2+1)D video classifier.

Load a pretrained R(2+1)D video classifier.

rd = r2plus1dVideoClassifier();

Create a VideoReader to read a video frame by frame.

videoFilename = "visiontraffic.avi";
reader = VideoReader(videoFilename);

Create a video player to visualize the video data and update the player position to match the size of the video.

player = vision.VideoPlayer;
player.Position(3:4) = [reader.Width reader.Height];

Specify the frequency at which to classify the streaming video frames. Applying the classifier to a sequence of video frames every 10 frames balances runtime performance against classification performance.

classificationFrequency = 10;

Specify the sequence length required by the classifier, which is determined by the input size of the video classifier. You can begin to classify the sequence only after it reaches the required length.

sequenceLength = rd.InputSize(4);

Read through the video frame by frame, update the sequence with each new frame using updateSequence, and then classify the collected frames using classifySequence.

numFrames = 0;
text = "";

while hasFrame(reader)
    frame = readFrame(reader);
    numFrames = numFrames + 1;

    % Update the sequence with the next video frame.
    rd = updateSequence(rd,frame);

    % Classify the sequence based on the classificationFrequency.
    if mod(numFrames, classificationFrequency) == 0 && numFrames >= sequenceLength
        [label,score] = classifySequence(rd);
        text = string(label) + "; " + num2str(score, "%0.2f");
    end

    % Insert the predicted label into the video frame.
    frame = insertText(frame,[30,30],text,"FontSize",18);

    % Display the video and label. 
    step(player,frame);
end

Version History

Introduced in R2021b