inflated3dVideoClassifier

Inflated-3D (I3D) video classifier. Requires Computer Vision Toolbox Model for Inflated-3D Video Classification.

Since R2021b

Description

The inflated3dVideoClassifier object is an Inflated-3D (I3D) video classifier pretrained on the Kinetics-400 data set. You can use the pretrained video classifier to classify 400 human actions, such as running, walking, and shaking hands. The I3D classifier model contains two subnetworks: the video network and the optical flow network. Both subnetworks are trained on Kinetics-400: the video network on RGB data and the optical flow network on optical flow data.

Creation

Syntax

i3d = inflated3dVideoClassifier

i3d = inflated3dVideoClassifier(classifierName,classes)

i3d = inflated3dVideoClassifier(___,Name=Value)

Description

i3d = inflated3dVideoClassifier returns the I3D video classifier pretrained on the Kinetics-400 data set.

i3d = inflated3dVideoClassifier(classifierName,classes) configures the pretrained Inflated 3D (I3D) video classifier for transfer learning on a new set of classes, classes, using one of two pretrained classifiers, specified by classifierName.

i3d = inflated3dVideoClassifier(___,Name=Value) sets properties using name-value arguments in addition to the input arguments from the previous syntax. For example, i3d = inflated3dVideoClassifier("googlenet-video",["wavingHello","clapping"],InputSize=[224,224,3,64]) sets the input size of the network to 64 frames of 224-by-224 pixels with 3 channels. You can specify multiple name-value arguments.

Note

This object requires the Computer Vision Toolbox™ Model for Inflated-3D Video Classification. You can install the Computer Vision Toolbox Model for Inflated-3D Video Classification from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons. To use this object, you must have a license for the Deep Learning Toolbox™.

Input Arguments

`classifierName` — Classifier name
`"googlenet-video"` | `"googlenet-video-flow"`

Classifier name, specified as "googlenet-video" or "googlenet-video-flow".

Classifier	Description
`"googlenet-video"`	GoogLeNet-based I3D model pretrained on the Kinetics-400 video data for transfer learning.
`"googlenet-video-flow"`	GoogLeNet-based I3D model pretrained on the Kinetics-400 video and optical flow data for transfer learning. During training and inference, both video and optical flow data are used for classification.

Properties

Configure Classifier Properties

`InputSize` — Size of network
`[224,224,3,64]` (default) | four-element row vector

This property is read-only.

Size of the video classifier network, specified as a four-element row vector in the form [H,W,C,T], where H and W represent the height and width respectively, C represents the number of channels, and T represents the number of frames for the video subnetwork.

The input size of the optical flow subnetwork has the same height, width, and number of frames as the video subnetwork, but its number of channels is fixed at 2. The two channels correspond to the x- and y-components of velocity.

Typical values for the number of frames are 8, 16, 32, or 64. Increase the number of frames to capture the temporal nature of activities when training the classifier.
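The relationship between the two input sizes can be written out in a few lines. This is a minimal sketch in base MATLAB. The variable names videoInputSize and flowInputSize are illustrative only and are not properties of the object.

```matlab
% Video subnetwork input size in the form [H,W,C,T] (the documented default).
videoInputSize = [224,224,3,64];

% The optical flow subnetwork shares H, W, and T, but always has 2 channels
% (the x- and y-components of velocity).
flowInputSize = videoInputSize;
flowInputSize(3) = 2;

disp(flowInputSize)
```

Only the third element changes, so any InputSize you choose for the video subnetwork determines the optical flow input size as well.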

`InputNormalizationStatistics` — Normalization statistics for the video and optical flow data
structure

This property is read-only.

Normalization statistics for the video and optical flow data, specified as a structure with the fields Video and OpticalFlow. Each of these fields is itself a structure with the fields Min, Max, Mean, and StandardDeviation. The Min and Max field values define the minimum and maximum values for rescaling the video and optical flow data. The Mean and StandardDeviation values define the mean and standard deviation for input normalization. All field values must be specified as a row vector of size equal to the number of channels for the video input data. When you are using optical flow data, the number of channels must equal 2, which correspond to the x- and y-components of velocity.

The default structure contains:

A Video field, which contains the field Min set to [0,0,0], and the field Max set to [255,255,255].
Empty OpticalFlow, Mean, and StandardDeviation field values.

For a video input, the data is rescaled between -1 and 1 using the Min and Max field values. For an optical flow input, the data is rescaled between -1 and 1 using computed minimum and maximum values from the input data.

Note

When the Min and Max field values are not empty, the object first rescales the input data between -1 and 1. Then, if the Mean and StandardDeviation field values are not empty, the object normalizes the rescaled values by subtracting the mean and dividing by the standard deviation.
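The order of operations in this note can be illustrated with plain array arithmetic. The sketch below is not the object's internal implementation: it assumes a linear rescaling from [Min,Max] to [-1,1], and the toy frame and variable names are hypothetical.

```matlab
% Toy 1-by-2 RGB "frame" with values in [0,255]; channels lie along dimension 3.
frame = cat(3,[0 255],[51 204],[127.5 0]);

% Per-channel statistics that mirror the default Video fields.
stats = struct("Min",[0,0,0],"Max",[255,255,255], ...
    "Mean",[],"StandardDeviation",[]);

% Step 1: rescale each channel from [Min,Max] to [-1,1]
% (assumed linear mapping).
minVal = reshape(stats.Min,1,1,[]);
maxVal = reshape(stats.Max,1,1,[]);
rescaled = 2*(frame - minVal)./(maxVal - minVal) - 1;

% Step 2: normalize only when both Mean and StandardDeviation are set.
normalized = rescaled;
if ~isempty(stats.Mean) && ~isempty(stats.StandardDeviation)
    mu = reshape(stats.Mean,1,1,[]);
    sigma = reshape(stats.StandardDeviation,1,1,[]);
    normalized = (rescaled - mu)./sigma;
end
```

With empty Mean and StandardDeviation fields, as in the default structure, the second step is skipped and the data is only rescaled.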

An example using this property:

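% Rescaling limits for the three RGB channels; no mean/std normalization.
stats.Video = struct(Min=[0,0,0],Max=[255,255,255], ...
    Mean=[],StandardDeviation=[]);
% Rescaling limits for the two optical flow channels (x- and y-components).
stats.OpticalFlow = struct(Min=[-20,-20],Max=[20,20], ...
    Mean=[],StandardDeviation=[]);
% Configure the two-stream classifier with the custom statistics.
i3d = inflated3dVideoClassifier("googlenet-video-flow",["waving","clapping"], ...
    InputNormalizationStatistics=stats);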

`ModelName` — Name of trained video classifier
string scalar

Name of the trained video classifier, specified as a string scalar.

`Classes` — Classes that the video classifier is configured to train or classify
vector of strings | cell array of character vectors

This property is read-only.

Classes that the video classifier is configured to train or classify, specified as a vector of strings or a cell array of character vectors. For example:

classes = ["kiss","laugh","pick","pour","pushup"];
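The two accepted forms for class names can be compared directly. This is a small base MATLAB sketch with illustrative variable names. It also shows why square brackets around character vectors do not work as a class list: they concatenate the names into one character vector.

```matlab
% Vector of strings: one element per class.
classesString = ["kiss","laugh","pick","pour","pushup"];

% Cell array of character vectors: also one element per class.
classesCell = {'kiss','laugh','pick','pour','pushup'};

% Square brackets around character vectors concatenate them into a single
% character vector, so this is NOT a list of two classes.
joined = ['kiss','laugh'];

disp(numel(classesString))   % number of classes in the string vector
disp(numel(classesCell))     % number of classes in the cell array
disp(joined)                 % a single concatenated character vector
```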

Training Properties

`VideoLearnables` — Learnable parameters for video subnetwork of I3D video classifier
table with three columns

Learnable parameters for the video subnetwork of the I3D video classifier, specified as a table with three columns.

Layer — Layer name, specified as a string scalar.
Parameter — Parameter name, specified as a string scalar.
Value — Parameter value, specified as a dlarray (Deep Learning Toolbox) object.

The network learnable parameters contain the features learned by the network, such as the weights of convolution and fully connected layers.

`VideoState` — State of the nonlearnable parameters for the video subnetwork of the I3D video classifier
table with three columns

State of the nonlearnable parameters for the video subnetwork of the I3D video classifier, specified as a table with three columns.

Layer — Layer name, specified as a string scalar.
Parameter — Parameter name, specified as a string scalar.
Value — Parameter value, specified as a dlarray (Deep Learning Toolbox) object.

The network state contains information remembered by the network between iterations, such as the state of LSTM and batch normalization layers. During training or inference, you can update the network state using the output of the forward and predict functions.

`OpticalLearnables` — Learnable parameters for the optical flow subnetwork of the I3D video classifier
table with three columns

Learnable parameters for the optical flow subnetwork of the I3D video classifier, specified as a table with three columns.

Layer — Layer name, specified as a string scalar.
Parameter — Parameter name, specified as a string scalar.
Value — Parameter value, specified as a dlarray (Deep Learning Toolbox) object.

The network learnable parameters contain the features learned by the network, such as the weights of convolution and fully connected layers.

`OpticalFlowState` — State of the nonlearnable parameters for the optical flow subnetwork of the I3D video classifier
table with three columns

State of the nonlearnable parameters for the optical flow subnetwork of the I3D video classifier, specified as a table with three columns.

Layer — Layer name, specified as a string scalar.
Parameter — Parameter name, specified as a string scalar.
Value — Parameter value, specified as a dlarray (Deep Learning Toolbox) object.

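The network state contains information remembered by the network between iterations, such as the state of LSTM and batch normalization layers. During training or inference, you can update the network state using the output of the forward and predict functions.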

Streaming Video Classification Properties

`VideoSequence` — Video sequence used for streaming classification
4-D numeric array

This property is read-only.

Video sequence used to update and classify sequences for streaming classification, specified as a 4-D numeric array. The array is of size [H,W,C,T], where H and W represent the height and width, respectively, C represents the number of channels, and T represents the number of frames for the video subnetwork. The updateSequence and classifySequence object functions use the video sequence specified by the VideoSequence property.

`OpticalFlowSequence` — Optical flow sequence used for streaming classification
4-D numeric array

This property is read-only.

Optical flow sequence used to update and classify sequences for streaming classification, specified as a 4-D numeric array. The array is of size [H,W,C,T], where H and W represent the height and width, respectively, C represents the number of channels, and T represents the number of frames for the optical flow subnetwork. The updateSequence and classifySequence object functions use the optical flow sequence specified by the OpticalFlowSequence property.
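To make the H-by-W-by-C-by-T layout of a streaming sequence concrete, the sketch below maintains a fixed-length frame buffer in base MATLAB. It is an illustration of the array shape only, under the assumption of a simple sliding window. It does not call updateSequence and is not a description of how the object updates its sequence properties. All names and sizes are hypothetical.

```matlab
% Small illustrative sizes; a real sequence would match InputSize.
H = 4; W = 4; C = 3; T = 8;

% Buffer that always holds the T most recent frames along dimension 4.
sequence = zeros(H,W,C,T,"single");

for k = 1:10
    % Stand-in for a newly read video frame (every pixel equals k).
    frame = k*ones(H,W,C,"single");

    % Drop the oldest frame and append the newest one.
    sequence = cat(4,sequence(:,:,:,2:end),frame);
end

disp(size(sequence))
```

The buffer size stays constant no matter how many frames are streamed, which is why the number of frames T is fixed by the input size rather than by the length of the video.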

Object Functions

Video Classification

`classifyVideoFile`	Classify a video file
`resetSequence`	Reset video and optical flow sequence properties for streaming video classification
`updateSequence`	Update video or optical flow sequence for classification
`classifySequence`	Classify video and optical flow sequence

Custom Training and Inference

`forward`	Compute video classifier outputs for training
`predict`	Compute video classifier predictions

Examples

Classify Video File Using Inflated 3D Video Classifier

This example shows how to use classifyVideoFile to classify a video using an Inflated 3D video classifier.

Load a pretrained Inflated-3D video network.

i3d = inflated3dVideoClassifier();

Specify the video file name to classify.

videoFilename = 'visiontraffic.avi';

Classify the video using the video classifier.

label = classifyVideoFile(i3d, videoFilename);

Note that the classifier is not fine-tuned for visiontraffic.avi, so the predicted label is not expected to be correct. For optimal performance on your video data, you must train the classifier.

Version History

Introduced in R2021b
