To construct a classification output layer with cross entropy loss for *k* mutually exclusive classes, use `classificationLayer`. If you want to use a different loss function for your classification problems, then you can define a custom classification output layer using this example as a guide.

This example shows how to define and create a custom weighted classification output layer with weighted cross entropy loss. Use a weighted classification layer for classification problems with an imbalanced distribution of classes. For an example showing how to use a weighted classification layer in a network, see Speech Command Recognition Using Deep Learning.
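The example does not prescribe how to choose the class weights. As a minimal sketch of one common heuristic (inverse class frequency, which is an assumption here and not part of the original example), the following code derives weights from a hypothetical imbalanced label vector. The variable names `labels` and `counts` are illustrative only.

```
% Hypothetical imbalanced training labels (illustrative data only).
labels = categorical({'cat','cat','cat','cat','cat','cat', ...
                      'dog','dog','dog','bird'});

% Count the observations in each class, in the order of categories(labels).
counts = countcats(labels);

% Inverse-frequency heuristic: rarer classes receive larger weights.
classWeights = 1 ./ counts;

% Normalize so that the weights sum to 1 (a convention, not a requirement).
classWeights = classWeights / sum(classWeights);
```

The order of the weights follows `categories(labels)`, so check that this order matches the order of the classes that the layer uses.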

To define a custom classification output layer, you can use the template provided in this example, which takes you through the following steps:

1. Name the layer – Give the layer a name so it can be used in MATLAB®.
2. Declare the layer properties – Specify the properties of the layer.
3. Create a constructor function – Specify how to construct the layer and initialize its properties. If you do not specify a constructor function, then the software initializes the properties with `''` at creation.
4. Create a forward loss function – Specify the loss between the predictions and the training targets.
5. Create a backward loss function – Specify the derivative of the loss with respect to the predictions.

A weighted classification layer computes the weighted cross
entropy loss for classification problems. Weighted cross entropy is an error measure between two continuous random
variables. For prediction scores *Y* and training targets *T*,
the weighted cross entropy loss between *Y* and *T* is given by

$$L=-\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K}w_{i}T_{ni}\log(Y_{ni}),$$

where *N* is the number of observations,
*K* is the number of classes, and *w* is a vector of
weights for each class.
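To make the formula concrete, the following sketch evaluates the double sum for a small hand-checkable case. The scores, targets, and weights are hypothetical values chosen for illustration. The arrays are stored as *K*-by-*N* (classes in rows, observations in columns), so the element *Y*<sub>ni</sub> in the equation is `Y(i,n)` in the code.

```
% Hypothetical prediction scores for K = 3 classes and N = 2 observations.
Y = [0.7 0.2;
     0.2 0.5;
     0.1 0.3];

% One-hot targets: observation 1 is class 1, observation 2 is class 3.
T = [1 0;
     0 0;
     0 1];

% Hypothetical class weights.
w = [0.1 0.7 0.2];

% Evaluate the double sum term by term.
[K, N] = size(Y);
L = 0;
for n = 1:N
    for i = 1:K
        L = L - w(i) * T(i,n) * log(Y(i,n));
    end
end
L = L / N;

% With one-hot targets, only the true-class terms survive.
Lclosed = -(w(1)*log(0.7) + w(3)*log(0.3)) / N;
```

Because the targets are one-hot, each observation contributes only the weighted log score of its true class, so `L` and `Lclosed` agree.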

Copy the classification output layer template into a new file in MATLAB. This template outlines the structure of a classification output layer and includes the functions that define the layer behavior.

```
classdef myClassificationLayer < nnet.layer.ClassificationLayer

    properties
        % (Optional) Layer properties.

        % Layer properties go here.
    end

    methods
        function layer = myClassificationLayer()
            % (Optional) Create a myClassificationLayer.

            % Layer constructor function goes here.
        end

        function loss = forwardLoss(layer, Y, T)
            % Return the loss between the predictions Y and the
            % training targets T.
            %
            % Inputs:
            %         layer - Output layer
            %         Y     - Predictions made by network
            %         T     - Training targets
            %
            % Output:
            %         loss  - Loss between Y and T

            % Layer forward loss function goes here.
        end

        function dLdY = backwardLoss(layer, Y, T)
            % Backward propagate the derivative of the loss function.
            %
            % Inputs:
            %         layer - Output layer
            %         Y     - Predictions made by network
            %         T     - Training targets
            %
            % Output:
            %         dLdY  - Derivative of the loss with respect to the predictions Y

            % Layer backward loss function goes here.
        end
    end
end
```

First, give the layer a name. In the first line of the class file, replace the existing name `myClassificationLayer` with `weightedClassificationLayer`.

```
classdef weightedClassificationLayer < nnet.layer.ClassificationLayer
    ...
end
```

Next, rename the `myClassificationLayer` constructor function (the first function in the `methods` section) so that it has the same name as the layer.

```
    methods
        function layer = weightedClassificationLayer()
            ...
        end

        ...
    end
```

Save the layer class file in a new file named `weightedClassificationLayer.m`. The file name must match the layer name. To use the layer, you must save the file in the current folder or in a folder on the MATLAB path.

Declare the layer properties in the `properties` section.

By default, custom output layers have the following properties:

- `Name` – Layer name, specified as a character vector or a string scalar. To include a layer in a layer graph, you must specify a nonempty unique layer name. If you train a series network with the layer and `Name` is set to `''`, then the software automatically assigns a name to the layer at training time.
- `Description` – One-line description of the layer, specified as a character vector or a string scalar. This description appears when the layer is displayed in a `Layer` array. If you do not specify a layer description, then the software displays `"Classification Output"` or `"Regression Output"`.
- `Type` – Type of the layer, specified as a character vector or a string scalar. The value of `Type` appears when the layer is displayed in a `Layer` array. If you do not specify a layer type, then the software displays the layer class name.

Custom classification layers also have the following property:

- `Classes` – Classes of the output layer, specified as a categorical vector, string array, cell array of character vectors, or `'auto'`. If `Classes` is `'auto'`, then the software automatically sets the classes at training time. If you specify the string array or cell array of character vectors `str`, then the software sets the classes of the output layer to `categorical(str,str)`. The default value is `'auto'`.

Custom regression layers also have the following property:

- `ResponseNames` – Names of the responses, specified as a cell array of character vectors or a string array. At training time, the software automatically sets the response names according to the training data. The default is `{}`.

If the layer has no other properties, then you can omit the `properties` section.

In this example, the layer requires an additional property to save the class weights. Specify the property `ClassWeights` in the `properties` section.

```
    properties
        % Vector of weights corresponding to the classes in the training
        % data
        ClassWeights
    end
```

Create the function that constructs the layer and initializes the layer properties. Specify any variables required to create the layer as inputs to the constructor function.

Specify input argument `classWeights` to assign to the `ClassWeights` property. Also specify an optional input argument `name` to assign to the `Name` property at creation. Add a comment to the top of the function that explains the syntaxes of the function.

```
        function layer = weightedClassificationLayer(classWeights, name)
            % layer = weightedClassificationLayer(classWeights) creates a
            % weighted cross entropy loss layer. classWeights is a row
            % vector of weights corresponding to the classes in the order
            % that they appear in the training data.
            %
            % layer = weightedClassificationLayer(classWeights, name)
            % additionally specifies the layer name.

            ...
        end
```

Replace the comment `% Layer constructor function goes here` with code that initializes the layer properties.

Give the layer a one-line description by setting the `Description` property of the layer. Set the `Name` property to the optional input argument `name`.

```
        function layer = weightedClassificationLayer(classWeights, name)
            % layer = weightedClassificationLayer(classWeights) creates a
            % weighted cross entropy loss layer. classWeights is a row
            % vector of weights corresponding to the classes in the order
            % that they appear in the training data.
            %
            % layer = weightedClassificationLayer(classWeights, name)
            % additionally specifies the layer name.

            % Set class weights
            layer.ClassWeights = classWeights;

            % Set layer name
            if nargin == 2
                layer.Name = name;
            end

            % Set layer description
            layer.Description = 'Weighted cross entropy';
        end
```

Create a function named `forwardLoss` that returns the weighted cross entropy loss between the predictions made by the network and the training targets. The syntax for `forwardLoss` is `loss = forwardLoss(layer, Y, T)`, where `Y` is the output of the previous layer and `T` represents the training targets.

For classification problems, the dimensions of `T` depend on the type of problem.

Classification Task | Input Size | Observation Dimension |
---|---|---|
2-D image classification | 1-by-1-by-*K*-by-*N*, where *K* is the number of classes and *N* is the number of observations. | 4 |
3-D image classification | 1-by-1-by-1-by-*K*-by-*N*, where *K* is the number of classes and *N* is the number of observations. | 5 |
Sequence-to-label classification | *K*-by-*N*, where *K* is the number of classes and *N* is the number of observations. | 2 |
Sequence-to-sequence classification | *K*-by-*N*-by-*S*, where *K* is the number of classes, *N* is the number of observations, and *S* is the sequence length. | 2 |

The size of `Y` depends on the output of the previous layer. To ensure that `Y` is the same size as `T`, you must include a layer that outputs the correct size before the output layer. For example, to ensure that `Y` is a 4-D array of prediction scores for *K* classes, you can include a fully connected layer of size *K* followed by a softmax layer before the output layer.
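As a rough sketch of this layer ordering, the following layer array places a fully connected layer of size *K* and a softmax layer directly before the custom output layer. The image input size, the convolution settings, and the layer name `'weighted_output'` are hypothetical choices for illustration. The code assumes that Deep Learning Toolbox is installed and that the completed `weightedClassificationLayer.m` file is on the MATLAB path.

```
% Hypothetical class weights for K = 3 classes.
classWeights = [0.1 0.7 0.2];
numClasses = numel(classWeights);

% Illustrative 2-D image classification network. The final three layers
% produce 1-by-1-by-K-by-N prediction scores for the custom output layer.
layers = [
    imageInputLayer([28 28 1])                    % hypothetical input size
    convolution2dLayer(3,16,'Padding','same')     % hypothetical feature layer
    reluLayer
    fullyConnectedLayer(numClasses)               % one output per class
    softmaxLayer                                  % scores sum to 1 over classes
    weightedClassificationLayer(classWeights,'weighted_output')];
```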

A weighted classification layer computes the weighted cross
entropy loss for classification problems. Weighted cross entropy is an error measure between two continuous random
variables. For prediction scores *Y* and training targets *T*,
the weighted cross entropy loss between *Y* and *T* is given by

$$L=-\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K}w_{i}T_{ni}\log(Y_{ni}),$$

where *N* is the number of observations,
*K* is the number of classes, and *w* is a vector of
weights for each class.

The inputs `Y` and `T` correspond to *Y* and *T* in the equation, respectively. The output `loss` corresponds to *L*. Add a comment to the top of the function that explains the syntaxes of the function.

```
        function loss = forwardLoss(layer, Y, T)
            % loss = forwardLoss(layer, Y, T) returns the weighted cross
            % entropy loss between the predictions Y and the training
            % targets T.

            N = size(Y,4);
            Y = squeeze(Y);
            T = squeeze(T);
            W = layer.ClassWeights;

            loss = -sum(W*(T.*log(Y)))/N;
        end
```
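The vectorized expression in `forwardLoss` relies on `squeeze` reducing the 1-by-1-by-*K*-by-*N* arrays to *K*-by-*N* matrices, so that the row vector `W` times the matrix `T.*log(Y)` performs the sum over classes. The following standalone sketch, which uses hypothetical random data and does not require the layer class, compares that expression with the explicit double sum from the equation.

```
% Hypothetical sizes: K = 3 classes, N = 4 observations.
K = 3;
N = 4;

% Random positive scores, normalized per observation like softmax output.
scores = rand(K, N);
scores = scores ./ sum(scores, 1);

% One-hot targets for hypothetical true classes 1, 3, 2, 3.
targets = zeros(K, N);
targets(sub2ind([K N], [1 3 2 3], 1:N)) = 1;

% Arrange the data as 1-by-1-by-K-by-N arrays, as the layer receives them.
Y = reshape(scores, [1 1 K N]);
T = reshape(targets, [1 1 K N]);
W = [0.1 0.7 0.2];

% Vectorized expression used in forwardLoss.
Ys = squeeze(Y);                       % K-by-N
Ts = squeeze(T);                       % K-by-N
lossVec = -sum(W * (Ts .* log(Ys))) / N;

% Explicit double sum from the equation.
lossLoop = 0;
for n = 1:N
    for i = 1:K
        lossLoop = lossLoop - W(i) * Ts(i,n) * log(Ys(i,n));
    end
end
lossLoop = lossLoop / N;
```

The two values agree to within floating-point rounding, which suggests that `W` must be a row vector with *K* elements for the matrix product to be defined.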

Create the backward loss function.

Create a function named `backwardLoss` that returns the derivatives of the weighted cross entropy loss with respect to the predictions `Y`. The syntax for `backwardLoss` is `dLdY = backwardLoss(layer, Y, T)`, where `Y` is the output of the previous layer and `T` represents the training targets.

The dimensions of `Y` and `T` are the same as the inputs in `forwardLoss`.

The derivative of the weighted cross entropy loss with respect to the predictions
*Y* is given by

$$\frac{\partial L}{\partial Y_{ni}}=-\frac{1}{N}\frac{w_{i}T_{ni}}{Y_{ni}},$$

where *N* is the number of observations and
*w* is a vector of weights for each class. Add a comment to the top
of the function that explains the syntaxes of the function.

```
        function dLdY = backwardLoss(layer, Y, T)
            % dLdY = backwardLoss(layer, Y, T) returns the derivatives of
            % the weighted cross entropy loss with respect to the
            % predictions Y.

            [~,~,K,N] = size(Y);
            Y = squeeze(Y);
            T = squeeze(T);
            W = layer.ClassWeights;

            dLdY = -(W'.*T./Y)/N;
            dLdY = reshape(dLdY,[1 1 K N]);
        end
```
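One way to gain confidence in the analytic derivative is to compare it with a central finite-difference approximation of the loss. The following standalone sketch does this on hypothetical random data, treating each element of `Y` as an independent variable. The step size `h` and the helper `lossFcn` are illustrative choices and are not part of the layer.

```
% Hypothetical sizes and class weights.
K = 3;
N = 4;
W = [0.1 0.7 0.2];

% Positive scores bounded away from zero, normalized per observation.
Y = rand(K, N) + 0.1;
Y = Y ./ sum(Y, 1);

% One-hot targets for hypothetical true classes 1, 3, 2, 3.
T = zeros(K, N);
T(sub2ind([K N], [1 3 2 3], 1:N)) = 1;

% Loss as a function of the K-by-N score matrix (same formula as forwardLoss).
lossFcn = @(Y) -sum(W * (T .* log(Y))) / N;

% Analytic derivative (same formula as backwardLoss, before reshaping).
dLdY = -(W' .* T ./ Y) / N;

% Central finite differences, one element of Y at a time.
h = 1e-6;
dNum = zeros(K, N);
for j = 1:numel(Y)
    Yp = Y;
    Yp(j) = Yp(j) + h;
    Ym = Y;
    Ym(j) = Ym(j) - h;
    dNum(j) = (lossFcn(Yp) - lossFcn(Ym)) / (2*h);
end

% Largest discrepancy between the numeric and analytic derivatives.
maxErr = max(abs(dNum(:) - dLdY(:)));
```

A small `maxErr` indicates that the derivative formula and the loss formula are consistent. The derivative is zero wherever the target is zero, because those terms do not contribute to the loss.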

View the completed classification output layer class file.

```
classdef weightedClassificationLayer < nnet.layer.ClassificationLayer

    properties
        % Vector of weights corresponding to the classes in the training
        % data
        ClassWeights
    end

    methods
        function layer = weightedClassificationLayer(classWeights, name)
            % layer = weightedClassificationLayer(classWeights) creates a
            % weighted cross entropy loss layer. classWeights is a row
            % vector of weights corresponding to the classes in the order
            % that they appear in the training data.
            %
            % layer = weightedClassificationLayer(classWeights, name)
            % additionally specifies the layer name.

            % Set class weights
            layer.ClassWeights = classWeights;

            % Set layer name
            if nargin == 2
                layer.Name = name;
            end

            % Set layer description
            layer.Description = 'Weighted cross entropy';
        end

        function loss = forwardLoss(layer, Y, T)
            % loss = forwardLoss(layer, Y, T) returns the weighted cross
            % entropy loss between the predictions Y and the training
            % targets T.

            N = size(Y,4);
            Y = squeeze(Y);
            T = squeeze(T);
            W = layer.ClassWeights;

            loss = -sum(W*(T.*log(Y)))/N;
        end

        function dLdY = backwardLoss(layer, Y, T)
            % dLdY = backwardLoss(layer, Y, T) returns the derivatives of
            % the weighted cross entropy loss with respect to the
            % predictions Y.

            [~,~,K,N] = size(Y);
            Y = squeeze(Y);
            T = squeeze(T);
            W = layer.ClassWeights;

            dLdY = -(W'.*T./Y)/N;
            dLdY = reshape(dLdY,[1 1 K N]);
        end
    end
end
```

For GPU compatibility, the layer functions must support inputs and return outputs of type `gpuArray`. Any other functions the layer uses must do the same. Many MATLAB built-in functions support `gpuArray` input arguments. If you call any of these functions with at least one `gpuArray` input, then the function executes on the GPU and returns a `gpuArray` output. For a list of functions that execute on a GPU, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox). To use a GPU for deep learning, you must also have a CUDA® enabled NVIDIA® GPU with compute capability 3.0 or higher. For more information on working with GPUs in MATLAB, see GPU Computing in MATLAB (Parallel Computing Toolbox).

The MATLAB functions used in `forwardLoss` and `backwardLoss` in `weightedClassificationLayer` all support `gpuArray` inputs, so the layer is GPU compatible.

Check the validity of the custom classification output layer `weightedClassificationLayer`.

Define a custom weighted classification layer. To create this layer, save the file `weightedClassificationLayer.m` in the current folder.

Create an instance of the layer. Specify the class weights as a vector with three elements corresponding to three classes.

```
classWeights = [0.1 0.7 0.2];
layer = weightedClassificationLayer(classWeights);
```

Check that the layer is valid using `checkLayer`. Set the valid input size to the typical size of a single observation input to the layer. The layer expects a 1-by-1-by-*K*-by-*N* array input, where *K* is the number of classes and *N* is the number of observations in the mini-batch.

```
numClasses = numel(classWeights);
validInputSize = [1 1 numClasses];
checkLayer(layer,validInputSize,'ObservationDimension',4);
```

```
Skipping GPU tests. No compatible GPU device found.

Running nnet.checklayer.OutputLayerTestCase
.......... ...
Done nnet.checklayer.OutputLayerTestCase
__________

Test Summary:
	 13 Passed, 0 Failed, 0 Incomplete, 4 Skipped.
	 Time elapsed: 0.34878 seconds.
```

The test summary reports the number of passed, failed, incomplete, and skipped tests.

`assembleNetwork` | `checkLayer` | `classificationLayer`