layerNormalizationLayer
Layer normalization layer
Description
A layer normalization layer normalizes a mini-batch of data across all channels for each observation independently. To speed up training of recurrent and multilayer perceptron neural networks and reduce the sensitivity to network initialization, use layer normalization layers after the learnable layers, such as LSTM and fully connected layers.
After normalization, the layer scales the input with a learnable scale factor γ and shifts it by a learnable offset β.
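For example, a minimal sketch of where the layer typically appears in a layer array, assuming a hypothetical sequence classification network with 20 input features and 10 classes:

    layers = [
        sequenceInputLayer(20)
        lstmLayer(100,'OutputMode','last')
        layerNormalizationLayer
        fullyConnectedLayer(10)
        layerNormalizationLayer
        softmaxLayer
        classificationLayer];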
Creation
Description

layer = layerNormalizationLayer creates a layer normalization layer.

layer = layerNormalizationLayer(Name,Value) sets the optional Epsilon, Parameters and Initialization, Learning Rate and Regularization, and Name properties using one or more name-value arguments. For example, layerNormalizationLayer('Name','layernorm') creates a layer normalization layer with name 'layernorm'.
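For example, a short sketch of creating the layer with name-value arguments, using the 'Name' value from the text above and an assumed Epsilon value of 1e-5:

    layer = layerNormalizationLayer('Name','layernorm','Epsilon',1e-5);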
Properties
Examples
Algorithms
The layer normalization operation normalizes the elements x_i of the input by first calculating the mean μ_L and variance σ_L² over the spatial, time, and channel dimensions for each observation independently. Then, it calculates the normalized activations as

x̂_i = (x_i − μ_L) / sqrt(σ_L² + ϵ),
where ϵ is a constant that improves numerical stability when the variance is very small.
To allow for the possibility that inputs with zero mean and unit variance are not optimal for the operations that follow layer normalization, the layer normalization operation further shifts and scales the activations using the transformation

y_i = γ x̂_i + β,
where the offset β and scale factor γ are learnable parameters that are updated during network training.
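The following is a minimal numeric sketch of these two steps for a single observation, assuming a vector of activations x and initial values γ = 1 and β = 0 (the sizes and values are illustrative only):

    x = randn(8,1);                              % activations for one observation
    epsilon = 1e-5;                              % constant for numerical stability
    mu = mean(x);                                % mean over the normalized dimensions
    sigma2 = var(x,1);                           % population variance over the same dimensions
    xhat = (x - mu) ./ sqrt(sigma2 + epsilon);   % normalized activations
    gamma = 1; beta = 0;                         % learnable scale and offset (initial values assumed)
    y = gamma .* xhat + beta;                    % scaled and shifted output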
References
[1] Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. “Layer Normalization.” Preprint, submitted July 21, 2016. https://arxiv.org/abs/1607.06450.
Version History
Introduced in R2021a