Unexpected hidden activation dimensions in convolutional neural network

I am attempting to build a multi-layer convolutional neural network, with multiple conv layers (and pooling, dropout, activation layers in between). However, I am a bit confused about the sizes of the weights and the activations from each conv layer.
For simplicity, let's assume each conv layer consists of M filters of size m x m. I define each conv layer using convolution2dLayer([m,m],M,'Padding','Same').
The first layer takes in a single image and outputs M images (a 4D array whose last dimension is M). The first layer also has weights of dimension m x m x 1 x M. This is all as I would expect.
The subsequent layers are where I am getting confused. I expect the 2nd conv layer to take in M images, and apply M filters of size m x m (weight dimension m x m x 1 x M), resulting in an output with M^2 images, as we apply all M filters to each of the M inputs. Instead, the weights have dimensions m x m x M x M, and there are only M output images (according to the "activations" function).
The later conv layers are the same as the 2nd layer, where the weights are size m x m x M x M, and there are only M output images from each layer.
Am I missing something?

Answers (1)

Hrishikesh Borate on 20 Apr 2021
In a convolution layer, the depth of each filter equals the depth of the input, that is, the number of input channels. Hence, the weights of a convolution layer have dimensions:
(filter height) x (filter width) x (input depth or number of input channels) x (number of filters).
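To make the role of the third weight dimension concrete, here is a minimal sketch that applies a bank of filters by hand using only base MATLAB (`conv2`). The sizes `m`, `M`, `H`, `W` and the random data are hypothetical values chosen for illustration. Bias is omitted, and the kernel orientation may differ from what the toolbox layer does internally, so only the array shapes should be read from this example.

```matlab
% Hypothetical sizes, chosen only for illustration.
m = 3; M = 4; H = 8; W = 8;

X   = rand(H, W, M);       % layer input with M channels
Wts = rand(m, m, M, M);    % m x m x (input channels) x (number of filters)
Y   = zeros(H, W, M);      % one output channel per filter

for k = 1:M                % loop over filters
    for c = 1:M            % each filter has one m x m slice per input channel
        % Per-channel results are summed into a single output map.
        Y(:, :, k) = Y(:, :, k) + conv2(X(:, :, c), Wts(:, :, c, k), 'same');
    end
end

disp(size(Y))              % 8 8 4
```

Each filter consumes all input channels at once and collapses them into one map, which is why the number of output channels equals the number of filters rather than (input channels) x (number of filters).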
For example, suppose the input to the network is a single-channel image and each convolution layer is defined as:
convolution2dLayer([m,m], M, 'Padding', 'same');
Assuming the network contains only convolution layers, the weights of the first convolution layer have dimensions m x m x 1 x M (since the input depth is 1), and the output of this layer has dimensions (input image height) x (input image width) x (number of filters = M). These output activations are the input to the second convolution layer, so the weights of the second convolution layer have dimensions:
(filter height = m) x (filter width = m) x (input depth = M) x (number of filters = M)
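Similarly, the weights of each subsequent convolution layer have dimensions m x m x M x M. Because each filter spans all M input channels and sums over them to produce a single output channel, every such layer outputs M channels, not M^2.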
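One way to check these sizes yourself is to build a small network and inspect the initialized weights. The following is a sketch, assuming the Deep Learning Toolbox is installed; the 28 x 28 input size and the values of `m` and `M` are hypothetical and not taken from the question.

```matlab
% Hypothetical sizes, chosen only for illustration.
m = 3; M = 4;

layers = [
    imageInputLayer([28 28 1], 'Normalization', 'none')   % single-channel input
    convolution2dLayer([m m], M, 'Padding', 'same')        % 1st conv layer
    reluLayer
    convolution2dLayer([m m], M, 'Padding', 'same')        % 2nd conv layer
    ];

% Creating a dlnetwork initializes the learnable parameters.
net = dlnetwork(layerGraph(layers));

w1 = net.Layers(2).Weights;   % expected m x m x 1 x M
w2 = net.Layers(4).Weights;   % expected m x m x M x M

disp(size(w1))
disp(size(w2))
```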
For more information, refer to convolution2dLayer.