Is LSTM and fully connected networks changing channels or neurons?

6 visualizzazioni (ultimi 30 giorni)
When I was building the network, I was surprised to find out why LSTM and fully connected networks change the number of channels instead of neurons. My input is a one-dimensional signal (1024 sampling points), which is 1 (C) when analyzing the network × 1 (B) × 1024 (T), when passing through BiLSTM, the channel changes while the others remain unchanged, which is strange and inconsistent with theory. Additionally, how can MATLAB build an SE attention module? The multiplication layer has no broadcasting function and can only be element by element, so 1 × one × How C is related to H × W × What about C multiplied by? Thank you all for your guidance!

Risposta accettata

Ben
Ben il 20 Set 2023
We use "channels" or C to refer to the feature dimension - in the case of LSTM, BiLSTM, GRU I think of the operation as a loop over a sequence of vector inputs - each vector has size C. In practice we have a multi-dimensional array with size CxBxT - the operation loops over the sequence T dimension, vectorizes over the batch B dimension, and uses the C dimension in the computation.
Perhaps the confusion is that often LSTM-s have their output feedback as their input, but this is not necessary, so you can have an LSTM with inputs as vectors of size C1 and outputs of size C2. You can find a description of the algorithm in the "Algorithms" section of the doc page: https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.lstmlayer.html or you can see that the input state dimension and output/hidden state dimensions can be different in the algorithm described here https://en.wikipedia.org/wiki/Long_short-term_memory
Could you clarify what an SE attention module is?
We can implement attention mechanisms that require batch-matrix multiplies by using pagemtimes: https://www.mathworks.com/help/matlab/ref/pagemtimes.html - this is supported as a dlarray method, so it can be used in autodiff workflows such as custom training loops, and implementing custom layers. Alternatively if you need standard self attention you can use selfAttentionLayer: https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.selfattentionlayer.html

Più risposte (1)

秀新
秀新 il 21 Set 2023
Firstly, thank you very much for your answer. Below, I will explain the SE attention module. The SE attention module was published in 2018 with the title "Squeeze and Extraction Networks". When the input is a sequence, according to the network structure of this article, MATLAB cannot build it. I just tried changing global pooling to average pooling, which can achieve similar results.

Tag

Prodotti


Release

R2023a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by