Does the selfAttentionLayer also perform softmax and scaling?

Question

Chih on 3 Apr 2023

Direct link to this question

https://it.mathworks.com/matlabcentral/answers/1940374-does-the-selfattentionlayer-also-perform-softmax-and-scaling

Edited: xingxingcui on 27 Apr 2024

Accepted answer: Rohit

In https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.selfattentionlayer.html, it states that:

A self-attention layer computes single-head or multihead self-attention of its input.

The layer:

1. Computes the queries, keys, and values from the input
2. Computes the scaled dot-product attention across heads using the queries, keys, and values
3. Merges the results from the heads
4. Performs a linear transformation on the merged result

I wonder whether the layer also applies the scaling (i.e., dividing Q*K by sqrt(dim)) and the softmax. My understanding is that both should happen within step 2.

Please clarify this for me and for other users.

Thanks.
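
For reference, the scaling and softmax asked about above correspond to the standard scaled dot-product attention formula. Here $d_k$ is the key dimension (written as dim in the question) and $K^\top$ is the transposed key matrix; this is the textbook definition, not a statement about how the layer is implemented internally:

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V
$$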


Answer 1

Rohit on 20 Apr 2023

Direct link to this answer

https://it.mathworks.com/matlabcentral/answers/1940374-does-the-selfattentionlayer-also-perform-softmax-and-scaling#answer_1219478

I understand that you want to know whether ‘selfAttentionLayer’ performs the softmax and scaling operations involved in computing the attention scores.

Yes, we perform both operations: the layer computes the scaled attention scores and then applies softmax, as the attention mechanism requires.
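
As a rough illustration of what "scaling, then softmax" means in scaled dot-product attention, here is a minimal single-head sketch in plain Python. It is not MathWorks' implementation of selfAttentionLayer: the function names `softmax` and `scaled_dot_product_attention` and the toy inputs are made up for this example, and the learned query/key/value projections and the multihead merging are left out.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of rows, one row per sequence position.
    Hypothetical helper for illustration only.
    """
    d_k = len(K[0])               # key dimension used for the scaling
    scale = 1.0 / math.sqrt(d_k)
    output, weights = [], []
    for q in Q:
        # Scaling: dot product of the query with every key, divided by sqrt(d_k)
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        # Softmax: turn the scaled scores into weights that sum to 1
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value rows
        output.append(
            [sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))]
        )
    return output, weights


# Toy inputs (assumed values, not taken from the thread)
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out, attn = scaled_dot_product_attention(Q, K, V)
```

Each row of `attn` sums to 1, and each query puts the most weight on the key it matches, so each output row is pulled toward the corresponding row of `V`.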

1 Comment

Chih on 20 Apr 2023

Thank you very much, Rohit.

Answer 2

xingxingcui on 11 Jan 2024

Direct link to this answer

https://it.mathworks.com/matlabcentral/answers/1940374-does-the-selfattentionlayer-also-perform-softmax-and-scaling#answer_1387576

Edited: xingxingcui on 27 Apr 2024

Hi, @Chih

Please check out the details in the code I wrote here (link).
