Training error when using selfAttentionLayer with dropout

I want to use selfAttentionLayer to build a time-series prediction model. However, when I use selfAttentionLayer with dropout enabled, training fails. The error messages are as follows:
Error using max
Out of memory on device. To view more detail about available memory on the GPU, use 'gpuDevice()'. If the problem persists, reset the GPU by calling 'gpuDevice(1)'.
Error in nnet.internal.cnn.util.boundAwayFromZero (line 10)
x = max(x, eps(precision), 'includenan');
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in gpuArray/internal_softmaxBackward (line 13)
Z = nnet.internal.cnn.util.boundAwayFromZero(Z);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in nnet.internal.cnnhost.scaledDotProductAttentionBackward (line 23)
dU = internal_softmaxBackward(matlab.lang.internal.move(dW), W, 1);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in gpuArray/internal_attentionBackward (line 34)
[dQ, dK, dV] = nnet.internal.cnnhost.scaledDotProductAttentionBackward(dZ, Q, K, V, ...
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in deep.internal.recording.operations.AttentionOp/backward (line 48)
[dQ,dK,dV] = internal_attentionBackward(dZ,Q,K,V,dataForBackward,M,op.Args{1:6});
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in deep.internal.recording.RecordingArray/backwardPass (line 99)
grad = backwardTape(tm,{y},{initialAdjoint},x,retainData,false,0);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in dlarray/dlgradient (line 132)
[grad,isTracedGrad] = backwardPass(y,xc,pvpairs{:});
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in EHSAPressureStatePrediction>modelLoss (line 223)
gradients = dlgradient(loss,net.Learnables);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in deep.internal.dlfeval (line 17)
[varargout{1:nargout}] = fun(x{:});
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in deep.internal.dlfevalWithNestingCheck (line 19)
[varargout{1:nargout}] = deep.internal.dlfeval(fun,varargin{:});
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in dlfeval (line 31)
[varargout{1:nargout}] = deep.internal.dlfevalWithNestingCheck(fun,varargin{:});
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The dlnetwork is defined as follows:
numIn = 5;
numOut = 2;
peDim = 6;
seqLen = 4096;
layers = [sequenceInputLayer(numIn,"Normalization","none","Name","Input","MinLength",seqLen)
positionEmbeddingLayer(peDim,seqLen)
selfAttentionLayer(5,10,"DropoutProbability",0.2)
convolution1dLayer(3,20,"DilationFactor",2,"Padding","causal")
layerNormalizationLayer
convolution1dLayer(5,25,"DilationFactor",4,"Padding","causal")
layerNormalizationLayer
convolution1dLayer(7,30,"DilationFactor",8,"Padding","causal")
fullyConnectedLayer(20)
reluLayer
fullyConnectedLayer(10)
reluLayer
fullyConnectedLayer(5)
reluLayer
fullyConnectedLayer(numOut,"Name","output")];
net = dlnetwork(layers);
% analyzeNetwork(net);
I want to know why enabling dropout in selfAttentionLayer causes this error.
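One workaround I am considering (an untested sketch, an assumption on my part, not a confirmed fix for R2025a): leave the layer's DropoutProbability at its default of 0 and apply dropout as a separate layer after the attention output, so the attention backward pass no longer has to store the per-weight dropout state. A minimal illustration with the same input sizes:

```matlab
% Hypothetical alternative (truncated to the first few layers for brevity):
% dropout is moved out of selfAttentionLayer into its own dropoutLayer.
layersAlt = [sequenceInputLayer(5,"Normalization","none","MinLength",4096)
    positionEmbeddingLayer(6,4096)
    selfAttentionLayer(5,10)   % DropoutProbability left at its default of 0
    dropoutLayer(0.2)          % dropout applied to the attention output instead
    fullyConnectedLayer(2)];
netAlt = dlnetwork(layersAlt);
```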

Answers (0)

Release

R2025a

Asked on 2 Apr 2026 at 2:41
