
Low frequency response from LSTM model

6 views (last 30 days)
Shubham Baisthakur on 6 Nov 2023
How can I get a low-frequency output from an LSTM network? Below is the time-history response of my input features, which have a relatively low-frequency component.
My LSTM network architecture is as follows:
layers = [
    sequenceInputLayer(size(X_train{1}, 1)) % Input Features (F)
    lstmLayer(x.num_hidden_units_1, 'OutputMode', 'sequence')
    tanhLayer
    dropoutLayer(0.05)
    lstmLayer(x.num_hidden_units_2, 'OutputMode', 'sequence')
    dropoutLayer(0.05)
    tanhLayer
    fullyConnectedLayer(x.num_layers_ffnn)
    tanhLayer
    fullyConnectedLayer(1)
];
During training, the network predictions are plotted against the target output as follows.
The network output has a very high-frequency component on the validation data; however, when the model is used to predict the test data, it gives a flat line.
My two main concerns are:
1) Why is the LSTM network giving a high-frequency output even though the input features have a relatively low frequency?
2) If the model produces a high-frequency output during training, why does it give a flat line during testing?

Answers (1)

Debraj Maji on 17 Nov 2023
I see that you are trying to understand why your LSTM network gives a high-frequency output on training data even though the input features have a low frequency.
The model might have overfitted to the training data and captured noise or specific patterns that are not generalizable. This can result in high-frequency outputs on the training set, but when applied to unseen data, the model fails to generalize, leading to a flat line.
LSTMs are designed to overcome the limitations of traditional RNN-based architectures, as they can capture long-term dependencies in sequential data. They are not inherently unsuited to low-frequency data; however, in your case the network is unable to capture the underlying pattern in the sequence due to the nature of the input features. The high-frequency pattern in the output is mainly noise, which in turn is a result of inaccuracies in the prediction.
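One quick way to verify this is to compare the spectral content of the network output against the target. Below is a minimal sketch; yPred, yTarget, and the sampling rate Fs are placeholders for your own signals:
% Compare amplitude spectra of target and predicted signals (illustrative only).
Fs = 100;                            % assumed sampling rate in Hz
n  = numel(yTarget);
f  = (0:n-1) * Fs / n;               % frequency axis
figure; hold on
plot(f, abs(fft(yTarget))/n, 'DisplayName', 'target')
plot(f, abs(fft(yPred))/n, 'DisplayName', 'prediction')
xlim([0 Fs/2])                       % show only up to the Nyquist frequency
xlabel('Frequency (Hz)'); ylabel('|FFT|/n'); legend
If the prediction shows substantial energy at frequencies where the target has none, the extra content is noise rather than a learned pattern.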
Possible ways to mitigate this are:
  • Increase the amount of training data.
  • Apply feature engineering and feature scaling.
  • Experiment with different initializations, learning rates, or optimization algorithms to stabilize training. Monitoring the training and validation loss curves can provide insight into model stability.
  • Systematically tune hyperparameters using techniques like grid search or random search to find the most suitable values for your specific problem (a minimal sketch follows this list).
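As an illustration of the last point, here is a minimal random-search sketch using trainNetwork. The search ranges, epoch budget, and the simple regression head are illustrative placeholders, not values tuned for this problem:
% Random search over learning rate and hidden units (illustrative ranges).
learnRates  = [1e-4 3e-4 1e-3 3e-3];
hiddenUnits = [32 64 128];
bestLoss = inf;
for trial = 1:8
    lr = learnRates(randi(numel(learnRates)));
    nh = hiddenUnits(randi(numel(hiddenUnits)));
    trialLayers = [
        sequenceInputLayer(size(X_train{1}, 1))
        lstmLayer(nh, 'OutputMode', 'sequence')
        fullyConnectedLayer(1)
        regressionLayer];
    opts = trainingOptions('adam', ...
        'InitialLearnRate', lr, ...
        'MaxEpochs', 50, ...
        'Shuffle', 'every-epoch', ...
        'ValidationData', {X_val, Y_val}, ...
        'Verbose', false);
    [trialNet, info] = trainNetwork(X_train, Y_train, trialLayers, opts);
    if info.FinalValidationLoss < bestLoss       % keep the best trial so far
        bestLoss = info.FinalValidationLoss;
        bestNet  = trialNet;
    end
end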
For more information on fine-tuning an LSTM, you can refer to the following documentation: https://in.mathworks.com/help/deeplearning/ug/long-short-term-memory-networks.html
1 Comment
Shubham Baisthakur on 17 Nov 2023
Hello @Debraj Maji, I don't think this is a case of overfitting to the training data, because even the training data does not have such a high-frequency component. I have already applied the mitigations you suggested but couldn't get rid of the high frequency.
I think the error could be in the way I am defining the custom training loop, because the model tends to perform quite well when it is trained using the "trainnet" function (a rough sketch of that call is given after the code below).
I am attaching the code for the custom training loop; can you spot any obvious errors?
function [val_loss, net] = LSTM_NetworkOptimization_CustomLoop(x, X_train, Y_train, X_val, Y_val, is_training, isCustomTraining)
% Define hyperparameters
if size(X_train, 1) > 1
    batchSize = x.batch_size;
else
    batchSize = 1;
end
sequenceLength = x.sequence_length;
if is_training
    numEpochs = 400; % Adjust as needed
    minEpochs = 300;
else
    numEpochs = 300;
end
% Check if GPU is available
useGPU = canUseGPU();
% Create the LSTM model
layers = [
    sequenceInputLayer(size(X_train{1}, 1)) % Input Features (F)
    lstmLayer(x.num_hidden_units_1, 'OutputMode', 'sequence')
    tanhLayer
    dropoutLayer(x.drop_out_rate*0.5)
    lstmLayer(x.num_hidden_units_2, 'OutputMode', 'sequence')
    dropoutLayer(x.drop_out_rate*0.5)
    tanhLayer
    fullyConnectedLayer(x.num_layers_ffnn)
    tanhLayer
    fullyConnectedLayer(1)
];
net = dlnetwork(layers);
% Define training options
iteration = 0;
epoch = 0;
% Adam parameters
averageGrad = [];
averageSqGrad = [];
% Initialize training progress monitor
if is_training
    monitor = trainingProgressMonitor(Info="Epoch", XLabel="Iteration");
    monitor.Metrics = ["Loss", "Data_Loss", "Var_Loss", "Validation_Loss", "Frequency_Loss", "Power_Loss", "Energy_Loss"];
    groupSubPlot(monitor, "Loss-Components", ["Loss", "Data_Loss", "Var_Loss", "Frequency_Loss", "Power_Loss", "Energy_Loss"]);
    groupSubPlot(monitor, "Validation-Loss", "Validation_Loss");
end
% Initialize variables for tracking consecutive unchanged validation losses
consecutive_unchanged_loss = 0;
previous_val_loss = inf;
% Network training
while epoch < numEpochs && consecutive_unchanged_loss < 10
    epoch = epoch + 1;
    % Shuffle data
    idx = randperm(numel(Y_train));
    X_epoch = X_train(idx); % Shuffle both input and output data
    Y_epoch = Y_train(idx);
    for i = 1:batchSize:numel(Y_train)
        % Prepare mini-batch data
        startIndex = i;
        endIndex = min(i + batchSize - 1, numel(Y_train));
        % Pad or truncate sequences to match the specified sequence length
        X_miniBatch = padOrTruncate(X_epoch(startIndex:endIndex), sequenceLength, useGPU);
        Y_miniBatch = padOrTruncate(Y_epoch(startIndex:endIndex), sequenceLength, useGPU);
        Xepoch_val = padOrTruncate(X_val(startIndex:endIndex), length(X_val{1}), useGPU);
        Yepoch_val = padOrTruncate(Y_val(startIndex:endIndex), length(Y_val{1}), useGPU);
        X = X_miniBatch;
        T = Y_miniBatch;
        % Evaluate the model loss and gradients using dlfeval and the modelLoss function
        isValidation = false;
        [loss, data_loss, var_loss, freq_loss, power_loss, energy_loss, gradients] = dlfeval(@modelLoss_LSTM, net, X, T, isValidation, numEpochs);
        % Update the network parameters using the Adam optimizer
        iteration = iteration + 1;
        [net, averageGrad, averageSqGrad] = adamupdate(net, gradients, averageGrad, averageSqGrad, iteration);
        % Compute validation loss
        isValidation = true;
        val_loss = dlfeval(@modelLoss_LSTM, net, Xepoch_val, Yepoch_val, isValidation, numEpochs);
        % Check if the validation loss has not changed
        if is_training && epoch > minEpochs
            if abs(val_loss - previous_val_loss) < 1e-6
                consecutive_unchanged_loss = consecutive_unchanged_loss + 1;
            else
                consecutive_unchanged_loss = 0;
            end
            previous_val_loss = val_loss;
        end
        % Update the training progress monitor
        if is_training
            recordMetrics(monitor, iteration, ...
                Loss=extractdata(loss), ...
                Data_Loss=extractdata(data_loss), ...
                Var_Loss=extractdata(var_loss), ...
                Validation_Loss=val_loss, ...
                Frequency_Loss=freq_loss, ...
                Power_Loss=power_loss, ...
                Energy_Loss=energy_loss);
            updateInfo(monitor, Epoch=epoch + " of " + numEpochs);
            monitor.Progress = 100 * iteration / (numEpochs * numel(Y_train));
        end
    end
end
end
% Helper function to pad or truncate sequences
function paddedSequence = padOrTruncate(sequence, targetLength, useGPU)
paddedSequence = cell(size(sequence));
for i = 1:numel(sequence)
    if size(sequence{i}, 2) < targetLength
        % Zero-pad short sequences on the right
        padding = zeros(size(sequence{i}, 1), targetLength - size(sequence{i}, 2));
        paddedSequence{i} = [sequence{i}, padding];
    elseif size(sequence{i}, 2) > targetLength
        % Truncate long sequences
        paddedSequence{i} = sequence{i}(:, 1:targetLength);
    else
        paddedSequence{i} = sequence{i};
    end
    paddedSequence{i} = dlarray(paddedSequence{i}, "CT");
    if useGPU
        paddedSequence{i} = gpuArray(paddedSequence{i});
    end
end
end
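For comparison, the "trainnet"-based training mentioned above might look roughly like the following. This is a minimal sketch, not the actual call used: the "mse" loss and the training options are assumptions, and "layers" is the same layer array built inside the loop above.
% Hypothetical trainnet equivalent (R2023b or later); loss and options are assumptions.
opts = trainingOptions('adam', ...
    'MaxEpochs', 300, ...
    'Shuffle', 'every-epoch', ...
    'ValidationData', {X_val, Y_val}, ...
    'Verbose', false);
net = dlnetwork(layers);                       % same layer array as above
net = trainnet(X_train, Y_train, net, 'mse', opts);
Note that trainnet handles mini-batching, shuffling, and sequence padding internally, which is one reason a hand-written loop can behave differently.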
