Different RMSE for self-written function and experiment manager output

I'm working on a time-series task whose input is a 7x1 cell array of sequences, each of size 6xT (T differs between sequences).
I trained with a mini-batch size of 1. Predictors and responses were already normalized (z-score) before the experiment.
I took the network with the best validation RMSE (0.7935) and applied my own error function to the normalized data to check whether I would get the same result:
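function [rmse,rmse_channel] = rmseCells(CellArray,TargetArray)
    % Concatenate all sequences along the time dimension
    C_Mat = [CellArray{:}];
    T_Mat = [TargetArray{:}];
    % Element-wise prediction error
    err = (C_Mat - T_Mat);
    square_error = err.^2;
    % Overall RMSE across all channels and time steps
    mean_square_error = mean(square_error,"all");
    rmse = sqrt(mean_square_error);
    % RMSE per channel (one value per row)
    mean_channel_error = mean(square_error,2);
    rmse_channel = sqrt(mean_channel_error);
end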
Calling the function with my normalized data:
[normalizedError.rmseTrain,normalizedError.rmseTrain_channel] = rmseCells(PNYTrain,NYTrain);
Training Data:
rmse = 0.1469;
rmse_channel = [0.1445 ; 0.1187; 0.1290; 0.1295; 0.1765; 0.1905; 0.1231];
mean(rmse_channel) = 0.1446;
Validation Data:
rmse = 0.3564;
rmse_channel = [0.2745; 0.2833; 0.3476; 0.3659; 0.4577; 0.4749; 0.2094];
mean(rmse_channel) = 0.3448;
Small differences, such as the one between rmse and mean(rmse_channel), come from how the two are calculated, and I think they are inconsequential as long as I am consistent. What I cannot explain is the much larger gap between my calculated values and the Experiment Manager value (validation RMSE of 0.3564 versus 0.7935). Any help explaining that difference would be appreciated.
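To illustrate why the overall RMSE and the mean of the per-channel RMSEs differ slightly, here is a minimal sketch on made-up data (the matrices Y and T are hypothetical, not the data from this question). The overall value takes the square root after averaging all squared errors, while the per-channel mean averages values that have already been square-rooted. Because the square root is concave, the second can never exceed the first when every channel has the same number of samples, which matches the numbers above (0.1446 vs. 0.1469 and 0.3448 vs. 0.3564).

```matlab
% Hypothetical 3-channel example; values chosen only for illustration.
T = zeros(3, 4);                 % targets
Y = [1 1 1 1;                    % channel 1: every error is 1
     2 2 2 2;                    % channel 2: every error is 2
     3 3 3 3];                   % channel 3: every error is 3

sq = (Y - T).^2;

rmse_pooled  = sqrt(mean(sq, "all"));   % sqrt((1 + 4 + 9)/3) = sqrt(14/3)
rmse_channel = sqrt(mean(sq, 2));       % [1; 2; 3]
rmse_mean    = mean(rmse_channel);      % 2

fprintf("pooled: %.4f, mean of channels: %.4f\n", rmse_pooled, rmse_mean);
```

The two values coincide only when all channels have the same RMSE.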

Answers (2)

Ayush on 2 Jan 2024
I understand that you are seeing a difference between your calculated values and the Experiment Manager values. You may try the following checks to track down the issue:
  • Data Concatenation: The function assumes that all matrices within the cell arrays can be concatenated directly. This is only possible if each cell in CellArray and TargetArray contains a matrix with the same number of rows (6 in your case). If the number of columns (T) varies, MATLAB will not allow the concatenation to proceed without padding, as it requires matrices to have the same dimensions for concatenation
  • NaN Handling: If you have sequences of different lengths and you're padding them to concatenate, ensure that the padding does not affect the RMSE calculation. Padding with NaNs and using 'omitnan' in the mean function can help here
  • Data Alignment: Ensure that the predictions and targets are correctly aligned in each cell before concatenation. Any misalignment could lead to incorrect RMSE values.
Here is the conceptual code for that:
function [rmse,rmse_channel] = rmseCells(CellArray,TargetArray)
    % Initialize variables for padded matrices
    C_Mat_Padded = [];
    T_Mat_Padded = [];
    % Pad each sequence with NaNs to the same length and concatenate
    for i = 1:numel(CellArray)
        C_seq = CellArray{i};
        T_seq = TargetArray{i};
        seqLenDiff = size(C_seq, 2) - size(T_seq, 2);
        % Pad the shorter sequence with NaNs
        if seqLenDiff > 0
            T_seq = [T_seq, NaN(size(T_seq, 1), seqLenDiff)];
        elseif seqLenDiff < 0
            C_seq = [C_seq, NaN(size(C_seq, 1), -seqLenDiff)];
        end
        % Concatenate padded sequences
        C_Mat_Padded = [C_Mat_Padded, C_seq];
        T_Mat_Padded = [T_Mat_Padded, T_seq];
    end
    % Calculate errors
    err = (C_Mat_Padded - T_Mat_Padded);
    square_error = err.^2;
    % Calculate RMSE, ignoring NaNs
    mean_square_error = mean(square_error, 'all', 'omitnan');
    rmse = sqrt(mean_square_error);
    % Calculate RMSE per channel, ignoring NaNs
    mean_channel_error = mean(square_error, 2, 'omitnan');
    rmse_channel = sqrt(mean_channel_error);
end
Thanks,
Ayush

Patrick Sontheimer on 2 Jan 2024
Edited: Patrick Sontheimer on 2 Jan 2024
Hey @Ayush,
wow, what a quick reply. Thank you for your contribution.
The feature size is 6 in all examples. The sequences have different lengths, so I checked whether your first point about concatenation applies:
Cell_Array = XTest; % I also tried other variables such as XVal
Sum = 0;
allCell_Array = [Cell_Array{:}];
allSize = size(allCell_Array,2);
% Add up the lengths of the individual sequences
for i = 1:numel(Cell_Array)
    l = size(Cell_Array{i},2);
    Sum = Sum + l;
end
% Returns 1 if concatenation added no columns
isequal(Sum,allSize)
It turns out that concatenating my cell arrays does not add any padding. anynan also returns 0, so no NaN values are present after concatenation.
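The same check can be reproduced on toy data. The sketch below uses a made-up cell array (seqs is hypothetical) whose sequences all have 6 rows but different lengths, and confirms that horizontal concatenation only requires matching row counts, so no columns or NaN values are added.

```matlab
% Hypothetical data: three sequences with 6 features and different lengths T.
seqs = {rand(6, 5); rand(6, 8); rand(6, 3)};

allSeqs   = [seqs{:}];                               % horizontal concatenation, 6 x 16
totalCols = sum(cellfun(@(s) size(s, 2), seqs));     % 5 + 8 + 3

sameLength = isequal(totalCols, size(allSeqs, 2));   % true: no columns were added
hasNaN     = any(isnan(allSeqs), "all");             % false: no NaN padding present
```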
My data alignment doesn't change between the RMSE function call and the experiment, because I load the same preprocessed workspace in both cases.
So it seems I am still on the lookout for further answers, but I've checked some things along the way thanks to your reply.
Best Regards,
Patrick
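The thread leaves open how the reported validation RMSE is aggregated. One way to narrow it down is to compute a few alternative aggregations and see which, if any, comes close to the reported 0.7935. The candidates below are assumptions for comparison only and not a statement of what Experiment Manager actually does. The cell arrays pred and targ are hypothetical stand-ins for the predictions and targets.

```matlab
% Hypothetical predictions and targets: 2 sequences with 2 channels each.
pred = {[1 1; 1 1], [3 3 3 3; 3 3 3 3]};
targ = {zeros(2, 2), zeros(2, 4)};

% (a) Pooled over all sequences, channels, and time steps (as in rmseCells).
sqAll      = ([pred{:}] - [targ{:}]).^2;
rmsePooled = sqrt(mean(sqAll, "all"));

% (b) RMSE per sequence, then averaged over sequences (assumed candidate,
%     relevant when each mini-batch holds a single sequence).
rmsePerSeq  = cellfun(@(y, t) sqrt(mean((y - t).^2, "all")), pred, targ);
rmseSeqMean = mean(rmsePerSeq);

% (c) Squared errors summed over channels and averaged over time steps only,
%     i.e. not normalized by the number of channels (assumed candidate).
rmseChannelSum = sqrt(mean(sum(sqAll, 1)));
```

On this toy data, (b) gives 2 and (c) equals (a) multiplied by the square root of the channel count, so the three aggregations can differ substantially on the same predictions.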

Release

R2023b
