Error using nnet.internal.cnn.util.NetworkDataValidator/assertSequencesHaveSameNumberOfObservations (line 366) Invalid training data. X and Y must have the same number of observations.

I am currently working on a classification problem; the code is shown below. I get the following error when I run it:
Invalid training data. X and Y must have the same number of observations.
clear;
close all;
clc;
filename = "energydata.xls";
data = readtable(filename,'TextType','string');
head(data)
%Remove the rows of the table with empty reports.
%idxEmpty = strlength(data.Appliances) == 0;
%data(idxEmpty,:) = [];
%The goal of this example is to classify events by the label in the event_type column. To divide the data into classes, convert these labels to categorical.
YTrain = categorical(data.lights);
data = xlsread('energydata.xls');
XTrain=data(1:19735,1:1);%input data set
%expectedOutput=data(1:19735,28); %target data set
rng('default');
%XTrain =XTrain';
numObservations = numel(XTrain);
for i=1:numObservations
sequence = XTrain (i);
sequenceLengths(i) = size(sequence,2);
end
%Sort the data by sequence length.
[sequenceLengths,idx] = sort(sequenceLengths);
XTrain = XTrain(idx);
YTrain = YTrain(idx);
XTrain = con2seq(XTrain );
%View the sorted sequence lengths in a bar chart.
figure
bar(sequenceLengths)
ylim([0 30])
xlabel("Sequence")
ylabel("Length")
title("Sorted Data")
inputSize = 19735;
numHiddenUnits = 100;
numClasses = 8;
layers = [ ...
sequenceInputLayer(inputSize)
lstmLayer(numHiddenUnits,'OutputMode','last')
fullyConnectedLayer(numClasses)
softmaxLayer
classificationLayer];
maxEpochs = 100;
miniBatchSize = 27;
option = trainingOptions('sgdm','MaxEpochs',100);
net = trainNetwork(XTrain,YTrain,layers,option);

Accepted Answer

Walter Roberson on 28 Sep 2019
data = readtable(filename,'TextType','string');
YTrain = categorical(data.lights);
No matter whether data.lights was numeric or a string array (as returned by readtable), we know that YTrain is now categorical. But we do not know its size.
data = xlsread('energydata.xls');
XTrain=data(1:19735,1:1);%input data set
We cannot tell from that whether XTrain is now a numeric vector of size 19735 x 1, or is now a cell array of character vectors of size 19735 x 1, or a 19735 x 1 datetime array or duration array. We do not know whether that is the same size as YTrain.
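A cheap way to settle these questions is to print the class and size of both variables before going any further. The sketch below uses randomly generated stand-ins; the 19735 rows, the 8 label values, and the contents are assumptions for illustration, not data read from energydata.xls:

```matlab
% Hypothetical stand-ins for the variables built from energydata.xls
XTrain = rand(19735, 1);                   % assumed: one numeric column
YTrain = categorical(randi(8, 19735, 1));  % assumed: 8 label values

% Report what each variable actually is before training
fprintf('XTrain: %s, %d x %d\n', class(XTrain), size(XTrain, 1), size(XTrain, 2));
fprintf('YTrain: %s, %d x %d\n', class(YTrain), size(YTrain, 1), size(YTrain, 2));

% Stop early with a clear message if the element counts differ
if numel(XTrain) ~= numel(YTrain)
    error('XTrain has %d elements but YTrain has %d.', numel(XTrain), numel(YTrain));
end
```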
numObservations = numel(XTrain);
for i=1:numObservations
sequence = XTrain (i);
sequenceLengths(i) = size(sequence,2);
end
Whether XTrain is a numeric vector or cell array of character vectors or datetime array or duration array, XTrain(i) is going to be a 1 x 1 scalar, and size() of that along the second dimension is going to be 1. If XTrain is expected to be a cell array of character vectors, you would use XTrain{i} to get at the content. But you could also consider just using
sequenceLengths = cellfun(@length, XTrain);
with no loop for that situation.
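To make the difference between XTrain(i) and XTrain{i} concrete, here is a small sketch on a made-up cell array of character vectors (the contents are hypothetical):

```matlab
% Hypothetical cell array of character vectors
XTrain = {'abc'; 'de'; 'fghij'};

% Parentheses return a 1x1 cell, so its width is always 1
widthParen = size(XTrain(3), 2);

% Braces return the contents, here the 1x5 char vector 'fghij'
widthBrace = size(XTrain{3}, 2);

% cellfun applies length to the contents of every cell, no loop needed
sequenceLengths = cellfun(@length, XTrain);   % 3x1 result: [3; 2; 5]
```

With parentheses every "length" comes out as 1, which is exactly what happens in the loop in the question.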
[sequenceLengths,idx] = sort(sequenceLengths);
XTrain = XTrain(idx);
YTrain = YTrain(idx);
The sequence lengths are all 1 because of the above, so the order remains the same. If XTrain is a cell array of character vectors and you make the adjustments I outline above, the order could potentially change. If YTrain was longer than XTrain then some of the elements of YTrain would have been thrown away by the indexing; if YTrain was shorter than XTrain (should not be possible in context) then an error would have been created. So we can be sure that after this, YTrain will be the same size as XTrain.
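The truncation behaviour can be checked with a toy example. The values below are hypothetical; the point is that idx has one entry per element of XTrain, so YTrain(idx) keeps only that many labels. Note that MATLAB's sort preserves the original order of equal elements, so all-equal lengths leave idx as 1:N:

```matlab
% Hypothetical data: 3 observations but 5 labels
XTrain = [10; 20; 30];
YTrain = [1; 2; 3; 4; 5];

% Every XTrain(i) is a scalar, so every "length" is 1
sequenceLengths = ones(1, numel(XTrain));
[sequenceLengths, idx] = sort(sequenceLengths);   % equal values keep order: idx = [1 2 3]

XTrain = XTrain(idx);
YTrain = YTrain(idx);        % only the first 3 labels survive

% The opposite case, fewer labels than entries in idx, is an indexing error
shortY = [1; 2];
indexErrorRaised = false;
try
    shortY = shortY(idx);
catch
    indexErrorRaised = true;
end
```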
XTrain = con2seq(XTrain );
XTrain is 19735 x 1. If it is a numeric column vector, then con2seq() of it would produce a 1 x 1 cell array containing a 19735 x 1 entry. If XTrain is a datetime array, duration array, or cell array of character vectors, then con2seq() will error out. Because it did not, we deduce that XTrain must be a numeric vector (in which case sorting by its width does not serve any purpose).
At this point, XTrain is a 1 x 1 cell array containing a 19735 x 1 numeric vector, and YTrain is a 19735 x 1 categorical array.
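A quick way to confirm this shape is to call con2seq on a stand-in column vector (random values, assumed to match the 19735 x 1 numeric column deduced above; con2seq is part of Deep Learning Toolbox). con2seq treats each column as one time step, so a single column becomes a single time step:

```matlab
% Hypothetical numeric column vector, as deduced above
XTrain = rand(19735, 1);

% One column in, so one time step out: a 1x1 cell holding a 19735x1 vector
S = con2seq(XTrain);

size(S)       % 1 1
size(S{1})    % 19735 1
```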
net = trainNetwork(XTrain,YTrain,layers,option);
trainNetwork has several possible input methods. It appears to me that you are trying to use the "sequences" method, described at https://www.mathworks.com/help/deeplearning/ref/trainnetwork.html#bu6sn4c_sep_mw_5a3519e4-ff13-4b34-baf4-c9f3a716e62d
Sequence or time series data, specified as an N-by-1 cell array of numeric arrays, where N is the number of observations, a numeric array representing a single sequence, or a datastore.
Your cell array is 1 x 1, so you are telling trainNetwork that N=1, that you have one observation.
Sequence-to-label classification: N-by-1 categorical vector of labels, where N is the number of observations.
Your Y is 19735 x 1 categorical array, so it describes 19735 observations. This does not match the 1 observation described by your XTrain.
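The mismatch can be reproduced without training anything by counting observations the same way the documentation describes them. The data below are hypothetical stand-ins with the shapes described above:

```matlab
% Hypothetical reproduction of the shapes described above
XTrain = {rand(19735, 1)};                  % 1x1 cell: one sequence
YTrain = categorical(randi(8, 19735, 1));   % 19735x1 labels

% For sequence-to-label classification, both counts must be the same N
numX = numel(XTrain);    % 1
numY = numel(YTrain);    % 19735
fprintf('X describes %d observation(s), Y describes %d.\n', numX, numY);
```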
Perhaps you should be using
XTrain = con2seq(XTrain.').';
to give you a 19735 x 1 cell array of numeric data. But your code that tries to take the length of each sequence suggests to me that you are doing something wrong with your input processing.
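Here is a sketch of what that transpose trick produces, on a random stand-in for the real column (an assumption; con2seq needs Deep Learning Toolbox). num2cell is shown alongside as a base-MATLAB way to get the same 19735 x 1 cell of scalars:

```matlab
% Hypothetical numeric column vector
XTrain = rand(19735, 1);

% Transpose to 1x19735 so con2seq sees 19735 time steps, then transpose
% the resulting 1x19735 cell back to 19735x1
XSeq = con2seq(XTrain.').';

% num2cell gives the same 19735x1 cell of scalars without the toolbox
XSeq2 = num2cell(XTrain);

isequal(XSeq, XSeq2)   % true
```

With this layout each observation is a single feature at a single time step. Since sequenceInputLayer(inputSize) expects inputSize to be the number of features per time step, the inputSize = 19735 in the question's code would presumably need to change as well.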
