Time series in Artificial neural network (ANN) example pollution Mortality

Question

[X,T] = pollution_dataset

net = timedelaynet(1:2,10);

[Xs,Xi,Ai,Ts] = preparets(net,X,T);

net = train(net,Xs,Ts,Xi,Ai);

I used this very simple example code to get a feel for how a time series network predicts a future value. Instead of using all 508 samples for inputs and targets, I reduced the data to the first 502 samples and then tried to get the 503rd target from the 503rd input vector.

My simple code was

>> d={[80.3800000000000;57.3100000000000;4.34000000000000;1.57000000000000;9.73000000000000;40.0100000000000;6.99000000000000;47.2200000000000]};

>> a=net(d);

This gave me the following results

7230
0219
1554

While the actual output should be

149.220000000000

7.88000000000000

73.4600000000000

Why are the predicted values so far off? Any explanation, please!


Answer 1

Shashank Prasanna on 29 Jul 2013

Here is an example I shared some time earlier. The example predicts 30 steps of a sine wave:

http://www.mathworks.com/matlabcentral/answers/60854#answer_73201

You may use it as a template, but that does not mean your network will predict the exact values. You will have to experiment with the different options before you get a good result.

sandeep on 30 Jul 2013

Thank you very much, sir!

I used your code and it is predicting, but my predictions are very bad.

The first problem is that my performance is always around 1e+5; for the following code it is 840671.2663 at epoch 5.

The second problem is, of course, that the predictions are far from the values they should be.

What can I do? Please help.

Here is my code.

My problem is very similar to the 'pollution Mortality' example, with 4-dimensional inputs and 1-dimensional outputs (I can use up to 6-dimensional inputs and 3-dimensional outputs; I have the data for that).

    filename = 'Inputs_2013.xlsx';      % input data for 351 days
    u = xlsread(filename);
    filename = 'Outputs_2013.xlsx';     % output data for 351 days
    y = xlsread(filename);
    x = tonndata(u,true,false);
    t = tonndata(y,true,false);
    inputSeries = x(1:320);             % first 320 days for training
    xnh = x(321:end);                   % remaining 31 days for prediction
    targetSeries = t(1:320);
    toPredict = t(321:end);
    inputDelays = 1:27;
    feedbackDelays = 1:27;
    hiddenLayerSize = [25 25];
    net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize);
    [inputs,inputStates,layerStates,targets] = preparets(net,inputSeries,{},targetSeries);
    net.divideFcn = 'divideint';  % I tried dividerand, divideblock and divideint; none gives correct results
    net.divideParam.trainRatio = 70/100;
    net.divideParam.valRatio = 15/100;
    net.divideParam.testRatio = 15/100;
    [net,tr] = train(net,inputs,targets,inputStates,layerStates);
    outputs = net(inputs,inputStates,layerStates);
    errors = gsubtract(targets,outputs);
    performance = perform(net,targets,outputs)
    % Close the loop for multi-step prediction
    netc = closeloop(net);
    netc.name = [net.name ' - Closed Loop'];
    NumberOfPredictions = 31
    newInputSeries = xnh(1:NumberOfPredictions);
    % The last 27 inputs fill the delay buffer, matching the 27 targets below
    newInputSeries = [inputSeries(end-26:end), newInputSeries]
    newTargetSet = nan(size(newInputSeries))
    newTargetSet = num2cell(newTargetSet)
    newTargetSet(1:27) = targetSeries(end-26:end)
    [xc,xic,aic,tc] = preparets(netc,newInputSeries,{},newTargetSet);
    yPredicted = sim(netc,xc,xic,aic)

% I compared yPredicted with my data. For example, when gas production should be 3500 m3, it predicts 5558 m3, and so on.

What should I do? Is it that an ANN cannot fit these data?

Thanking you!

Sandeep

sandeep on 31 Jul 2013

Sir, I am using narxnet because I have 351 days of biogas generation data from a company. As I understand it, since the data are recorded on successive dates, they form a time series, and hence I used a time-delay network.

Sir, my main problem is that my training set error keeps decreasing, but the validation and test set errors stop decreasing after 2 to 5 iterations. Hence my validation performance is always a six-digit number, such as 573290.71 or 732357.83.

Since the test set performance is of that magnitude, how can I expect to get a good prediction?

I am experimenting with different numbers of neurons and layers, divideFcn settings, training algorithms and so on, but the performance always remains in six digits!

What can I do?

sandeep on 1 Aug 2013

Shashank Sir, how do I modify your code when I have 3-dimensional outputs?

newTargetSet = nan(size(newInputSeries))

    newTargetSet = num2cell(newTargetSet )
    newTargetSet (1:10) = targetSeries(end-9:end)
    [xc,xic,aic,tc] = preparets(netc,newInputSeries,{},newTargetSet);
    yPredicted = sim(netc,xc,xic,aic)

I tried, but it gives the error: Feedback{1,11} and Feedback{1,1} have different numbers of rows.

I want to take the first element of the output and compute its error.

I wrote this code for calculating the percentage errors for a 1-dimensional output:

errors1 = gsubtract(yPredicted(1:end-1),toPredict);

errors4=cell2mat(errors1);

errors5=(abs(errors4)./y(321 : end))*100;

How can I write this for a 3-dimensional output? I want the % error for output element 1.

Thanking You!

Sandeep
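One possible cause of the "different numbers of rows" error, offered as a sketch rather than a confirmed diagnosis: nan(size(newInputSeries)) followed by num2cell produces 1-by-1 NaN cells, whereas a 3-output network expects a 3-by-1 column in every feedback cell. The sizes and data below (nOut, nDelay, nPred, the random targets and the small actual/predicted matrices) are hypothetical stand-ins, not the poster's data.

```matlab
% Hypothetical sizes: 3 outputs, 10 feedback delays, 5 steps to predict.
nOut = 3;  nDelay = 10;  nPred = 5;

% Stand-in for the training targets: 1x40 cell array of 3x1 columns.
targetSeries = num2cell(rand(nOut, 40), 1);

% Every placeholder cell must be nOut-by-1, not a scalar NaN.
newTargetSet = repmat({nan(nOut, 1)}, 1, nDelay + nPred);
newTargetSet(1:nDelay) = targetSeries(end-nDelay+1:end);

% Percent error for output element 1 only (made-up numbers).
actual    = num2cell([100 200 400; 10 20 40; 1 2 4], 1);
predicted = num2cell([110 190 400; 11 22 40; 1 2 5], 1);
A = cell2mat(actual);      % 3x3 matrix: rows = outputs, columns = time
P = cell2mat(predicted);
pctErr1 = abs(P(1,:) - A(1,:)) ./ abs(A(1,:)) * 100;
```

Converting the cell arrays to matrices with cell2mat first makes row selection straightforward: row k of the matrix is output element k across all time steps.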


Answer 2

Greg Heath on 28 Jul 2013

Edited: Greg Heath on 29 Jul 2013

1. You should not use the default divide function DIVIDERAND with a time series. Although I recommend DIVIDEBLOCK, DIVIDEIND or DIVIDEINT could also be used.

2. Normalize the target series to have zero mean and unit variance (help zscore)

3. Initialize the RNG before creating the net

4. [ net tr Y Xf Af ] = train(net,Xs,Ts,Xi,Ai);

5. Make sure the design is good

tr = tr   % No semicolon, so the training record is displayed

6. If not, make multiple designs until you find a good one

7. Make sure the delay buffer is loaded when you run the new data.
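Points 1, 3 and 7 can be sketched on the original question's setup as follows. This is a minimal illustration that assumes the Neural Network Toolbox is installed; the variable names X503, Xi503, Ai503, T503 and y503 are made up for this example. The known targets for steps 501 to 503 are passed to preparets only so that the prediction can be compared with the true value.

```matlab
% Sketch only; requires the Neural Network (Deep Learning) Toolbox.
[X, T] = pollution_dataset;          % 1x508 cell arrays (8 inputs, 3 targets)

rng(0);                              % point 3: fix the RNG before creating the net
net = timedelaynet(1:2, 10);
net.divideFcn = 'divideblock';       % point 1: contiguous train/val/test blocks
net.trainParam.showWindow = false;   % suppress the training GUI

% Train on the first 502 time steps only.
[Xs, Xi, Ai, Ts] = preparets(net, X(1:502), T(1:502));
[net, tr] = train(net, Xs, Ts, Xi, Ai);

% Point 7: with input delays 1:2, the output at step 503 is computed from
% the inputs at steps 501 and 502. preparets moves those two steps into
% the initial input delay states Xi503.
[X503, Xi503, Ai503, T503] = preparets(net, X(501:503), T(501:503));
y503 = net(X503, Xi503, Ai503);      % 1x1 cell holding a 3x1 prediction

disp([y503{1}, T503{1}]);            % prediction next to the known target
```

Calling net(d) with a single input column supplies no delay states, so the network most likely runs from its default initial states rather than from the preceding two observations, which may account for the large deviation reported in the question.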

Hope this helps.

Thank you for formally accepting my answer

Greg

P.S. Search timedelaynet greg

sandeep on 31 Jul 2013

I am already using this command for training.

[net,tr] = train(net,inputs,targets,inputStates,layerStates);

which returns tr as an output.

But I don't know how to make use of tr.

sandeep on 31 Jul 2013

Sir, I also have a question about your 3rd point, on the RNG. Could you give me an example that makes it clear?

Answer 3

Greg Heath on 31 Jul 2013

 rng(0)
[net,tr] = train(net,inputs,targets,inputStates,layerStates);
tr = tr   % No semicolon

sandeep on 1 Aug 2013

Sir, I think you missed my point. My question was how to use tr.

For example, I am getting this:

tr =

        trainFcn: 'trainlm'
      trainParam: [1x1 nnetParam]
      performFcn: 'mse'
    performParam: [1x1 nnetParam]
        derivFcn: 'defaultderiv'
       divideFcn: 'divideint'
      divideMode: 'time'
     divideParam: [1x1 nnetParam]
        trainInd: [1x209 double]
          valInd: [1x45 double]
         testInd: [1x45 double]
            stop: 'Validation stop.'
      num_epochs: 8
       trainMask: {1x299 cell}
         valMask: {1x299 cell}
        testMask: {1x299 cell}
      best_epoch: 2
            goal: 1.0000e-20
          states: {'epoch'  'time'  'perf'  'vperf'  'tperf'  'mu'  'gradient'  'val_fail'}
           epoch: [0 1 2 3 4 5 6 7 8]
            time: [0.5190 1.0380 1.5020 2.4090 3.1940 3.9740 4.7480 5.5590 6.3250]
            perf: [3.0733e+06 4.2023e+05 1.1065e+05 1.3808e+04 973.5141 63.4077 0.6065 8.2337e-07 1.4027e-17]
           vperf: [1x9 double]
           tperf: [1x9 double]
              mu: [1x9 double]
        gradient: [2.2074e+07 4.1308e+06 2.1810e+06 6.4409e+05 1.4633e+05 2.6902e+04 3.5897e+03 3.1237 1.2626e-05]
        val_fail: [0 0 0 1 2 3 4 5 6]
       best_perf: 1.1065e+05
      best_vperf: 5.5636e+05
      best_tperf: 4.1737e+05

Now what am I supposed to do?
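As a rough illustration of how the training record can be read, the sketch below rebuilds a small struct from the numbers posted above and pulls out the quantities that matter most. In a real session tr comes from train; the struct here is only a stand-in so that the snippet runs on its own, and the names idx, trainAtBest, finalTrain and valToTrain are invented for this example.

```matlab
% Values copied from the training record posted above.
tr.epoch      = 0:8;
tr.perf       = [3.0733e+06 4.2023e+05 1.1065e+05 1.3808e+04 973.5141 ...
                 63.4077 0.6065 8.2337e-07 1.4027e-17];
tr.best_epoch = 2;
tr.best_perf  = 1.1065e+05;
tr.best_vperf = 5.5636e+05;
tr.best_tperf = 4.1737e+05;

% Epoch numbering starts at 0, so epoch k sits at index k+1.
idx = tr.best_epoch + 1;
trainAtBest = tr.perf(idx);                  % equals tr.best_perf

% Training error keeps falling after the best epoch while validation
% stopped improving, which is a common sign of overfitting.
finalTrain = tr.perf(end);
valToTrain = tr.best_vperf / tr.best_perf;

fprintf('best epoch %d: train %.4g, val %.4g, test %.4g\n', ...
        tr.best_epoch, tr.best_perf, tr.best_vperf, tr.best_tperf);
fprintf('final train MSE %.3g, val/train ratio %.2f\n', finalTrain, valToTrain);
```

Read this way, the record shows a network that drives the training error to essentially zero by epoch 8 while the validation error was already at its best at epoch 2, so the reported validation and test values, not the final training error, describe how it is likely to behave on new data.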

sandeep on 14 Aug 2013

Thanks Greg.

Which recommendation did I not follow? The one about divideblock? I used it, but it did not give good results.

Please explain your point no. 6: "Try 10 different random weight initializations for each value of hidden nodes that you try."

Suppose I get good results on the 8th training run. When I want to reproduce them, I cannot get the same results again.

If I use rng(0), I get the same results from the second training onward.

How can I reproduce my best results? rng(0) makes the results reproducible, but they are worse than the others.

Here is my code:

filename = 'Inputs_2013_new.xlsx'; %A 4 dimensional input, having data of 351 days

u= xlsread(filename);

filename = 'Outputs_2013_new.xlsx'; % A 1 dimensional output, having datas of 351 days.

y= xlsread(filename);

x = tonndata(u,true,false);

t = tonndata(y,true,false);

inputSeries = x(1:320); %320 days data is taken for NN network training

xnh=x(321:end); %31 days data is used for prediction of outputs

targetSeries = t(1:320);

toPredict = t(321 : end);

inputDelays = 1:21;

feedbackDelays = 1:21;

hiddenLayerSize = [25 15];

net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize);

net.layers{1}.transferFcn='logsig';

net.layers{2}.transferFcn='logsig';

net.layers{1}.initFcn='initnw';

net.layers{2}.initFcn='initnw';

net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};

net.inputs{2}.processFcns = {'removeconstantrows','mapminmax'};

[inputs,inputStates,layerStates,targets] = preparets(net,inputSeries,{},targetSeries);

net.divideFcn = 'dividetrain'; %I tried with dividerand, divideblock, divideint, no one is giving correct results

%net.divideParam.trainRatio = 70/100;

%net.divideParam.valRatio = 15/100;

%net.divideParam.testRatio = 15/100;

net.trainFcn = 'trainlm'; %I tried diffrent training algorithm, But could not get better predition.

net.performFcn = 'mse';

net.trainParam.epochs=1000;

net=init(net);

rng(0)

 % Train the Network
 [net,tr] = train(net,inputs,targets,inputStates,layerStates);
 tr=tr;

% Test the Network

outputs = net(inputs,inputStates,layerStates);

errors = gsubtract(outputs, targets);

errors2=cell2mat(errors);

errors3=(abs(errors2)./y(1:2,22:320))*100; %Change here when you change Time Delay

performance = perform(net,targets,outputs)

%Now close the loop

netc = closeloop(net);

netc.name = [net.name ' - Closed Loop'];

NumberOfPredictions = 31;

newInputSeries = xnh(1:NumberOfPredictions);

newInputSeries = [inputSeries(end-20:end), newInputSeries]; %Change here when you change Time Delay

a=size(newInputSeries);

newTargetSet = nan(2,a(:,2)); %set outputs to Nan so that Prediction is based on input values.

newTargetSet = tonndata(newTargetSet,true,false);

newTargetSet (1:21) = targetSeries(end-20:end); %Change here when you change Time Delay

[xc,xic,aic,tc] = preparets(netc,newInputSeries,{},newTargetSet); %a=nan(2, 3)

yPredicted = sim(netc,xc,xic,aic); %It will give you one extra prediction, means it will give 32 prediction)

errors1 = gsubtract(yPredicted(1:end),toPredict);

errors4=cell2mat(errors1);

errors5=(abs(errors4)./y(1:2,321:end))*100;

formean=errors5(1,:);

Greg Heath on 16 Aug 2013

Time series in Artificial neural network (ANN) example pollution Mortality. Asked by sandeep on 27 Jul 2013 at 6:33. Comment by Greg Heath on 12 Aug 2013 at 15:00:

% 1. net = net will show you all of the net properties
% 2. tr = tr will show you the training record
% 3. tr.divideint, tr.trainInd, etc. indicate that you do not have uniform spacing between your points. Why didn't you follow my recommendation?
% 4. Why don't you use dividetrain first and concentrate on minimizing the No. of hidden neurons you need to get a good design?
% 5. goal = 1e-20 is unreasonable. Try 0.01*mean(var(target',1)) to get a normalized MSE of 0.01
% 6. Try 10 different random weight initializations for each value of hidden nodes that you try.
% 7. If you search on rng(0) timedelaynet you will probably find some helpful code
% 8. Also search on h = Hmin
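The goal in point 5 can be worked through on made-up data. The sketch below assumes a hypothetical 3-by-508 target matrix t (rows are output variables); mseRef, resid and mseNaive are names chosen for this example. It checks that mean(var(t',1)) is the MSE of a naive model that always outputs the target mean, so a goal of 0.01 times that value corresponds to a normalized MSE of 0.01.

```matlab
% Hypothetical target matrix: 3 output variables (rows) x 508 time steps.
rng(0);
t = bsxfun(@plus, [150; 8; 70], bsxfun(@times, [30; 2; 15], randn(3, 508)));

% Reference MSE: var(t', 1) normalizes by N and returns one variance per
% output variable; the mean averages them over the outputs.
mseRef = mean(var(t', 1));

% Check: a constant predictor equal to each row's mean has the same MSE.
resid    = bsxfun(@minus, t, mean(t, 2));
mseNaive = mean(resid(:).^2);

% Training goal for a normalized MSE of 0.01.
goal = 0.01 * mseRef;
% net.trainParam.goal = goal;   % would be set on an existing network
```

Since the normalized MSE is the MSE divided by this reference value, a normalized MSE of 0.01 means the model leaves about 1% of the target variance unexplained, which is a far more realistic stopping target than 1e-20.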

Response by sandeep on 14 Jul 2013 ~9:30

% Which recommendation did I not follow? The one about divideblock? I used it, but it did not give good results.

   Divideblock is not the reason for the failure. However, if you try
   another option, you MUST maintain the same uniform time spacing
   for trn/val and tst.

% Please explain your point no. 6: "Try 10 different random weight initializations for each value of hidden nodes that you try."

   Assumes prob(successful random weight initialization) >= 10%.
   I have many posts as examples. Search on i = 1:Ntrials

% Suppose I get good results on the 8th training run. When I want to reproduce them, I cannot get the same results again.
% If I use rng(0), I get the same results from the second training onward.
% How can I reproduce my best results? rng(0) makes the results reproducible, but they are worse than the others.

   1. You can record the initial state of the RNG before each 
   weight initialization, 
   help rng 
   rng(1492)
   Ntrials= 5
   for i = 1:Ntrials
       state(i) = rng
       output(i) = rand
   end
   % To recalculate output(3)
   rng( state(3))
   y = rand
   error = output(3)-y %0
   2. Or, within a loop, store the net that has the current best 
   performance on the validation data.
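Both suggestions can be combined in one loop. The following is a sketch under assumptions, not the author's own code: it uses the pollution_dataset and timedelaynet(1:2,10) from the original question, requires the Neural Network Toolbox, and the names bestNet, bestState, bestTrial, bestVperf and the file name bestNet.mat are made up for illustration.

```matlab
% Sketch only; requires the Neural Network (Deep Learning) Toolbox.
[X, T] = pollution_dataset;
Ntrials   = 10;
bestVperf = Inf;

rng(0);
for i = 1:Ntrials
    s = rng;                               % RNG state before this trial
    net = timedelaynet(1:2, 10);
    net.divideFcn = 'divideblock';
    net.trainParam.showWindow = false;     % suppress the training GUI
    [Xs, Xi, Ai, Ts] = preparets(net, X, T);
    [net, tr] = train(net, Xs, Ts, Xi, Ai);

    if tr.best_vperf < bestVperf           % keep the best net so far
        bestVperf = tr.best_vperf;
        bestNet   = net;
        bestState = s;
        bestTrial = i;
    end
end

% Writing the result to a MAT-file keeps it beyond the current session.
save('bestNet.mat', 'bestNet', 'bestState', 'bestTrial', 'bestVperf');
```

Storing the trained network itself means the best design never has to be retrained; the saved RNG state is kept only in case the run needs to be repeated from scratch with rng(bestState).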

% Here is my code
% %A 4 dimensional input, having data of 351 days
% filename = 'Inputs_2013_new.xlsx';
% u= xlsread(filename);
% % A 1 dimensional output, having datas of 351 days.
% filename = 'Outputs_2013_new.xlsx';

    I'm confused: For pollution_dataset 
    size(input) = [8 508] 
    size(output) = [ 3 508]
    Which ones are you using?

% y= xlsread(filename);
% x = tonndata(u,true,false);
% t = tonndata(y,true,false);
% inputSeries = x(1:320); %320 days for training
% xnh=x(321:end); %31 days for prediction of outputs
% targetSeries = t(1:320);
% toPredict = t(321 : end);

BASIC ERROR: Input is 4-D but you are using a 1-D notation!

% inputDelays = 1:21;
% feedbackDelays = 1:21;

Why??

% hiddenLayerSize = [25 15];

Why TWO hidden layers??? Why these values?

% net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize);
% net.layers{1}.transferFcn='logsig';
% net.layers{2}.transferFcn='logsig';

Inappropriate for mapminmax

% net.layers{1}.initFcn='initnw';
% net.layers{2}.initFcn='initnw';
% net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
% net.inputs{2}.processFcns = {'removeconstantrows','mapminmax'};

DELETE previous 4. They are defaults

% [inputs,inputStates,layerStates,targets] = preparets(net,inputSeries,{},targetSeries);
% net.divideFcn = 'dividetrain'; %I tried with dividerand, divideblock, divideint, no one is giving correct results

   Dividetrain is useful for determining delays and hidden nodes. Then
   retrain with divideblock to get unbiased performance estimates from
   the nontraining test set

% %net.divideParam.trainRatio = 70/100;
% %net.divideParam.valRatio = 15/100;
% %net.divideParam.testRatio = 15/100;

DELETE. Incompatible with dividetrain

% net.trainFcn = 'trainlm'; %I tried diffrent training algorithm, But could not get better predition.
% net.performFcn = 'mse';
% net.trainParam.epochs=1000;
% net=init(net);

DELETE. They are defaults.

% Train the Network
% rng(0)
% [net,tr] = train(net,inputs,targets,inputStates,layerStates);
% tr=tr;

Remove semicolon to see the detailed training record

% Test the Network
% outputs = net(inputs,inputStates,layerStates);
% errors = gsubtract(outputs, targets);
% errors2=cell2mat(errors);
% errors3=(abs(errors2)./y(1:2,22:320))*100; %Change here when you change Time Delay
% performance = perform(net,targets,outputs)

DELETE. Obtain performance from tr.

% Now close the loop

Worry about that later.

Hope this helps.

Greg

sandeep on 31 Aug 2013

Sir,

1. Suppose I write my 4-D inputs as the following cell array

g={[1;2;3;4] [4;5;6;7] [8;9;3;4] [1;3;5;8] [2;4;5;6] [4;6;7;9] [1;3;5;6]}

In that case g(2,1:3) or g([1,3],1:3) gives 'Index exceeds matrix dimensions.'

My earlier code works without any difficulty. Are you still suggesting there is a problem?

3. What if I manually normalize all of my data by their corresponding maxima, so that my input and output Excel files contain data in the range [0,1]? Is that OK?

4. Sir, what should be the criterion for deciding which network is good for me: validation performance, test performance, training performance or overall performance, or the training, test, validation or overall R? It seems there are so many variables on which the decision could be based.

 5. You can record the initial state of the RNG before each
   weight initialization, 
   help rng 
   rng(1492)
   Ntrials= 5
   for i = 1:Ntrials
       state(i) = rng
       output(i) = rand
   end
   % To recalculate output(3)
   rng( state(3))
   y = rand
   error = output(3)-y %0
This code works very well, but when I restart the computer I am no longer able to reproduce the results. It reproduces the results only as long as I stay in the same workspace.

Thanks again for your suggestions and for your time.

Greg Heath on 1 Sep 2013

Sir,

1. Suppose I write my 4-D inputs as the following cell array

g={[1;2;3;4] [4;5;6;7] [8;9;3;4] [1;3;5;8] [2;4;5;6] [4;6;7;9] [1;3;5;6]}

In that case g(2,1:3) or g([1,3],1:3) gives 'Index exceeds matrix dimensions.'

My earlier code works without any difficulty. Are you still suggesting there is a problem?

% Sorry, my comments were for matrices, not cells.
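The difference between the two notations can be seen on the cell array g quoted above. This is a small sketch in base MATLAB; the names first3, M, row2 and rows13 are invented for the example.

```matlab
% The 1x7 cell array from the question: one 4x1 column per time step.
g = {[1;2;3;4] [4;5;6;7] [8;9;3;4] [1;3;5;8] [2;4;5;6] [4;6;7;9] [1;3;5;6]};

% Cell indexing selects TIME STEPS; each cell still holds all 4 variables.
first3 = g(1:3);             % 1x3 cell array

% Row/column indexing applies to the equivalent matrix instead.
M = cell2mat(g);             % 4x7 matrix: rows = variables, columns = time
row2   = M(2, 1:3);          % variable 2 at time steps 1..3
rows13 = M([1 3], 1:3);      % variables 1 and 3 at time steps 1..3
```

Because g has only one row of cells, g(2,1:3) asks for a second row that does not exist, hence the 'Index exceeds matrix dimensions' error; x(1:320) in the earlier code is valid because it selects time steps of a cell array.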

3. What if I manually normalize all of my data by their corresponding maxima, so that my input and output Excel files contain data in the range [0,1]? Is that OK?

 % Only consider MAPMINMAX or MAPSTD for inputs, and use TANSIG in all hidden layers.
 % Also, only consider MAPMINMAX or MAPSTD for real-valued regression and curve-fitting outputs (e.g., FITNET), and use PURELIN or TANSIG in the output layer.
 % However, for c-class/category classification and pattern recognition outputs (e.g., PATTERNNET), consider binary unit column vectors from eye(c). Since the outputs sum to unity, it is only necessary to use c-1 of the outputs. Typically, however, this is only done for c==2.
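To make the comparison concrete, here is a small sketch on made-up numbers that contrasts manual scaling by the maximum with mapminmax and mapstd. It assumes the Neural Network Toolbox is available; the data matrix x and the names xMax, xmm, xstd, psmm, psstd and xBack are hypothetical.

```matlab
% Sketch only; mapminmax and mapstd require the Neural Network Toolbox.
% Hypothetical data: 2 variables (rows) x 5 time steps.
x = [3500 4200 5100 3900 4800;
       12   15   11   18   14];

% Manual scaling by the row maximum, as proposed in the question.
% For positive data the result lies in (0, 1] and is not centred on zero.
xMax = bsxfun(@rdivide, x, max(x, [], 2));

% mapminmax: each row mapped linearly onto [-1, 1], its default range.
[xmm, psmm] = mapminmax(x);

% mapstd: each row shifted and scaled to zero mean, unit standard deviation.
[xstd, psstd] = mapstd(x);

% The returned settings allow network outputs to be mapped back.
xBack = mapminmax('reverse', xmm, psmm);
```

Both toolbox functions produce data centred around zero, which suits the symmetric TANSIG hidden layers recommended above. As noted earlier in the thread, mapminmax is already a default processing function on these networks, so a separate manual normalization step is usually not needed.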

4. Sir, what should be the criterion for deciding which network is good for me: validation performance, test performance, training performance or overall performance, or the training, test, validation or overall R? It seems there are so many variables on which the decision could be based.

 % For huge data sets where the number of training equations is much larger than the number of unknown weights (Ntrneq >> Nw), I tend to use 'dividetrain' and the degree-of-freedom adjusted MSE, NMSE or R2 (R^2), which I denote by MSEa, NMSEa and R2a.
 % For smaller data sets I rely more on validation set performance, using 'dividerand' for static nets and 'divideblock' for dynamic nets.
 % To mitigate unfortunate designations of random initial weights, I tend to always generate 10 designs for each set of input parameters.
 % Test sets are only to be used for the estimation of nondesign set (AKA generalization) performance.
 % You can search my NEWSGROUP and ANSWERS posts using the above notation (e.g., greg MSEa or greg R2a).
 5. You can record the initial state of the RNG before each weight
 initialization,
   help rng 
   rng(1492)
   Ntrials= 5
   for i = 1:Ntrials
       state(i) = rng
       output(i) = rand
   end
   % To recalculate output(3)
   rng( state(3))
   y = rand
   error = output(3)-y %0

This code works very well, but when I restart the computer I am no longer able to reproduce the results. It reproduces the results only as long as I stay in the same workspace. Thanks again for your suggestions and for your time.

% Strange ... I have never encountered that.

sandeep on 2 Sep 2013

Sir, for each network I usually train 25 times, using Ntrials=25 in your code.

During training I got different results; here are 2 instances.

In the first case I got the following results (during the 23rd training state):

Training Performance: 0.0229

Validation Performance: 0.0256

Test Performance: 0.0143

Overall Performance: 0.0220

R: 0.5621

When I used this network for prediction, I got the following % errors:

For one-day prediction: 4.9164%
For 7-day prediction: 4.9164, 3.2588, 4.1001, 1.6657, 2.9649, 4.7078, 1.8778

In the second case, instead of 3-dimensional inputs, I removed 1 variable, thinking that the variables are correlated, and used 2-dimensional inputs. (Don't confuse this with my earlier comments; I am trying a different data set.)

I got the following results (during the 5th training state):

Training Performance: 0.0140

Validation Performance: 0.0177

Test Performance: 0.0086

Overall Performance: 0.0138

R: 0.7044

When I used this network for prediction, I got the following % errors:

For one-day prediction: 9.7742%
For 7-day prediction: 9.7742, 11.4388, 16.6848, 19.5748, 26.008, 39.6707, 39.3478

As you can see, every performance figure is better in the second case than in the first, yet the prediction is better in the first case.

So how do I decide which figure is the deciding factor? Of course, in a real situation I will not have the outputs to check the error against. Please help.

I used 551 data points for training (including the test and validation sets), delays of 5 days for both input and feedback, and 2 hidden layers with 12 neurons each. I used divideblock as per your suggestion.

Please help! Thank you again for your support.

Greg Heath on 3 Sep 2013

You are giving your interpretation of what you did and comparisons. It is too confusing without seeing your code. Please stay with the MATLAB dataset.

Shashank Prasanna on 3 Sep 2013

sandeep, if you have a new or related question, please ask it as a new question. This will (1) give your question more visibility and (2) allow you to give credit to whoever answers it.
