How do I obtain 8760 mean hourly data points from 10.6 years data?

Question

Chish il 2 Ott 2023

0
Link

Link diretto a questa domanda

https://it.mathworks.com/matlabcentral/answers/2028129-how-do-i-obtain-8760-mean-hourly-data-points-from-10-6-years-data

Commentato: Chish il 5 Ott 2023

XWinddata.xlsx

I have data from 01-Jan-2013 to 23- Aug-2023. I have attached the XWinddata.xlsx file. The data comprise hourly measured wind speed, direction, temperature and pressure. I want to find hourly average values for the period so that I have 8760 hourly data corresponding to hours in a year. I have cleaned the data to remove NANs. I failed to obtain the required values. I have followed two similar questions that I came accross on this site but was getting errors. Please may you assist.

https://www.mathworks.com/matlabcentral/answers/455537-trouble-using-retime-to-obtain-1-year-of-hourly-average-values-from-15-years-of-hourly-average-value.

Following the codes given I created my code

XWinddata.DoY = day(XWinddata.Timestamp,'dayofyear'); % I follwed the answer in the above link, it worked
XWinddata.HoD = hour(XWinddata.Timestamp); % This also worked
head(XWinddata) % Also worked and the DoY and HoD columns were added to my timetable
t = varfun(@mean,XWinddata,'GroupingVariables',{'DoY' 'HoD'},'OutputFormat','table'); % I got an error and got stuck

Error obtained while running the code

Error in tabular/varfun>@(s,varargin)dfltErrHandler(grouped,funName,s,varargin{:}) (line 201)

errHandler = @(s,varargin) dfltErrHandler(grouped,funName,s,varargin{:});

Error in tabular/varfun (line 280)

outVals{igrp} = errHandler(s,inArg);

I also used groupsummary following answeras given on https://www.mathworks.com/matlabcentral/answers/566793-how-to-obtain-hourly-average-from-several-years-of-data?s_tid=sug_su

I also obtained errors.

Thank you in anticipation.

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Accedi per commentare.

Accedi per rispondere a questa domanda.

Answer 1

dpb il 2 Ott 2023

1
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/2028129-how-do-i-obtain-8760-mean-hourly-data-points-from-10-6-years-data#answer_1323489

Modificato: dpb il 2 Ott 2023

Apri in MATLAB Online

XWinddata.xlsx

tt=readtimetable('XWinddata.xlsx');
head(tt,1)
         Timestamp          Velocityms    DirectionDegrees    TemperatureDegrees    PressurePa      yearhour      DoY
    ____________________    __________    ________________    __________________    __________    ____________    ___

    11-Jan-2013 04:00:00       5.1              180                  16.9           0.0093189     {'01/11 04'}    11 
tt=removevars(tt,{'yearhour','DoY'});                           % clean so start over
tt=addvars(tt,day(tt.Timestamp,'dayofyear'),hour(tt.Timestamp),'NewVariableNames',{'DoY','HoD'},'After',{'PressurePa'});
ttHr=varfun(@mean,tt,'GroupingVariables',{'DoY','HoD'},'OutputFormat','table');
height(ttHr)
ans = 4486
[head(ttHr);tail(ttHr)]
ans = 16×7 table
    DoY    HoD    GroupCount    mean_Velocityms    mean_DirectionDegrees    mean_TemperatureDegrees    mean_PressurePa
    ___    ___    __________    _______________    _____________________    _______________________    _______________

      1     0         20             4.705                 135.5                     21.66                 0.009343   
      1     2         20             5.255                 128.5                     21.25                0.0093378   
      1     4         20              5.68                 125.5                     20.96                0.0093348   
      1     6         20             5.705                 117.5                     20.64                0.0093323   
      1     8         20              5.64                 116.5                    20.265                0.0093322   
      1    10         20              5.67                   130                      20.1                 0.009336   
      1    12         20             5.595                 126.5                     20.15                0.0093402   
      1    14         20               5.5                   125                    20.255                0.0093432   
    366     8          4             4.325                 127.5                      21.4                  0.00933   
    366    10          4             3.725                   110                     21.45                0.0093339   
    366    12          4               3.3                 107.5                    20.875                0.0093391   
    366    14          4             3.125                 147.5                     21.05                0.0093391   
    366    16          4             2.725                  92.5                    21.525                0.0093403   
    366    18          4               3.5                   115                      21.7                0.0093416   
    366    20          4             3.475                   100                        22                0.0093374   
    366    22          4             4.075                    80                    22.125                0.0093302   

But, you are missing most of the odd hours...

[numel(unique(tt.DoY)) numel(unique(tt.HoD)) numel(unique([tt.DoY tt.HoD],'rows'))/2 366*24]
ans = 1×4
         366          24        4486        8784

As the above confirms, there are only 4,486 unique DoY, HoD combinations, not the 8784 that exist in a 366-day year. That's because you have data for most days only on two-hour increments.

This also leads back to the problem that the poster in one of the earlier links had in that there are leap years and they will show up with years that do have 366 days in them and the DOY algorithm will always count them in its conversion, even if you were to remove Feb 29 data from the timetable before computing DoY.

You didn't show us the top level user code error, only the internal error trap so we can't tell for absolute certain what went wrong; I'd guess the problem was you didn't remove the 'yearhour' variable (or use 'InputVariables' named argument pair) to not try to average it as well -- being nonnumeric, you can't take the mean of it.

NOTA BENE: However, it appears as you have only a very limited number of observations for the odd hours; whether to try to interpolate to fill those first or not I'll leave to you to decide...

Says there almost two entries for each time stamp, however, and interpolation only works for unique x values so will have to average first and then fill in..

tt=removevars(tt,{'DoY','HoD'});                % clean to start over
tt=rmmissing(retime(tt,'hourly','mean'));       % average remove missing hours
tt=retime(tt,'hourly','linear');                % now fill in odd hours
tt=addvars(tt,day(tt.Timestamp,'dayofyear'),hour(tt.Timestamp),'NewVariableNames',{'DoY','HoD'},'After',{'PressurePa'});
[numel(unique(tt.DoY)) numel(unique(tt.HoD))]
ans = 1×2
   366    24
isleapyr=@(t)(eomday(year(t),2)==29);           % leap year logical function
isleapday=@(t)((month(t)==2)&(day(t)==29));     % leap day logical function
tt(isleapday(tt.Properties.RowTimes),:)=[];     % now remove Feb29 globally
[numel(unique(tt.DoY)) numel(unique(tt.HoD))]   % still 1:366 in DoY
ans = 1×2
   366    24
ix=isleapyr(tt.Properties.RowTimes)&(month(tt.Properties.RowTimes)>2);  % dates needing fixup
tt.DoY(ix)=tt.DoY(ix)-1;                        % and adjust to 365 days
[numel(unique(tt.DoY)) numel(unique(tt.HoD))]   % now we're back to 365 day year w/ DoY to match
ans = 1×2
   365    24
ttHr=varfun(@mean,tt,'GroupingVariables',{'DoY','HoD'},'OutputFormat','table');
height(ttHr)
ans = 8760
[head(ttHr);tail(ttHr)]
ans = 16×7 table
    DoY    HoD    GroupCount    mean_Velocityms    mean_DirectionDegrees    mean_TemperatureDegrees    mean_PressurePa
    ___    ___    __________    _______________    _____________________    _______________________    _______________

      1     0         10             4.705                 135.5                     21.66                 0.009343   
      1     1         10              4.98                   132                    21.455                0.0093404   
      1     2         10             5.255                 128.5                     21.25                0.0093378   
      1     3         10            5.4675                   127                    21.105                0.0093363   
      1     4         10              5.68                 125.5                     20.96                0.0093348   
      1     5         10            5.6925                 121.5                      20.8                0.0093336   
      1     6         10             5.705                 117.5                     20.64                0.0093323   
      1     7         10            5.6725                   117                    20.452                0.0093323   
    365    16         10             3.615                   114                     20.23                0.0093648   
    365    17         10            3.7075                   119                    20.355                0.0093652   
    365    18         10               3.8                   124                     20.48                0.0093655   
    365    19         10            3.8225                124.75                     20.63                0.0093645   
    365    20         10             3.845                 125.5                     20.78                0.0093636   
    365    21         10                 4                 126.5                    20.935                0.0093613   
    365    22         10             4.155                 127.5                     21.09                0.0093589   
    365    23         10              4.43                 131.5                    21.375                 0.009351   

The above does it by just throwing away Feb 29, you could alternatively perhaps collapse Feb 28-29 to Feb 28 or Feb 29/Mar 1 to Mar 1 to somehow keep the values observed in the mix, but the above is the simplest approach, certainly.

8 Commenti
Mostra 6 commenti meno recentiNascondi 6 commenti meno recenti

dpb il 2 Ott 2023

Modificato: dpb il 2 Ott 2023

OK, I misread intent apparently.

See above amended answer regarding the number of values you get; you don't have the data for most of the odd hours, that would take interpolating.

As for 365 vs 366 days in a year; it would take adjusting the DoY values to remove 366 one way or another; the 366th day is Dec 31 if you just delete those elements; March 1 is day 60 in non leapyears but 61 in leap years so to keep calendar dates the same would mean moving the DoY down one for leap years.

Alternatively, if one were to eliminate Feb 29 initially, one could manually create the DoY excluding leap years instead of using the builtin day() function to do so and then get the 8760 number.

For the user error, there should have been more red text of the error when it failed before from the actual code line that failed--that it worked as I did it by eliminating the string day/hour field before computing the mean is pretty conclusive that your not having done that was the problem.

dpb il 2 Ott 2023

Modificato: dpb il 2 Ott 2023

Apri in MATLAB Online

If have missing data, you'll either have to interpolate or eliminate it -- you have options on import to either remove rows that you chose or insert missing values by variable/row; then you have options for dealing with those missing values.

Since almost all odd hours were missing in the original, I went ahead and interpolated to fill in -- but it also showed that there were about two entries on average for each timestamp so had to average those down to one value first before could interpolate.

The Q? might be why there are duplicate time data and no odd hours; is that a fignewton from the collection process, maybe, that the odd hours got rounded to the even so there are two values at the same time in the data file but they're supposed to have been the odd-even hours instead? I'll leave that conundrum to you to figure out; I presume you know how the data were collected to be able to sort that far easier than I can.

But, if that is real but there are hourly data but just not complete to fill in the odd hours, then there's always the option to use the 'nanflag' in the mean function that just ignores NaN values; one would then get a number as long as there were any values for each date/time for each variable. To do tht would require using an anonymous function in place of the text or simple function handle to pass the auxiliary argument -- something akin to

ttHr=varfun(@(x)mean(x,'omitnan'),tt,'GroupingVariables',{'DoY','HoD'},'OutputFormat','table');

dpb il 3 Ott 2023

Modificato: dpb il 3 Ott 2023

Apri in MATLAB Online

XWinddata.xlsx

readcell('XWinddata.xlsx')
ans = 93242×6 cell array
    {'Date'                    }    {'Time'                    }    {'Velocity(m/s)'}    {'Direction(Degrees)'}    {'Temperature(Degrees)'}    {'Pressure(Pa)'}
    {[01-Jan-2013 00:00:00.000]}    {[01-Jan-2018 00:00:00.000]}    {'-'            }    {'-'                 }    {'-'                   }    {'-'           }
    {[01-Jan-2013 01:00:00.000]}    {[01-Jan-2018 01:00:00.000]}    {'-'            }    {'-'                 }    {'-'                   }    {'-'           }
    {[01-Jan-2013 02:00:00.000]}    {[01-Jan-2018 02:00:00.000]}    {'-'            }    {'-'                 }    {'-'                   }    {'-'           }
    {[01-Jan-2013 03:00:00.000]}    {[01-Jan-2018 03:00:00.000]}    {'-'            }    {'-'                 }    {'-'                   }    {'-'           }
    {[01-Jan-2013 04:00:00.000]}    {[01-Jan-2018 04:00:00.000]}    {'-'            }    {'-'                 }    {'-'                   }    {'-'           }
    {[01-Jan-2013 05:00:00.000]}    {[01-Jan-2018 05:00:00.000]}    {'-'            }    {'-'                 }    {'-'                   }    {'-'           }
    {[01-Jan-2013 06:00:00.000]}    {[01-Jan-2018 06:00:00.000]}    {'-'            }    {'-'                 }    {'-'                   }    {'-'           }
    {[01-Jan-2013 07:00:00.000]}    {[01-Jan-2018 07:00:00.000]}    {'-'            }    {'-'                 }    {'-'                   }    {'-'           }
    {[01-Jan-2013 08:00:00.000]}    {[01-Jan-2018 08:00:00.000]}    {'-'            }    {'-'                 }    {'-'                   }    {'-'           }
    {[01-Jan-2013 09:00:00.000]}    {[01-Jan-2018 09:00:00.000]}    {'-'            }    {'-'                 }    {'-'                   }    {'-'           }
    {[01-Jan-2013 10:00:00.000]}    {[01-Jan-2018 10:00:00.000]}    {'-'            }    {'-'                 }    {'-'                   }    {'-'           }
    {[01-Jan-2013 11:00:00.000]}    {[01-Jan-2018 11:00:00.000]}    {'-'            }    {'-'                 }    {'-'                   }    {'-'           }
    {[01-Jan-2013 12:00:00.000]}    {[01-Jan-2018 12:00:00.000]}    {'-'            }    {'-'                 }    {'-'                   }    {'-'           }
    {[01-Jan-2013 13:00:00.000]}    {[01-Jan-2018 13:00:00.000]}    {'-'            }    {'-'                 }    {'-'                   }    {'-'           }
    {[01-Jan-2013 14:00:00.000]}    {[01-Jan-2018 14:00:00.000]}    {'-'            }    {'-'                 }    {'-'                   }    {'-'           }

Well, it seems to have a Date and a Time column header with inconsistent date data in the beginning....I did download and open the file in Excel and indeed, the Date column does include actual time in the cell, not just the date and the Time column is a full date-time string just formatted to look like time only.

I'd suggest if the spreadsheet was created from some other form of the initial input it that process be fixed to not mix apples and oranges although Excel and dates can be an issue in convincing it to contain what you desire, specifically, it tends to have "mind of its own".

Or, when reading, just ignore the Time column entirely.

If do revise, it would also make more sense in the numeric columns with mising data to just leave the cells empty instead of inserting the character string--that's mixing data types that don't go together.

But, that's NOT the same file as the original--I also downloaded and opened it -- it is missing the odd hour entries -- would have to see how it was created to figure out why; perhaps an unintended extra colon ":2" got into an addressing vector and saved only alternate rows, impossible to tell how it got to be what it is without the session operations. But, saving the two with the same file name makes it much easier to be able to confuse "who's who in the zoo!".

dpb il 4 Ott 2023

Moral: Have to be VERY careful with Excel and dates...<for many reasons>. See @Walter Roberson's comment to the Q? there for another nasty trait.

Chish il 5 Ott 2023

The site is very useful. Thanks once again

Accedi per commentare.

How do I obtain 8760 mean hourly data points from 10.6 years data?

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Risposta accettata

8 Commenti
Mostra 6 commenti meno recentiNascondi 6 commenti meno recenti

Più risposte (0)

Vedere anche

Categorie

Tag

Prodotti

Release

Community Treasure Hunt

How do I obtain 8760 mean hourly data points from 10.6 years data?

0 Commenti Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Risposta accettata

8 Commenti Mostra 6 commenti meno recentiNascondi 6 commenti meno recenti

Più risposte (0)

Vedere anche

Categorie

Tag

Prodotti

Release

Community Treasure Hunt

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

8 Commenti
Mostra 6 commenti meno recentiNascondi 6 commenti meno recenti