How to arrange time series data into yearly groups?

Hi,
I have daily temperature time series from 01/03/1961 to 28/07/2018. I want to arrange this data as per water year convention (01 June to 31 May) in following form:
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Temperature Temperature Temperature
Date (1960-1961) Date (1961-1962) . . . . . . . . . . . (2017-2018)
01/06/1960 01/06/1961
. .
. .
. .
. .
. .
31/05/1961 31/05/1962
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Finally, I need this data table in following form
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
= = = = = = = = = = = = Temperature = = = = = = = = = = = =
Day 1960-1961 1961-1962 . . . . . . . 2017-2018
1
2
3
.
.
.
.
.
.
365
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Please tell me how to do this.
Thanks in advance.

2 Commenti

Can provided small portion of your data in text file if possible?
Thanks. Please find attached three month data.

Accedi per commentare.

 Risposta accettata

Guillaume
Guillaume il 18 Nov 2019
Modificato: Guillaume il 20 Nov 2019
A simpler and faster way of obtaining the same as Joe's answer:
dataInitial = readtimetable('Data.xlsx');
dataFinal = table(day(dataInitial.DateTime, 'dayofyear'), year(dataInitial.DateTime), TemperatureDifference, ...
'VariableNames', {'Day', 'Year', 'TemperatureDifference'});
uyears = unique(dataFinal.Year);
dataFinal = unstack(dataFinal, 'TemperatureDifference', 'Year', 'NewVariableNames', compose('%d-%d', uyears, uyears+1))
Since tables and matrices can't have a different number of rows per column, empty entries are automatically filled by NaN.
edit: and for versions < R2019b which don't support arbitrary variable names for table variables:
edit: and for versions < R2019a which don't have readtimetable:
dataInitial = table2timetable(readtable('Data.xlsx'));
dataFinal = table(day(dataInitial.DateTime, 'dayofyear'), year(dataInitial.DateTime), TemperatureDifference, ...
'VariableNames', {'Day', 'Year', 'TemperatureDifference'});
uyears = unique(dataFinal.Year);
dataFinal = unstack(dataFinal, 'TemperatureDifference', 'Year', 'NewVariableNames', compose('Y%d_%d', uyears, uyears+1))
Again, this produces the same result as Joe's code (only faster and with a lot less work!)

Più risposte (1)

Joe Vinciguerra
Joe Vinciguerra il 17 Nov 2019
Modificato: Joe Vinciguerra il 18 Nov 2019
Start by importing your data with either readtable or readtimetable (timetable is more powerful when working with dates, but can be more complicated).
The simplest next step is to create a second table formatted how you want your final data to look: first column is 1:366 (leap years), headers are the year labels. Header labels can be populated using something like
for i=1:57
h(i) = join([string(i+1959),'-',string(i+1960)])
end
Assign h to your table variable names.
Then use a for loop to populate the final table.
Here's one way to do it:
%% The following script assumes dataset is complete, contiguous, and contains more than 2 years of data or you will get errors
% Import the data:
dataInitial = readtimetable("Data.xlsx"); % import the data into a timetable
% Setup time-related variables
% YearFirst = min(year(dataInitial.Datetime)); % the first year in the dataset
YearFirst = 1960; % use this, OR the line above instead
% YearLast = max(year(dataInitial.Datetime)); % the last year in the dataset
YearLast = 2018; % use this, OR the line above instead
dateStart = datetime([YearFirst 6 1]); % the first date in the dataset
dateEnd = datetime([YearLast 5 31]); % the last date in the dataset
oneYear = calendarDuration([1 0 0]); % calendar duration signifying 1 year
bins = dateStart:oneYear:dateEnd; % array containing the start date of each column for the final table
nbins = length(bins); % the number of data columns in the final table
YearRange = YearFirst:YearLast; % array of all years in the dataset
header = strings(1,nbins); % preallocate variables that will be created in a loop
for i = 1:nbins
header(i) = join([string(YearRange(i)),'-',string(YearRange(i+1))]); % generate an array of strings for the headers
end
% setup the output table
tableSize = [366, nbins+1]; % the size of the final table (+1 for column containing day numbers)
varTypes = cell(1, nbins+1); % create an empty cell array for the variable types
varTypes(1) = {'int8'}; % use 'int8' for the day number column
varTypes(2:end) = {'double'}; % use 'double' for all other data elements
dataFinal = table('Size',tableSize,'VariableTypes',varTypes,'VariableNames',['day',header]); % create an empty table to store the results
% arrange the imported dataset into the formatted table
dataFinal.day = (1:366)'; % insert the day number into the first column
for i = 1:length(bins) % for each column of data...
dateRange = timerange(bins(i),bins(i) + oneYear); % identify the date range of interest for this column
dataTemporary = dataInitial(dateRange,:); % extra step for clarity
dataFinal(1:height(dataTemporary),i+1) = dataTemporary; % insert the data extracted into the final table where it belongs
end

11 Commenti

Hi Joe,
Many thanks for answering my query.
I tried your suggestion for filling up headers that worked well but I'm not able to select values to fill final table which are based on date.
For example I need to fill every column with vales from a corresponding column between 1st June to 31st May every year.
Could you please help me a bit more?
Thanks in advance.
Sure. How do you want to account for leap years? For example, do you want some rows to be 365 elements and others to be 366, or should they all be 366 with some having a zero or NaN on February 29th (day 60), etc?
Thank you very much Joe.
I prefere some rows to be 365 and others 366.
Joe Vinciguerra
Joe Vinciguerra il 18 Nov 2019
Modificato: Joe Vinciguerra il 18 Nov 2019
I edited my original answer to reflect your request for more details.
Let me know if you have additional questions, else please accept my answer.
Cheers.
Thank you very much Joe,
I really appreciate your time to answer my query.
Part of new code works well but second part gives following error:
1. Error using table (line 328)
'1960 - 1961' is not a valid variable name.
2. Error in dataarrange (line 25)
dataFinal = table('Size',tableSize,'VariableTypes',varTypes,'VariableNames',['day',header]); % create an empty table to store the results.
What should I chnage in this to make it work? I use matlab2018b.
Thanks in advance.
I don't have that version installed, but it seems that the error describes that the table function doesn't like the string '1960 - 1961' as a variable name. Matlab generally prefers variables that start with letters and don't use spaces, so try this instead:
header(i) = join(['Y_',string(YearRange(i)),'-',string(YearRange(i+1))],'');
The ability to use any name for variables in a table was introduced in R2019b. Indeed, in previous versions table variables must follow the same rules as any variable in matlab, so must start with a letter and can't contain spaces or anything other than letters, numbers or _.
So that join is still not going to work because of the '-'.
Note that using compose or sprintf would make the string construction more readable:
header(i) = sprintf("Y%d_%d", YearRange(i), YearRange(i+1));
Guillaume, thanks for catching that "-" issue too.
Hi Joe and Guillaume,
Many thanks for your answers.
Most part of the both codes works but final table only shows appropriate headers but in data cells all 'zero'.
I get following error:
==========================================
Unable to use a value of type 'timetable' as an index.
Error (line 30 - Joe's code)
dataFinal(1:height(dataTemporary),i+1) = dataTemporary;
==========================================
In M2018b, 'readtimetable' doesn't work. Instead I used timetable(datetime(Date),Tempeature)) - which worked well.
What should I change to avoid this error? I guess this is the last step to fill the table.
thanks in advance.
"but final table only shows appropriate headers but in data cells all 'zero'."
I'm not entirely sure what this mean. Anyway, with which code do you get that?
"I get following error"
This would indicate that for some reason i is a timetable, not the loop index. Anyway, I've given you much simpler code (just 4 lines) which will do the same job.
An easier way to get the timetable in pre-R2019a is with:
dataInitial = table2timetable(readtable('Data.xlsx'));
Thanks a lot! It worked really well after a minor change as suggested by Guillaume.

Accedi per commentare.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by