60 years of daily data; need to group separately by month and year
8 visualizzazioni (ultimi 30 giorni)
Mostra commenti meno recenti
Butterfly
il 26 Feb 2023
Commentato: Star Strider
il 27 Feb 2023
I am completely new to MATLAB and have about 60 years of daily data (22246 rows) and four columns of interest.
I've imported the data into MATLAB as a table from a .csv.
I want to start by grouping this into months so that I can then analyse averages, minimums, maximums, general trends, plot graphs etc. but I don't want to add up all monthly data into a single value for each month for each column. Instead, I want to see a single value for each month of every year (so condensing this into about 720 rows), not 12 rows where all January data is aggregated and then all February etc.
I.e.
J F M A M J J A S O N D
2000
2001
2002
...
I know that this cannot go into the original table and that a new one will need to be created.
[Later on, I want to do this by year but once I know how to do it by month, I should be able to amend the code accordingly.]
Another problem is that the data starts mid-year and ends mid-year (i.e. start date is not 1st Jan and end date is not 31st Dec), although the data does run consecutively without a break once it starts.
The original dates are in the form DD-MMM-YYYY.
I already know how to do the analysis (e.g. averages) on the data but my problem is getting to a starting point in terms of the values that I want to analyse.
I've tried searching forums already and can't find the answer or at least one that my limited knowledge of MATLAB can understand. Please help as I'm tearing my hair out!
Thank you!
3 Commenti
Stephen23
il 26 Feb 2023
Modificato: Stephen23
il 26 Feb 2023
Do NOT solve this with loops, lots of UNIQUE calls, padding arrays with ZEROS and the like.
Learn to use MATLAB instead of fighting it. The most important thing is to get the data design right, which does take some practice. But in many cases, a good place to start is to import data as a table:
You can solve this task in just a few table commands: the more you browse the documentation, the more you will find and know what is available to help you solve your problems:
Risposta accettata
Star Strider
il 26 Feb 2023
One approach for the mean values —
T1 = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1307475/Flows.csv')
TT1 = table2timetable(T1)
TT1m = retime(TT1, 'monthly', 'mean') % Aggregate On 'mean' Values
VN = T1.Properties.VariableNames;
T1maDate1 = datetime('01-Jan-1958') + calmonths(0:8).'; % Pad First Nine Months
T1ma1 = zeros(size(T1maDate1,1),4);
T1ma1 = [table(T1maDate1) array2table(T1ma1)];
T1ma1.Properties.VariableNames = VN;
T1maDate2 = TT1m.Date(end) + calmonths(1:4).'; % PAd Last Four Months
T1ma2 = zeros(size(T1maDate2,1),4);
T1ma2 = [table(T1maDate2) array2table(T1ma2)];
T1ma2.Properties.VariableNames = VN;
TT1m = [table2timetable(T1ma1); TT1m; table2timetable(T1ma2)] % Pad Array To Fill Out Months
Monthsc = unique(month(TT1m.Date, 'shortname'),'stable');
Yearsc = unique(year(TT1m.Date),'stable');
for k = 1:size(TT1m,2)
T1var{k,1} = array2table(reshape(TT1m{:,k}, 12, []).', 'RowNames',string(Yearsc), 'VariableNames',Monthsc);
T1var{k,2} = VN{k+1};
end
T1var{1,:}
Repeat this for the other statistics.
.
2 Commenti
Star Strider
il 26 Feb 2023
As always, my pleasure!
I’ve had some experience with this sort of problem recently with another thread, so I know that there do not appear to be existing functions that can do this sort of operation. (It would be nice if there were!) There might be other ways of preallocating the table to avoid padding both ends of it, however that was not immediately obvious either. The matrix approach seems to work best here.
.
Più risposte (2)
Image Analyst
il 26 Feb 2023
First see if you can do it yourself using one of these functions: splitapply, grpstats, groupsummary
If you still can't figure it out, let us know if you have the stats toolbox, and someone will do it for you.
Stephen23
il 26 Feb 2023
Modificato: Stephen23
il 27 Feb 2023
The simple MATLAB approach using GROUPSUMMARY and UNSTACK:
M = categorical(datetime(1,1,1,"Format","MMM"):calmonths(1):datetime(1,12,31));
T = readtable('Flows.csv');
T.Year = T.Date.Year;
T.Month = M(T.Date.Month).';
S = groupsummary(T,["Year","Month"],"mean","TotalDailyFlowRate_m3s");
U = unstack(S,"mean_TotalDailyFlowRate_m3s","Month", "GroupingVariables","Year")
4 Commenti
Stephen23
il 27 Feb 2023
Modificato: Stephen23
il 27 Feb 2023
@Star Strider: thank you! For some reason I didn't think of a non-scalar input to CALMONTHS.
M = categorical(datetime(1,1,1,"Format","MMM")+calmonths(0:11))
sort(M) % sorts into category order (not alphabetic order)
Vedere anche
Categorie
Scopri di più su Logical in Help Center e File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!