Retime data aggregation for ID

7 views (last 30 days)
Gian23 on 26 Apr 2021
Answered: Eric Sofen on 4 May 2021
Hello everyone,
I'm going to calculate the monthly average of daily temperatures grouping by sensor using retime function. At the moment I'm trying to select each sensor with a loop and then apply the retime function, but I have to calculate six million rows so I would like to avoid a loop to speed up the calculation
I give an example:
Time = datetime({'18-feb-2021';'19-feb-2021';...
'01-mar-2021';'04-mar-2021';'18-feb-2021';...
'19-feb-2021';'01-mar-2021';'04-mar-2021'});
Temp = [56.82;62.72;64.52;63.81;63.45;59.7;60.27;61.32];
Sensor = [12;12;12;12;13;13;13;13];
TT = timetable(Time,Sensor,Temp);
Current code:
uni_sensor = unique(Sensor);
monthly_ds = timetable;
parfor kk = 1:length(uni_sensor)
index_retime = find(TT.Sensor == uni_sensor(kk));
sensor_retime = TT(index_retime,:);
monthly_data = retime(sensor_retime(:,2), 'monthly', 'mean');
data_sensor_retime = array2table(repmat(uni_sensor(kk), size(monthly_data, 1),1), 'VariableNames', "Sensor" );
monthly_sub_id = [monthly_data, data_sensor_retime];
monthly_ds = [monthly_ds; monthly_sub_id];
end
Desired output:
Time = datetime({'feb-2021';'feb-2021';...
'mar-2021';'mar-2021'});
Temp = [59.77;61.575;64.165;60.795];
Sensor = [12;12;13;13];
TT_out = timetable(Time,Sensor,Temp);
Gianluca

Marco Riani on 27 Apr 2021
I think in this example it is unnecessary to use retime.
I would proceed as follows.
Time1=char(Time);
Time2=Time1(:,4:end);
TT = table(findgroups(string(Time2)),Sensor,Temp);
groupvars={'Sensor' 'Var1'};
datavars='Temp';
groupsummary(TT,groupvars,'mean',datavars)
Instead of using groupsummary it is possible to use grpstats. Please let us know which between groupsummary and grpstats is faster.
Gian23 on 27 Apr 2021
Very interesting approach and usefull overall! I tried yor solution and it works very well.
On the evidence of one test, grpstats is faster than groupsummary.
Many thanks again!

Eric Sofen on 4 May 2021
Another approach is to unstack the timetable based on the sensor ID, so you'd have a wide timetable with separate variables temp_12, temp_13, ..., then apply retime to that without a need for grouping. I don't know if it would be faster than Marco's findgroups approach (which is quite clever), and having the sensor IDs embedded in the table variable names may or may not be useful in the long run, but it's yet another way to tackle this problem.

Categories

Find more on Tables in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by