Computing average absolute time from date strings without years

2 visualizzazioni (ultimi 30 giorni)
Hi there, I have a very large table with a column of (12 hr) times saved as strings in a cell array, and a column of folder names also saved as strings in a cell array. So for example, say my table was called 'data', then if I type:
K>> data.folder
ans =
1150×1 cell array
{'160115/f1/f2/f3'}
I can access a string directly by typing:
K>> data.folder{1}
ans =
'160115/f1/f2/f3'
The first part of the folder string has the date in YYMMDD format. Another column in the table has the time in a string (12 hour clock):
K>> data.end_time{1}
ans =
'11:24:17 AM'
The actual table is enormous. I have some index for each row which fits into some supercategory (eg taken on some date), and I want to average the absolute time for each row in that category. How can I do this?
I have a brute force solution for the first part which works but I don't like how it's a loop. I would prefer to do it in a more compact way. Anyways, here it is:
exact_times = datetime(ramsey_data.end_time);
date = ramsey_data.folder;
for i=1:numel(exact_times) ymd = date{i};
ymd = ymd(1:6);
year = ['20' ymd(1:2)];
month = ymd(3:4);
day = ymd(5:6);
exact_times(i).Year = str2num(year);
exact_times(i).Month = str2num(month);
exact_times(i).Day = str2num(day);
end
I still need to find a way to average the resulting values in exact_times properly. I want to select only some of the rows in the table to average. If I could use accumarray I would use something like: average_time = accumarray(subs, exact_times, [], @mean); but I can't use that on datetime variables
  4 Commenti
supernoob
supernoob il 21 Mar 2018
Modificato: dpb il 21 Mar 2018
I have a brute force solution for the first part which works but I don't like how it's a loop. I would prefer to do it in a more compact way. Anyways, here it is:
exact_times = datetime(ramsey_data.end_time);
date = ramsey_data.folder;
%%%BRUTE FORCE%%%%really sorrryyyyy
for i=1:numel(exact_times)
ymd = date{i};
ymd = ymd(1:6);
year = ['20' ymd(1:2)];
month = ymd(3:4);
day = ymd(5:6);
exact_times(i).Year = str2num(year);
exact_times(i).Month = str2num(month);
exact_times(i).Day = str2num(day);
end
I still need to find a way to average the resulting values in exact_times properly. I want to select only some of the rows in the table to average. If I could use accumarray I would use something like:
average_time = accumarray(subs, exact_times, [], @mean);
but I can't use that on datetime variables
dpb
dpb il 21 Mar 2018
What's wrong with the posted solution? Just turn whatever is the selection criterion into a logical addressing vector. ismember or whatever works on datetime.

Accedi per commentare.

Risposta accettata

Peter Perkins
Peter Perkins il 23 Mar 2018
It's really just three lines to do this. I've shown scalar values, but it works just as well on string or cellstr arrays:
>> date = '160115/f1/f2/f3';
>> time = '11:24:17 AM';
>> date = extractBefore(date,'/')
date =
'160115'
>> date = datetime(date,'InputFormat','yyMMdd')
date =
datetime
15-Jan-2016
>> time = timeofday(datetime(time,'Format','hh:mm:ss aa'))
time =
duration
11:24:17
>> dt = date + time
dt =
datetime
15-Jan-2016 11:24:17
  11 Commenti
Peter Perkins
Peter Perkins il 5 Apr 2018
That's a lot to digest. I confess I'm not entirely following the points you've made, so lemme take a shot at what I think you are talking about. This is kind of long, but perhaps helpful to someone.
Here's a plot of some hoked-up data that follows a diurnal pattern, over five days:
dt = datetime(2018,4,4) + days(sort(5*rand(1,100)));
y = sin(2*pi*hour(dt)/24) + .1*randn(size(dt));
plot(dt,y,'o')
If I get the time-of-day as a duration, I can plot all of the values vs. that. I can even mess with the ticks and their labels to get the look of a 12hr clock:
tod = timeofday(dt);
plot(tod,y,'o')
set(gca,'XLim',hours([0 24]),'XTick',hours(0:6:24),'XTickLabel',{'12:00am' '6:00am' '12:00pm' '6:00pm' '12:00am'});
For the purposes of a plot, that seems to capture what you are asking for. I might want color to indicate different days, but that can be done with scatter. I could also have left the automatically-chosen tick locations alone but set their tick label format to something like 'hh:mm'.
For the purposes of calculations, I could imagine wanting to have addition wrap at 0 and 24hrs. In many cases one should be doing datetime arithmetic rather than wrapped time-of-day arithmetic, but the wrapping without dates involved is what I meant by an "unanchored time of day".
I'd be interested to hear if I've just missed your point. In particular, I'm not following, "needs both the duration and the time of day at which the event began". Those seem like almost identical things that can both be represented with durations.
The analogous thing for a cyclic annual plot is a bit trickier. Here's some hoked-up data that follow an annual pattern, over five years.
dt = datetime(2018,1,1) + years(sort(5*rand(1,100)));
y = sin(2*pi*month(dt)/12) + .1*randn(size(dt));
plot(dt,y,'o')
There is no "timeofyear" function. There are however month and day functions that can return month-of-year and day-of-year. There's also the possibility of getting the "elapsed time within year" as a combination of dateshift and subtraction. The issue with plotting values vs. day-of-year or time-within-year is leap years. I don't know that there's standard way to make those plots, at least not as one continuous 1-Jan to 31-Dec time line. I'm no expert. One could imagine adding 29-Feb and 3/4ths of the data would have a gap. I've also seen plots with 12 separate day-of-month axes. For simplicity, let's stick with a plot of values vs. month-of-year.
moy = month(dt);
plot(moy,y,'o')
monthNames = month(datetime(2018,1:12,1),'shortname');
set(gca,'XTick',1:12,'XTickLabel',monthNames,'XTickLabelRotation',-45);
This plot looks a bit different than the time-of-day plot, because the x values are discrete. It's also a few more lines of code, because while duration serves pretty well as an unanchored time of day, there's no data type for unanchored months of year. So get the month number, and force the tick labels.
Again, I'd be interested to hear if I've just missed your point.
A couple of more specific responses:
1) There's no way to plot vs. calendarDurations, because in general, they can't even be ordered unless they are expressable in only one unit (months or days, usually). And if that's true, one could just as easily plot vs. the numeric values, e.g. vs. 1:3 instead of vs. caldays(1:3).
2) You are correct in saying that duration is, "a calendar-agnostic time span in which a year is the average year length rather than a specific 365/366 day year". I would just add the small clarification that "years" in the sense of durations is just so the scale of the values doesn't become completely unfamiliar for large spans. Everyone knows how long 10 years is; how long is 315569520 seconds? So something like 10.051 yrs is not something one can pin down on a calendar, it's just giving a more familiar scale.
3) You already know this, but it would not make sense to have MMM as a calendarDuration format -- calendarDurations are lengths of time. 3 months is very different than "March". Which leads to ...
4) ... What's missing in the ecosystem are unanchored time periods. "March", as opposed to "1-March-2018". I think that's what you were looking for when trying to use a calendarDuration as the x axis of a plot. Not being able to make the "reference year" disappear in the datetime plot reflects that difference. For now, I think my example above is your best bet. I'd stay away from datenums if possible.
I hope this helps clarify things, and thanks for the useful conversation.
dpb
dpb il 5 Apr 2018
Modificato: dpb il 6 Apr 2018
You captured the gist of the issue, Peter. The annual plot on day-of-year is the one, specifically. I'm no expert in the field of the OP who asked the question, either; his proposed "solution" to the leap year problem was to shift the dates by +1 for leap year dates after February. My contention was, looking at the published paper from which he got the form of the plot he was trying to duplicate is that the plot axis is labelled as and represents calendar time and so the actual date is the calendar time for the year in which it occurs but I gave him the option to do the shift if he so chose to do so, anyway.
In the end, given the the need for the time-within-year vis-a-vis your example by month, I found the datenum representation of the integer fraction holding the day more convenient in that then could use datetick to do the labeling instead of writing the tick labels. That's a choice probably as much from being an old fogey who's well-versed with datenum and its ilk in many other incarnations in Fortran libraries, etc., as well as Matlab. As noted, I started down datetime/duration path as the query came from a new user and I really didn't want to revert to deprecated features but thought having to write the tick labels was at least as confusing if not more so for him so ended up with what I thought the lesser evil.
The only additional complication over your example is that the y-axis is also a duration rather than a single value in the subject case; I ended up drawing broken lines over the extent of each duration by introducing NaT/NaN between each observation period of a given year to control color by year.
Again, I sincerely appreciate your willingness to discourse over these issues; I gain some appreciation for the thought processes behind some of the implementations I can subsequently pass on...and, I'll continue to ponder and perhaps play with your implementation a little and see if I can make that direction more straightforward but yours with dateshift/subtraction seems inherently more awkward owing to the internal representation in datetime object not having a 1:1 correlation to DOY as does the integer portion of a datenum. Writing a "dayofyear" wrapper could be the trick; I was trying to minimize user functions for the OP in the question response as a secondary objective as well.

Accedi per commentare.

Più risposte (2)

dpb
dpb il 21 Mar 2018
Modificato: dpb il 22 Mar 2018
t=readtable('example.csv'); % read the data
t.folder=char(t.folder); % to char array
t.date=datetime(t.folder(:,1:6),'inputformat','yyMMdd'); % convert conglomeration to time
[~,~,~,h,m,s]=datevec(t.end_time); % get wanted h,m,s
[Y,M,D]=datevec(t.end_time); % and corresponding date
dt=datetime(Y,M,D,h,m,s) % build the composite date/time
dt =
10×1 datetime array
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:25:36
15-Jan-2016 11:25:36
>> mean(dt)
ans =
datetime
15-Jan-2016 11:24:32
>>
Could also parse the string data first and read the time as string and piece them together as text input to datetime. Will leave as "exercise for the student" :)
ADDENDUM
Per sidebar comments using your list of G:
>> splitapply(@mean,dt,G)
ans =
4×1 datetime array
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:25:09
>>
ADDENDUM 2
If the issue you're having is the nested cell array; there's another path besides that above avoiding the intermediary of the table --
>> c=textscan(fid,'%s%s','delimiter',',','headerlines',1);
>> times=char(c{1}); folder=char(c{2});
>> dt=datetime([folder(:,1:6) blanks(10).' times],'inputformat','yyMMdd hh:mm:ss a')
dt =
10×1 datetime array
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:25:36
15-Jan-2016 11:25:36
>>
splitapply with the grouping vector as above from here.
Unfortunately, Matlab doesn't have the syntax to be able to use sub-indexing of doubly-nested cellstr arrays without the intermediary variable to dereference them to singly-nested as above; there's probably the place where you're having addressing issues. (Of course, that I inadvertently deleted the conversion step in initial posting didn't help...sorry about that :P)
You don't show how you're getting the data into the app; as above textscan has the penchant to put everything into that two-level structure that is convenient for variables returned from it being succinct but not necessarily so much when comes to using the results.
  6 Commenti
supernoob
supernoob il 22 Mar 2018
No it does not solve the issues. I understand t is from the table object created from the csv. The actual table in my code consists of cell arrays, so none of the indexing you have posted will work. This is noted in my original question and in a previous comment to you. The answer I posed and accepted solves all of my issues. Thank you very much for your help, I'm sorry if my question was not clear but I have solved it now and posted the answer.
dpb
dpb il 22 Mar 2018
Modificato: dpb il 22 Mar 2018
Oh, I see when I pasted from workspace and cleaned up the superfluous outputs I inadvertently deleted one line..converting the cell array to char()...
t.folder=char(t.folder);
before the
t.date=...
step....

Accedi per commentare.


supernoob
supernoob il 21 Mar 2018
Ok, I've got it all working, though it isn't quite as compact as I hoped. Here is how it goes:
for i=1:numel(exact_times)
ymd = date{i};
ymd = ymd(1:6);
year = ['20' ymd(1:2)];
month = ymd(3:4);
day = ymd(5:6);
exact_times(i).Year = str2num(year);
exact_times(i).Month = str2num(month);
exact_times(i).Day = str2num(day);
end
%get the average absolute time for rows which match indices in subs
average_time = accumarray(subs, datenum(exact_times), [], @mean);
avg_data.average_time = datetime(average_time, 'ConvertFrom', 'datenum');
  1 Commento
dpb
dpb il 22 Mar 2018
Modificato: dpb il 22 Mar 2018
What happened to the time strings; this only looks at the dates?
Going back to your original where you show you have
K>> data.folder
ans =
1150×1 cell array
{'160115/f1/f2/f3'}
K>> data.end_time{1}
ans =
'11:24:17 AM'
then
folder=char(data.folder{:});
times=char(data.end_time{:});
would get you to the point at which I started after reading the csv file into the table I named t excepting w/ standalone variables as in the latter example instead of inside the table.

Accedi per commentare.

Categorie

Scopri di più su Dates and Time in Help Center e File Exchange

Prodotti

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by