Efficiently processing unusual date/time format
6 visualizzazioni (ultimi 30 giorni)
Mostra commenti meno recenti
I have a number of Excel files with many channels of data as well as an "Absolute Time" column in each. I can read the data in with no issue, but I am processing the "raw" data from the [~,~,raw] = xlsread(...) function because the time is in the unusual DDD:HH:MM:SS.sssssssss format (where DDD is a three digit number for the day of the year). As far as I can tell the datenum and similar functions can't work with this format. So I'm processing it manually by breaking the string into pieces at the colon character, converting the pieces to numbers and multiplying them out. However, processing this column of data stacks up to about half of the total time for processing my files (using the profiler), so I'd like to figure out a more efficient way of doing it.
Here is an excerpt of column A of the spreadsheet:
...
AbsoluteTime
DDD:HH:MM:SS.sssssssss
Absolute Time
131:17:31:20.000000000
131:17:31:20.050000000
131:17:31:20.100000000
131:17:31:20.150000000
131:17:31:20.200000000
131:17:31:20.250000000
131:17:31:20.300000000
...
There are an arbitrary number of comment lines at the top of the file (hence the initial ...). After passing over the comment lines, I arrive at setting an index called varNameRowIdx to the line which has the "DDD:HH:MM:SS.sssssssss" string. I actually don't care whether I end up with absolute times or times relative to the first time value, but I do need to be able to properly handle the case where it rolls over to the next day within the data. Right now, I'm grabbing the first time value ( raw{(varNameRowIdx+2),1}) in the first sheet ( jj of 1) of the first file ( ii of 1) and am then offsetting all other time values by that t0 to create relative time.
% define first time value of first sheet of first file as t = 0
if 1==ii && 1==jj
t0 = str2double(strsplit(raw{(varNameRowIdx+2),1},':'));
t0 = t0(1)*86400+t0(2)*3600+t0(3)*60+t0(4);
end
% specially process time column
idx = 1; tvec = zeros(size(raw,1)-(varNameRowIdx+1),1);
for kk=(varNameRowIdx+2):size(raw,1)
tt = str2double(strsplit(raw{kk,1},':'));
tt = tt(1)*86400+tt(2)*3600+tt(3)*60+tt(4);
tvec(idx) = tt-t0;
idx = idx+1;
end
The tt = str2double(strsplit(raw{kk,1},':')); line is the one that's really expensive, with the strsplit and str2double functions taking 21 and 23% of the total time, respectively. I'm sure there's a better way to process the time stamp data than one element at a time, but I'm not sure what it is. Any suggestions?
0 Commenti
Risposta accettata
dpb
il 1 Ago 2018
Modificato: dpb
il 1 Ago 2018
Once you have the cell array that holds the date strings, then datetime will convert from day of year...
>> datetime(r,'InputFormat','DDD:HH:mm:ss.SSSSSSSSS')
ans =
7×1 datetime array
11-May-2018 17:31:20
11-May-2018 17:31:20
11-May-2018 17:31:20
11-May-2018 17:31:20
11-May-2018 17:31:20
11-May-2018 17:31:20
11-May-2018 17:31:20
>>
ADDENDUM
For the particular case, specifying an output format as well may be useful
>> datetime(r,'InputFormat','DDD:HH:mm:ss.SSSSSSSSS','Format','dd-MMM-uuuu HH:mm:ss.SSS')
ans =
7×1 datetime array
11-May-2018 17:31:20.000
11-May-2018 17:31:20.050
11-May-2018 17:31:20.100
11-May-2018 17:31:20.150
11-May-2018 17:31:20.200
11-May-2018 17:31:20.250
11-May-2018 17:31:20.300
>>
or, since this looks like a sampling dataset, as you're doing now convert to duration--
>> d=t-t(1);
>> d.Format='mm:ss.SSS'
d =
7×1 duration array
00:00.000
00:00.050
00:00.100
00:00.150
00:00.200
00:00.250
00:00.300
>>
You then might find
doc timetable
doc timeseries
of interest depending on what else is to be done once have the data.
3 Commenti
Più risposte (1)
Peter Perkins
il 3 Ago 2018
Michael, your data example says, "Absolute Time", so datetime may indeed be the right thing to create. But beginning in R2018a, you can convert certain kinds of "duration text" directly to durations:
>> t = duration('131:17:31:20.000000000','Format','dd:hh:mm:ss.SSSSSSSSS')
t =
duration
131:17:31:20.000000000
>> t.Format = 'd'
t =
duration
131.73 days
>> t.Format = 's'
t =
duration
1.1381e+07 sec
Vedere anche
Categorie
Scopri di più su Calendar in Help Center e File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!