Remove non-time string values from a time matrix

Hi,
I have a time string matrix (592x1 cell) that looks something like this. (The time string values are outputs from a parsed serial port communication link.)
time_mat = {'00:21:51.000',.........................'00:22:16.200','00:22:16.400','00:22:16.600','2019/05/30','00:22:17.000'....'22Rover6'.......,'2620517.2165',......................}
The entries that are not time strings (shown in bold in the original post, e.g. '2019/05/30', '22Rover6' and '2620517.2165') need to be removed and replaced with [].
I tried a string comparison check and a size-matching criterion to remove the unnecessary data, but it didn't work. Can anyone suggest a better approach? I have also attached the time_mat file for your perusal.
Thanks for your time and help.
Ravi

Accepted Answer

Adam Danz on 1 Jun 2019
Edited: Adam Danz on 1 Jun 2019
I would use datetime() to convert your cell array of strings to a datetime array. This will return NaT (not a time) for elements that are not in the specified format.
dtMat = datetime(time_mat, 'InputFormat', 'HH:mm:ss.SSS', 'Format', 'HH:mm:ss.SSS');
Comparison
table(dtMat(110:115), time_mat(110:115),'VariableNames',{'datetime', 'original'})
ans =
  6×2 table
      datetime          original
    ____________    ______________
    00:22:16.400    '00:22:16.400'
    00:22:16.600    '00:22:16.600'
    NaT             '2019/05/30'
    00:22:17.000    '00:22:17.000'
    00:22:17.200    '00:22:17.200'
    00:22:17.800    '00:22:17.800'
To fill in the missing values with linear interpolation, use fillmissing() (R2016b or later):
dtMat = datetime(time_mat, 'InputFormat', 'HH:mm:ss.SSS', 'Format', 'HH:mm:ss.SSS');
dtMatFill = fillmissing(dtMat,'linear');
% To see the missing data
natIdx = isnat(dtMat); %index of missing data
dtMatFill(natIdx)
If you'd rather work with the cell array of strings, you can replace the bad elements with empties like this:
% Anchor the pattern and escape the dot so only complete HH:mm:ss.SSS strings match
badIdx = cellfun(@isempty,regexp(time_mat,'^\d{2}:\d{2}:\d{2}\.\d{3}$','once'));
time_mat(badIdx) = {[]};
6 Comments
Adam Danz on 1 Jun 2019
@Ravi, on second thought, if you know the start time (time_mat(1)) and the sampling interval (0.2 sec), you could just produce the vector of time samples instead of reading them in.
% Convert your strings to datetime format
dtMat = datetime(time_mat, 'InputFormat', 'HH:mm:ss.SSS', 'Format', 'HH:mm:ss.SSS');
% Fill in the NaT values
dtMatFill = fillmissing(dtMat,'linear');
% sample interval
sampInt = seconds(0.2);
% Total duration of series
totalDur = dtMatFill(end) - dtMatFill(1);
% Number of sample intervals spanned by the series
nSamples = floor(totalDur/sampInt);
% produce time series, starting at offset 0 so the first sample is the start time
dtMatComplete = dtMatFill(1) + (0:nSamples)'*sampInt;
Ravi on 5 Jun 2019
As I am reading the date, time and position values (5 per second) in real time, the start time is somewhat arbitrary. Since I have a dynamic system, I should check for missing samples in the data flow and interpolate to fill the vacant spots.
I was out testing, so I didn't get a chance to test it further, but I was hoping the method you suggested works on missing position data as well. After a quick check, it works fine for completing the time vector.
Thanks for your time and help.
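A minimal sketch of how the same fillmissing approach might extend to position data, assuming the position samples sit in a numeric column with NaN where a packet was lost. The variable names (time_str, pos) and all values below are hypothetical and invented for illustration; they are not taken from the attached file.

```matlab
% Hypothetical example data: five time strings (one garbled) and a position
% column with one missing sample. Values are made up for illustration.
time_str = {'00:22:16.200'; '00:22:16.400'; '2019/05/30'; '00:22:16.800'; '00:22:17.000'};
pos      = [10.0; 10.2; NaN; 10.6; 10.8];

% Convert the strings; entries that do not match the format become NaT.
t = datetime(time_str, 'InputFormat', 'HH:mm:ss.SSS', 'Format', 'HH:mm:ss.SSS');

% Fill the missing timestamps first so they can serve as sample points.
tFill = fillmissing(t, 'linear');

% Interpolate the missing positions against the filled time vector.
posFill = fillmissing(pos, 'linear', 'SamplePoints', tFill);
```

Passing the filled times as 'SamplePoints' makes the interpolation follow the actual time spacing rather than the row index, which matters if the gaps are uneven.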


More Answers (2)

dpb on 1 Jun 2019
Edited: dpb on 1 Jun 2019
Using the datetime class is probably easiest...see if
tm=datetime(time_mat,'InputFormat','HH:mm:ss.SSS'); % convert to datetime ('HH' = 24-hour clock); failures result in NaT
isnt=isnat(tm); % logical vector of those locations
>> time_mat(isnt) % the identified bum records...see if match expectations
ans =
11×1 cell array
{'2019/05/30' }
{'00:22Rover6' }
{'-2620517.2165'}
{'3.6' }
{'2019/05/30' }
{'0.1677' }
{'3954309.3750' }
{'2' }
{'2019/05/30' }
{'00Rover6' }
{'-4250201.7507'}
>> find(isnt) % the locations in the original vector
ans =
112
207
327
333
360
361
430
475
478
547
558
>>
ADDENDUM:
To fill in missing and otherwise clean up the transmission, something like the following:
tu=unique(tm); % there are some duplicated times
tt=timetable(tu,[1:numel(tu)].'); % build a time table from them
tt(isnat(tt.tu),:)=[]; % remove the NaT values to replace
ttnew=retime(tt,tt.tu(1):seconds(0.2):tt.tu(end),'linear'); % build a new table with interpolated values
There were two particular locations with same timestamp--
>> find(diff(t)==0)
ans =
45
139
>> t(40:50)
ans =
11×1 datetime array
...
12:22:00.2
12:22:00.4
12:22:00.4
12:22:00.8
...
What you do with those before you build the timetable I dunno--you could average them or select first/last ignoring the others as the above does...just depends on what's actually happening in your setup as to what you want to do, methinks...
After that, it's just make a new continuous time vector and interpolate -- the existing data will just be replaced with same, you can choose from alternate interpolating schemes as desired depending on the characteristics of the data you're collecting.
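The two options mentioned above for duplicated timestamps (keep the first record, or average the records) could be sketched roughly as follows. The vectors t and val are hypothetical stand-ins with invented values; durations are used in place of the parsed times only to keep the example self-contained.

```matlab
% Hypothetical data: timestamps with one duplicate and a made-up value column.
t   = seconds([0.2; 0.4; 0.4; 0.8]);   % durations stand in for the parsed times
val = [1; 2; 4; 5];

% Option 1: keep only the first record for each distinct timestamp.
[tFirst, firstIdx] = unique(t, 'first');
valFirst = val(firstIdx);

% Option 2: average all records that share a timestamp.
[tu, ~, grp] = unique(t);                  % grp maps each row to its unique time
valMean = accumarray(grp, val, [], @mean); % one averaged value per unique time
```

Either pair (tFirst, valFirst) or (tu, valMean) can then be passed to timetable and retime in the same way as in the code above.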
ADDENDUM 2:
You can make a more meaningful name for the time vector -- I was keeping separate variables for the original time and then the unique times, etc., so if I made a slip didn't have to go back more than one or two steps--so the tu got morphed into the table as the time variable name. You can fix this to more meaningful as
ttnew.Properties.DimensionNames(1)={'Time'};
for example. If you do this before the retime, then that's the variable name to use there instead, of course.
3 Comments
dpb on 1 Jun 2019
See amended answer...
Ravi on 5 Jun 2019
@dpb, thanks for your comments. I will explore the timeseries object. The issue is that I am reading in date, time and position data from multiple sensors through an RF radio using a single COM port, and even with flow control I see a lot of missing packets (which is usually the case with RF).
I will test your method and also follow Adam's input to see if I can at least read a continuous data stream on my end.
Thanks for your time and help.



Steven Lord on 5 Jun 2019
These don't strike me as being datetime values, they're duration values. The same technique others have suggested (try converting them and look for missing values) will work with duration as worked with datetime. One benefit of converting to duration is that there's no date information added. From the datetime help: "If INFMT does not include a date portion, datetime assumes the current day. If INFMT does not include a time portion, datetime assumes midnight."
time_mat = {'00:21:51.000','00:22:16.200','00:22:16.400','00:22:16.600',...
'2019/05/30','00:22:17.000','22Rover6','2620517.2165'}
dt = datetime(time_mat, 'InputFormat', 'HH:mm:ss.SSS')
du = duration(time_mat)
Elements 5, 7, and 8 of both dt and du are missing and so can be identified using ismissing or removed with rmmissing.
ismissing(dt)
ismissing(du)
You could use either a datetime or a duration as the RowTimes in a timetable.
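As a rough sketch of that last point, the cleaned duration values could serve as row times like this. It assumes a release where duration accepts text input (as in the code above); the RecordIndex variable is a hypothetical placeholder for real sensor data.

```matlab
time_mat = {'00:21:51.000','00:22:16.200','00:22:16.400','00:22:16.600', ...
            '2019/05/30','00:22:17.000','22Rover6','2620517.2165'};

% Convert to a column of durations; text that cannot be parsed becomes missing.
du = duration(time_mat(:));
du.Format = 'hh:mm:ss.SSS';   % show the fractional seconds when displayed

% Remember which records were valid, then drop the missing ones.
goodIdx = ~ismissing(du);
duClean = rmmissing(du);

% Use the cleaned durations as row times. The data variable here is only the
% original record index, a placeholder for real measurements.
tt = timetable(duClean, find(goodIdx), 'VariableNames', {'RecordIndex'});
```

Keeping goodIdx around makes it possible to apply the same row selection to any other columns parsed from the same packets.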
2 Comments
dpb on 5 Jun 2019
I have a hard time (so to speak! :) ) wrapping my head around a sampled timestamp being a duration, Steven. I grok it's the only way with the new classes one can have any time standing alone without an associated date, but it still just doesn't seem right nomenclature.
I've not gotten comfortable-enough as yet with the duration to be able to tell if there's something that doesn't agree with the use that way, but it never occurs to me naturally as yet to make use that way.
I really fail to see why a datetime can't have a void date portion other than it wasn't designed to allow for it...with the venerable datenum it was simple to just save only the fractional day.
Maybe eventually I'll come to grips with "the new normal", but as yet it's still a stretch... :)
Steven Lord on 6 Jun 2019
A sampled timestamp is the amount of time that has elapsed since a certain basetime, right? The basetime could be the start of an experiment, the time a piece of hardware was turned on, or the start of a new day (midnight.) So the timestamp represents the duration of the experiment so far, the duration of the current run of that hardware, or the duration that's elapsed today.
datetime can answer the question "when?" while duration can answer the question "how long?" Upon rereading the original post, I can see that the data could be the answer to either of those questions. It could be thought of as representing when events occurred, it could also be thought of as representing how long after midnight (or the time the serial port became active) the events occurred. Since the expression in the data representing a date was unwanted, I interpreted it as the latter.

