How to avoid double counted hours for a given day

3 views (last 30 days)
For a given month (say November), I have hourly data for each day. The problem is, sometimes they have put the same hour twice on the hourly column. For an example 0 1 2 2 4 5 6 7 7 9...etc for a given day. That I have to correct to 0 1 2 3 4 5 6 7 8 9... Also there may be some missing hours during the month. I need to identify those as well. Hours are given in UTC. I want to convert in to local time, and if possible pay attention to change of day light savings. We are in UTC-6 till day light savings end and UTC-7 afterwards.
Appreciate if someone could help me.
Thanks.
  3 Comments
dpb
dpb on 21 Jul 2017
Edited: dpb on 21 Jul 2017
So have you tried any of the suggestions posited? What has been the result(s)?
And, I'll note the "missing/duplicate" hours is exactly what I suspected--
>> data(1:20,:)
ans =
2010 11 4 0 0
2010 11 4 1 0
2010 11 4 1 59
2010 11 4 3 0
...
There's 1:00 and 1:59 then 3:00 so the 2 o'clock hour happened just before the hour occurred. So, what do you want--the actual time or round to nearest hour and trash the minute values?

Sign in to comment.

Answers (1)

dpb
dpb on 21 Jul 2017
Edited: dpb on 21 Jul 2017
Convert your time data columns to datetime values with datetime. It has ability to use a given time zone and output UTC.
Once you've done that(*), you can use relational tests to find any duplicated values and eliminate those rows as well as any locations with differences other than one hour and fill those in.
As Andrei indicates, exact code would be simpler to write for your particular case if had the full form of the data, but those are the basics and datetime should handle it in a pretty straightforward manner.
() Excepting for the days which shift to/from DST the duplicate values could be eliminated by |*unique| for the hours; it's possible for those days to have duplicate values at the witching hour.
OTOH, if the duplicates are from the fact it's a case where the data collection process is on a polling cycle and running at "just approximately" an hour, perhaps you've got data at 2:01 and then again at 2:58 and then not until 4:02 which explains both the "duplicate" values in hour 2 and the missing hour 3 but there's really no missing data at all. In that scenario you wouldn't want to add or subtract anything but instead you'd want to round the observed actual time to the nearest hour.
ADDENDUM
As noted above, there are no missing values, simply not sampled exactly at the hour every hour. datetime is your friend...
>> data=xlsread('workbook1-test.xls'); % read the data
>> da=datetime([data zeros(length(data),1)]); % convert to datetime
>> da=da(1:10) % show first 10 for brevity
da =
04-Nov-2010 00:00:00
04-Nov-2010 01:00:00
04-Nov-2010 01:59:00
04-Nov-2010 03:00:00
04-Nov-2010 04:00:00
04-Nov-2010 04:59:00
04-Nov-2010 06:00:00
04-Nov-2010 07:00:00
04-Nov-2010 07:59:00
04-Nov-2010 09:00:00
>> da=dateshift(da,'start','hour','nearest') % fixup by rounding nearest hour
da =
04-Nov-2010 00:00:00
04-Nov-2010 01:00:00
04-Nov-2010 02:00:00
04-Nov-2010 03:00:00
04-Nov-2010 04:00:00
04-Nov-2010 05:00:00
04-Nov-2010 06:00:00
04-Nov-2010 07:00:00
04-Nov-2010 08:00:00
04-Nov-2010 09:00:00
>>
Done.
  4 Comments
dpb
dpb on 23 Jul 2017
On checking after the other ?, I see I didn't actually post the duration--I thought I had done:
>> du=da-da(1)
du =
00:00:00
01:00:00
...
08:00:00
09:00:00
10:00:00
11:00:00
12:00:00
13:00:00
...
22:00:00
23:00:00
24:00:00
25:00:00
26:00:00
27:00:00
...
40:00:00
41:00:00
>>
where I've elided bunches in middle for brevity...there's only so much interest in a sequence of numbers. :) The point being that duration counts accumlative time while datetime values keep everything from happening at once and keep "when" as well.

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by