Duration type doesn't preserve seconds accuracy

James Tursa
James Tursa on 10 May 2024
Edited: James Tursa on 16 May 2024
I've been working on some time conversion routines and ran into an issue I didn't expect. Basically, the duration type doesn't preserve seconds accuracy. This is caused by the fact that the duration type stores the duration as a single variable called millis (i.e., milliseconds) instead of separate d,h,m,ms or h,m,ms. Why this would be coded in that way does not make any sense to me. E.g., because of this you can get issues such as this:
format longg
dt1 = datetime(2000,1,1,0,0,1.2345678912345);
disp(dt1.Second)
1.2345678912345
dt2 = datetime(2000,1,1);
[~,~,s] = hms(dt1-dt2)
s =
1.2345678912345
All well and good: the seconds value is represented to full precision in the datetime variable, and the precision of the seconds difference is preserved in the subtraction because the datetimes are close to one another. But if they are not:
dt3 = datetime(1900,1,1);
[~,~,s] = hms(dt1-dt3)
s =
1.23456764221191
You suddenly get a different result because the difference got mashed into a single millis variable and that large days value caused precision loss in the seconds. The hms() function can't pull out seconds accuracy that isn't there. And simply constructing a duration type from scratch doesn't solve anything either because of the internal duration storage as a single millis variable. E.g.,
[~,~,s] = hms(duration(0,0,1.2345678912345))
s =
1.2345678912345
[~,~,s] = hms(duration(1e6,0,1.2345678912345))
s =
1.23456764221191
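To put a number on that loss, here is a minimal sketch. It assumes, as described above, that the duration is held as one double-precision count of milliseconds; the variable names are only for illustration. It compares the spacing between adjacent doubles near 1.2 s and near 1e6 hours:

```matlab
% Sketch: resolution of a double that counts milliseconds, at two magnitudes.
% Assumption: duration keeps its value as a single double in milliseconds.
ms_small = 1.2345678912345 * 1000;         % about 1.2 s, expressed in ms
ms_large = 1e6 * 3600 * 1000 + ms_small;   % 1e6 hours plus about 1.2 s, in ms

res_small_s = eps(ms_small) / 1000;        % spacing of doubles near 1.2 s, in seconds
res_large_s = eps(ms_large) / 1000;        % spacing of doubles near 1e6 hours, in seconds

fprintf('resolution near 1.2 s: %g s\n', res_small_s);
fprintf('resolution near 1e6 h: %g s\n', res_large_s);
```

Near 1e6 hours the spacing is roughly half a microsecond, which is consistent with the seconds value above changing from about the seventh decimal place onward.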
I know, I know ... "You shouldn't depend on this level of accuracy in floating point calculations" etc. But that's not the point. The point is as long as you have a datetime type that stores individual d,h,m,s properties, then IMHO the duration type should follow suit and preserve accuracy in calculations as much as possible to match it. If the duration type stored the data internally as d,h,m,s (or ms or ns) and took care when doing the internal arithmetic, the above differences could be avoided. In fact, since the millis is a private internal variable, I think TMW could still do this in future versions of MATLAB without breaking any user code since there shouldn't be any user code that accesses the millis property. Any chance of this, TMW?
And speaking of the datetime type, IMHO the internal variable should have been millis or nanos instead of seconds. That way standard Terrestrial Time epochs such as J2000 (January 1, 2000, 11:58:55.816 UTC) etc. could be represented exactly internally. But since datetime y,m,d,h,m,s properties are public this unfortunately can't be changed now.

Answers (2)

Stephen23
Stephen23 on 14 May 2024
"Duration type doesn't preserve seconds accuracy"
That is exactly what CALENDARDURATION objects are for:
format longg
dt1 = datetime(2000,1,1,0,0,1.2345678912345);
dt2 = datetime(2000,1,1);
dt3 = datetime(1900,1,1);
ddt = between(dt3,dt1)
ddt = calendarDuration
100y 0h 0m 1.2345678912345s
  3 Comments
James Tursa
James Tursa on 15 May 2024
Edited: James Tursa on 15 May 2024
Follow-up: I took a look at the calendarDuration type, and this is definitely NOT just a more precise albeit slower alternative to duration. The calendarDuration type is a fuzzy duration type that changes the actual duration to conform to the dates at hand. It is pretty much useless for the time scale calculations I am concerned with (e.g., between UTC, GPS, TT, etc.). For example, there are two leap seconds in 1972, one in June and one in December. And there is one leap second in 1973. The minus operator with two datetimes easily sees these and returns them in the difference:
d1 = datetime(1972,1,1,'TimeZone','UTCLeapSeconds');
d2 = datetime(1973,1,1,'TimeZone','UTCLeapSeconds');
d3 = datetime(1974,1,1,'TimeZone','UTCLeapSeconds');
d2-d1
ans = duration
8784:00:02
d3-d2
ans = duration
8760:00:01
You can see those leap seconds easily. Now use between:
between(d1,d2)
ans = calendarDuration
1y
between(d2,d3)
ans = calendarDuration
1y
Exactly one year, with no extra seconds in either case. The purpose, it seems, is to keep arithmetic fluid depending on what dates are being used. E.g.,
d1 + between(d1,d2)
ans = datetime
1973-01-01T00:00:00.000Z
This recovers d2 exactly. But the following also recovers d2 exactly:
d1 + between(d2,d3)
ans = datetime
1973-01-01T00:00:00.000Z
That is, the actual time difference between d2 and d3, which contains exactly 1 leap second, morphed into whatever time difference is needed to obtain a "calendar year" difference between d1 and the result.
So, the calendarDuration is fuzzy and morphs the actual duration depending on the dates at hand. I'm not sure what the intended application of calendarDuration types is, but it is useless for time scale conversions, dynamic times in equations of motion, etc. So this is not something I can use.
Peter Perkins
Peter Perkins on 15 May 2024
calendarDuration is for calendar arithmetic. The differences between d1 and d2, and between d2 and d3 are exactly one calendar year. By fuzzy, I think you mean that some calendar years are 365*86400s long, others are 366*86400s, and still others are 1s longer than either of those. In the same way, some calendar months are 30*86400s long, or maybe 31* or 29* or 28*. Or 1s longer than some of those. That's what calendarDuration is for, that's why there are two "duration" types, and why there is a days function and a caldays function. But it sounds like that's not what you need.
I think there are good ways to do what you want, but I'm not yet clear on the details.



Peter Perkins
Peter Perkins on 14 May 2024
Edited: Peter Perkins on 14 May 2024
James, Stephen's response is what I would have said, but there are a lot of things in this post, and I thought I would add to Stephen's response.
The title of this post is, "Duration type doesn't preserve seconds accuracy." That title might be interpreted in several ways, one of which I know you don't mean.
1) It might be interpreted as, duration doesn't provide enough accuracy to represent elapsed time to the accuracy of one second, or maybe to the accuracy of subseconds. Clearly it does, and you didn't mean this, but the title is misleading.
format long g
d1 = seconds(0.123456789)
d1 = duration
0.123456789 sec
d2 = seconds(123456789) + d1
d2 = duration
123456789.123457 sec
d3 = d2 - d1
d3 = duration
123456789 sec
(I've ignored the fact that 0.123456789 isn't actually 0.123456789, it's the closest d.p. to that; before this even gets to the seconds function you are done for in that respect. Better to use milliseconds(123.456789) if you really care about such precision. Why no nanoseconds function? Because at the moment, as you say, duration uses units of milliseconds internally.)
2) What you mean is that duration does not have as much precision as datetime, so that on the one hand datetime can represent very precise timestamps, down to at least nanoseconds, over the age of the universe, but if you take differences of datetimes, duration is not able to represent those elapsed times to the same precision over large intervals. That last part, "over large intervals", is important. duration has the same precision as double, in units of (as you say) milliseconds. Computing the difference in datetimes that are close together as a duration will get you answers accurate to sub-nanosecond, but computing differences in datetimes that are a century apart as a duration will only be able to preserve down to something like tenths of microseconds:
eps(100*365.2425*86400000)/1000
ans =
4.8828125e-07
So what you say is correct, but I would ask in response, do you have any real uses where you need that kind of precision over that long a range? I'm genuinely asking. If you are doing particle physics or something, with elapsed times of like picoseconds, you can absolutely use durations, because it seems unlikely that you'd care about elapsed times longer than, what, a few minutes?
Some other scattered things:
  • "the duration type stores the duration as a single variable called millis (i.e., milliseconds) instead of separate d,h,m,ms or h,m,ms. Why this would be coded in that way does not make any sense to me." You would be very unhappy with the performance of an implementation using separate components. It would be doing modulo arithmetic all over.
  • "And speaking of the datetime type, IMHO the internal variable should have been millis or nanos instead of seconds." Not sure why you're thinking seconds. datetime does NOT store time in units of seconds, it uses units of milliseconds. J2000 is represented exactly. You may have created it suboptimally (by calling the constructor using 55.816 seconds, instead of 55s, 816ms), but that's a different story. Is there something somewhere in the doc that is misleading you?
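A minimal sketch of the two construction styles mentioned in the last bullet, using the seven-argument datetime form that takes milliseconds as a separate input. The variable names are illustrative, and the time zone is set to UTC only because the epoch was quoted in UTC above; whether the two results differ, and by how much, is left for the reader to check rather than asserted here:

```matlab
% Sketch: J2000 built with fractional seconds vs. separate seconds and milliseconds.
j2000_frac  = datetime(2000,1,1,11,58,55.816,'TimeZone','UTC');   % 55.816 s as one double
j2000_split = datetime(2000,1,1,11,58,55,816,'TimeZone','UTC');   % 55 s plus 816 ms

% Show nanosecond digits so any difference in the stored value is visible.
j2000_frac.Format  = 'yyyy-MM-dd HH:mm:ss.SSSSSSSSS';
j2000_split.Format = 'yyyy-MM-dd HH:mm:ss.SSSSSSSSS';
disp(j2000_frac)
disp(j2000_split)

% Difference between the two constructions, in seconds.
disp(seconds(j2000_frac - j2000_split))
```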
  4 Comments
James Tursa
James Tursa on 15 May 2024
Edited: James Tursa on 16 May 2024
I actually laughed out loud when I discovered this. So looking at this post (I wish I had seen it sooner):
it appears you pretty much do what I was hoping, storing the "working" data in the background as millis. In fact, it looks like it is kind of like the old Serial Date Number scheme, but this time it is serial date millis with an "extra precision" imaginary part. I am actually quite pleased to discover this, and withdraw my objection to how datetimes are stored, but not my objection to duration storage ...
TMW went to all the trouble of storing "precision" times (for lack of a better description) in the background for datetimes. That's the good part. But IMO the ball was dropped on the 1-yard line because of the duration thing. What good does it do to store all that precision in the datetimes if it immediately gets wiped out when you compute a time difference via datetime subtraction? Or if the precision gets truncated when building a duration type from scratch? I do datetime differences all the time in my time scale conversion code, but when I found out that precision gets lost as soon as I do these calculations it prompted my original post. Which leads to this:
"... do you have any real uses where you need that kind of precision over that long a range? I'm genuinely asking ..."
I believe I already answered this in my original post, but I will reiterate here. It is not a question of how much precision you think the user needs. That's guesswork at best (who knows what their application is?) and is not the point. It is a question of class design. Because durations are so intimately tied to datetimes, through the datetime subtraction operation and through combining datetimes with durations, IMO durations should have retained the precision to match. The fact that they don't leads to annoying differences that could have been avoided. (Are there any MATLAB supplied functions that can do a precision difference?)
"... You would be very unhappy with the performance of an implementation using separate components. It would be doing modulo arithmetic all over ..."
Again, you are presuming what the user will be happy or not happy with. I know I am unhappy with the precision loss in datetime subtraction. I might be more unhappy with the timing performance of an implementation using separate components, but until I saw the comparisons I couldn't say. Certainly I would be willing to give up some timing performance to get it. Since you already store the datetimes in the background as millis + extra precision why couldn't you have just employed the same scheme for durations? Clearly this was fast enough for datetime calculations, why would it have been unreasonably slower for duration calculations?
If I was just writing code for my own purposes, I could write my own subtraction code off to the side to retain the precision as much as possible ... maybe even invent my own duration type class for this that keeps the duration in two pieces like MATLAB is doing or maybe using seconds + ns. So yes, I can code workarounds for my own use. But I am writing code that will be used by other users, and those other users are likely going to be doing datetime differences in their own code. It would have been nice if this naturally didn't result in precision loss.
Finally, calendarDuration is not useful for my purposes because it is a fuzzy duration that morphs into different actual durations depending on the dates in use ... see my reply to @Stephen23.
James Tursa
James Tursa on 15 May 2024
Edited: James Tursa on 15 May 2024
Here is an example of this annoyance:
format longg
gps_epoch = datetime(1980,1,6,'TimeZone','UTCLeapSeconds')
gps_epoch = datetime
1980-01-06T00:00:00.000Z
dt1 = datetime(2024,1,7,0,0,1.2345678912345,'TimeZone','UTCLeapSeconds') % a nearby date
dt1 = datetime
2024-01-07T00:00:01.234Z
dt2 = dt1 + 70000 % a date in the not-too-distant future
dt2 = datetime
2215-09-03T00:00:01.234Z
You can see that the seconds still has the precision in both of these:
dt1.Second
ans =
1.2345678912345
dt2.Second
ans =
1.2345678912345
To calculate week + seconds_of_week GPS time for these, a naive approach gives annoying results:
[h1,m1,s1] = hms(dt1 - gps_epoch); % precision is lost here!
w1 = floor(h1/(7*24))
w1 =
2296
s1 = (h1 - w1*7*24)*3600 + m1*60 + s1
s1 =
19.2345678806305
[h2,m2,s2] = hms(dt2 - gps_epoch); % precision is lost here!
w2 = floor(h2/(7*24))
w2 =
12296
s2 = (h2 - w2*7*24)*3600 + m2*60 + s2
s2 =
19.2345685958862
The trailing digits in the s1 and s2 tell the precision loss story. The trailing digits are not preserved, and they give different results depending on how far away you are from the GPS epoch. To preserve the precision, you have to do the subtraction differently. E.g., (ignoring the spillover that might happen in the seconds_of_week calculation for simplicity here)
DT1 = dt1;
DT1.Second = 0; % a copy with the seconds removed
[h1,m1,s1] = hms(DT1 - gps_epoch);
w1 = floor(h1/(7*24))
w1 =
2296
s1 = (h1 - w1*7*24)*3600 + m1*60 + s1 + dt1.Second % add back the original seconds
s1 =
19.2345678912345
DT2 = dt2;
DT2.Second = 0; % a copy with the seconds removed
[h2,m2,s2] = hms(DT2 - gps_epoch);
w2 = floor(h2/(7*24))
w2 =
12296
s2 = (h2 - w2*7*24)*3600 + m2*60 + s2 + dt2.Second % add back the original seconds
s2 =
19.2345678912345
Now you get consistent results with regards to precision, but it is annoying to have to do this. Do I need that extra precision in the calculation? Not the point. Who knows what my application is? Maybe I do need it. The point is that I can't do just natural datetime difference calculations and keep the precision because of the duration issue ... I have to jump through hoops with special code to get the more precise result.
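For what it's worth, the hoop-jumping above can be wrapped up in a small helper. This is only a sketch: gpsWeekSeconds is a hypothetical name, not a MATLAB-supplied function, and it assumes both inputs are datetimes in the 'UTCLeapSeconds' time zone as in the example. It also handles the spillover case that was ignored above:

```matlab
function [wk, sow] = gpsWeekSeconds(dt, gps_epoch)
% gpsWeekSeconds  Hypothetical helper: GPS week and seconds of week.
%   Assumes dt and gps_epoch are datetimes in the 'UTCLeapSeconds' time zone.
%   The seconds field is split off before subtracting so that its trailing
%   digits never pass through the duration.
    whole = dt;
    whole.Second = 0;                       % copy with the seconds removed
    frac = dt.Second;                       % original seconds, kept to full precision
    [h, m, s] = hms(whole - gps_epoch);     % whole-second difference (s holds leap seconds)
    wk  = floor(h / (7*24));
    sow = (h - wk*7*24)*3600 + m*60 + s + frac;   % add back the original seconds
    over = sow >= 604800;                   % spillover into the next GPS week
    sow(over) = sow(over) - 604800;
    wk(over)  = wk(over) + 1;
end
```

With the gps_epoch and dt1 defined above, [w1,s1] = gpsWeekSeconds(dt1, gps_epoch) should reproduce the week 2296 and the 19.2345678912345 seconds found by the manual calculation.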


Release

R2024a
