Timer quantization error

David Gibbs wrote:

Evan Hillas <evanh@clear.net.nz> wrote:
Maybe a change to CLOCK_MONOTONIC is in order where it will not perform
any accumulating compensation.

CLOCK_MONOTONIC doesn’t do any accumulating compensation. The compensation
is applied to the boot time.

That is an additional correction that doesn’t affect CLOCK_MONOTONIC; it’s not the one I’m interested in.


The app’s request gets rounded to the
nearest integral interrupt period and stays that way until it’s stopped
or the interrupt period is adjusted.

I don’t think we can do that, round the application’s request that way. If
the application wants rounding, it needs to do the rounding itself.

Let me reword that one. I meant have the OS choose the closest IRQ period based on rounding, which it already does, and then apply no subsequent accumulation, which it currently does apply in order to match the app’s real-time request.

Currently the POSIX-based timer mechanism tries to fit a request over many events by accumulating the error for each event; at fairly regular intervals one extra or one fewer IRQ interval is applied to match the calculated real time of the app’s request. This method creates a “beat” as described in the tick-tock articles.
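
To make the “beat” concrete, here is a minimal C simulation (hypothetical numbers, not the kernel’s code): a 1 ms tick servicing a 2.5 ms repeating request, where the gap between notifications alternates 3-2-3-2 ticks.

#include <stdio.h>

int main(void)
{
    const long tick_ns    = 1000000;   /* assumed 1 ms clock period     */
    const long request_ns = 2500000;   /* app asks for 2.5 ms repeats   */
    long now = 0, expiry = request_ns, last_fire = 0;
    int fired = 0;

    while (fired < 8) {
        now += tick_ns;                /* one clock interrupt            */
        if (now >= expiry) {           /* expiry serviced on this tick   */
            printf("fired at %ld ns (gap of %ld ticks)\n",
                   now, (now - last_fire) / tick_ns);
            last_fire = now;
            expiry += request_ns;      /* exact amount the app asked for */
            fired++;
        }
    }
    return 0;
}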

Of course the OS can round off the request. My concern has always been that, left to the application, there is no guarantee for applications that their rounding will always match the OS’s internal IRQ interval constant(s).

So the idea I had above is to add a mechanism that forces the request to the nearest integral multiple of the IRQ interval.

But, as noted below, any such reliance on this level of regularity is doomed if subsequent ClockPeriod() adjustments are made, so I no longer think adding this feature is a good idea. Rather, I recommend adding a note to the timer documentation suggesting that, if such regularity is needed, alternative hardware timers and specialised hardware samplers/servo cards be used.


However, this idea is not ideal either. When it comes to interrupt
period adjustment, such sampling systems are screwed unless the new period
is lucky enough to divide evenly into the chosen interval.

Systems should NOT be adjusting the interrupt period on the fly. It should
be set once, as a system design consideration, shortly after boot time – and
before anything that depends on it is configured – and never be changed.

Really? Any program with root access can change it at will. I would think that there is some sort of attempt by the OS to keep track of real time during such changes.


Evan

Evan Hillas <evanh@clear.net.nz> wrote:

David Gibbs wrote:
Evan Hillas <evanh@clear.net.nz> wrote:
Maybe a change to CLOCK_MONOTONIC is in order where it will not perform
any accumulating compensation.

CLOCK_MONOTONIC doesn’t do any accumulating compensation. The compensation
is applied to the boot time.


That is an additional correction that doesn’t affect CLOCK_MONOTONIC;
it’s not the one I’m interested in.



The app’s request gets rounded to the
nearest integral interrupt period and stays that way until it’s stopped
or the interrupt period is adjusted.

I don’t think we can do that, round the application’s request that way. If
the application wants rounding, it needs to do the rounding itself.


Let me reword that one. I meant have the OS choose the closest IRQ
period based on rounding, which it already does, and then apply no
subsequent accumulation, which it currently does apply in order to
match the app’s real-time request.

We don’t accumulate on an application’s behalf. We record the
application’s request as exactly as possible. Then, we just move
current time forward by the clock period, and on each tick, ask,
“is current time > the next expiry the application asked for?”, and
if yes, notify the client. If it is a repeating timer, we then add
exactly what the client asked for to the client’s next expiry.

You seem to be suggesting that we should modify the “amount to
add” to the app’s expiry from what the application asked for to
some other value.

We shouldn’t do that. It is up to the client application to choose
that value, not the OS, and if the client chooses a bad value – one
that doesn’t align well with the amount the OS adds as its clock
period – the application will have bad behaviour.

Currently the POSIX-based timer mechanism tries to fit a request over
many events by accumulating the error for each event; at fairly
regular intervals one extra or one fewer IRQ interval is applied to
match the calculated real time of the app’s request. This method
creates a “beat” as described in the tick-tock articles.

Hm… I guess it might look like that, though that isn’t the implementation.

Of course the OS can round off the request. My concern has always been
that, left to the application, there is no guarantee for applications
that their rounding will always match the OS’s internal IRQ interval
constant(s).

The application can query the OS’s internal IRQ interval by calling
ClockPeriod(), and make sure their request rounds cleanly to this.
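
For example, something along these lines would do it (a sketch only, with minimal error handling; the reserved fract field is ignored):

#include <stdio.h>
#include <time.h>
#include <sys/neutrino.h>   /* ClockPeriod(), struct _clockperiod (QNX) */

/* Snap a desired interval to the nearest whole number of clock ticks
 * so that the request rounds cleanly onto the IRQ interval. */
long round_to_tick(long desired_ns)
{
    struct _clockperiod cp;

    if (ClockPeriod(CLOCK_REALTIME, NULL, &cp, 0) == -1) {
        perror("ClockPeriod");
        return desired_ns;                 /* fall back to raw request */
    }
    long ticks = (desired_ns + (long)cp.nsec / 2) / (long)cp.nsec;
    if (ticks < 1)
        ticks = 1;                         /* never round down to zero */
    return ticks * (long)cp.nsec;
}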

So the idea I had above is to add a mechanism that forces the request to
the nearest integral multiple of the IRQ interval.

And, I repeat that this is up to the application to do, should the
application need this.

However, this idea is not ideal either. When it comes to interrupt
period adjustment, such sampling systems are screwed unless the new period
is lucky enough to divide evenly into the chosen interval.

Systems should NOT be adjusting the interrupt period on the fly. It should
be set once, as a system design consideration, shortly after boot time – and
before anything that depends on it is configured – and never be changed.


Really? Any program with root access can change it at will. I would think
that there is some sort of attempt by the OS to keep track of real
time during such changes.

And, any root process can disable interrupts for 15 minutes. Any root
process can send a SIGKILL signal to every other process in the system.
Any root process can map all of physical memory and start scribbling
over it.

Just because a root process CAN do something does not mean that it is
a good system design decision for a root process to do so.

Yes, if you modify the clock period, things will work fine from the
OS point of view. Our mechanism for handling time, and timers, really
doesn’t care if you twitch it over and over again.

But, most systems that have any dependency on the system clock will
decide, as a system design decision, what trade-off they want between
overhead and precision in the clock, implement the appropriate change
(if any) shortly after boot, and leave the system that way. Then, if
someone does need to run without beat/accumulated-error problems, they
can safely query the system value for this and use that, confident
that it won’t change.

-David

David Gibbs
QNX Training Services
dagibbs@qnx.com

David Gibbs wrote:

notify the client. If it is a repeating timer, we then add
exactly what the client asked for to the client’s next expiry.

And the OS also subtracts the amount that has passed. It doesn’t just go “let’s start afresh” for the next event. This add-subtract dual action is the accumulation I’m talking about.


You seem to be suggesting that we should modify the “amount to
add” to the app’s expiry from what the application asked for to
some other value.

Yep. As in add a flag that tells the OS the nearest integral multiple is desired.


Currently the POSIX-based timer mechanism tries to fit a request over
many events by accumulating the error for each event; at fairly
regular intervals one extra or one fewer IRQ interval is applied to
match the calculated real time of the app’s request. This method
creates a “beat” as described in the tick-tock articles.

Hm… I guess it might look like that, though that isn’t the implementation.

The above add-subtract action is this implementation. It naturally accumulates the error between the IRQ intervals and the requested period until there is a rollover that causes an additional IRQ tick to occur, creating the “beat”.


Of course the OS can round off the request. My concern has always been
that, left to the application, there is no guarantee for applications
that their rounding will always match the OS’s internal IRQ interval
constant(s).

The application can query the OS’s internal IRQ interval by calling
ClockPeriod(), and make sure their request rounds cleanly to this.

You hope. Just have to make sure that the whole struct is passed around and not just the nanosecond value.
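
For reference, the struct in question, as I read the ClockPeriod() docs, is just:

struct _clockperiod {
    unsigned long nsec;    /* clock period, in nanoseconds             */
    long          fract;   /* fractional nanoseconds; reserved, use 0  */
};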


So the idea I had above is to add a mechanism that forces the request to
the nearest integral multiple of the IRQ interval.

And, I repeat that this is up to the application to do, should the
application need this.

And I repeat that the docs aren’t giving the guarantee needed for future-proofing. And until CLOCK_MONOTONIC was implemented this method never worked anyway. So earlier postings from some years ago were wrong.


Yes, if you modify the clock period, things will work fine from the
OS point of view. Our mechanism for handling time, and timers, really
doesn’t care if you twitch it over and over again.

Good, because that is all that matters for POSIX timers. They are not designed for sampling. A system designer should be using other hardware for this purpose.

I’m happy with the way QNX is already. I had thought otherwise.


Evan

Evan Hillas <evanh@clear.net.nz> wrote:

David Gibbs wrote:
notify the client. If it is a repeating timer, we then add
exactly what the client asked for to the client’s next expiry.


And the OS also subtracts the amount that has passed. It doesn’t just
go “let’s start afresh” for the next event. This add-subtract dual
action is the accumulation I’m talking about.

Quick pseudo-code of what the OS does on a timer interrupt:

Adds one tick to current time.
For each timer
    If current time > timer_expiry
        Notify application
        If timer is repeat
            Add timer increment to timer_expiry
            Insert timer back into list
        Else
            Remove timer from list
    Else
        Break out of loop
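
In C, that loop might look roughly like this (a sketch with hypothetical types and helper functions; the real kernel structures are not public):

#include <stdint.h>

#define TICK_NS 1000000ULL               /* assumed 1 ms clock period */

struct sw_timer {
    uint64_t expiry;                     /* absolute expiry time, ns      */
    uint64_t increment;                  /* repeat interval, 0 = one-shot */
    struct sw_timer *next;               /* list kept sorted by expiry    */
};

/* Hypothetical helpers, declared only to make the sketch hang together. */
void notify_application(struct sw_timer *t);
void insert_sorted(struct sw_timer **head, struct sw_timer *t);

static uint64_t current_time;

void clock_tick(struct sw_timer **head)
{
    current_time += TICK_NS;             /* add one tick to current time */

    /* The list is sorted by expiry, so we can stop at the first
     * timer that hasn't expired yet. */
    while (*head && current_time > (*head)->expiry) {
        struct sw_timer *t = *head;
        *head = t->next;                 /* remove timer from list */
        notify_application(t);
        if (t->increment != 0) {         /* repeating timer */
            t->expiry += t->increment;   /* add exactly what was asked for */
            insert_sorted(head, t);
        }
    }
}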

Nowhere does the OS subtract time from the timer expiry.

But, if “one tick” and “timer_increment” are close to each other
in value, but timer_increment is not a multiple of “one tick”, then
you could see behaviour that might make it look like we’re subtracting
bits. We aren’t.

You seem to be suggesting that we should modify the “amount to
add” to the app’s expiry from what the application asked for to
some other value.


Yep. As in add a flag that tells the OS the nearest integral multiple is desired.

Won’t happen on POSIX timers – isn’t in the POSIX spec.

Unlikely to happen on the kernel call, we aren’t going to change something
that low level and tested for something like this.

Could be done as an added library routine – but, again, seems unlikely
as it could be easily coded by any customer that needs it.
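
For what it’s worth, a customer version could be as simple as this (hypothetical routine name; it assumes the interval fits in a long when expressed in nanoseconds):

#include <time.h>
#include <sys/neutrino.h>

/* Snap a repeating interval to a whole number of clock ticks, then
 * hand it to the standard timer_settime(). */
int timer_settime_ticks(timer_t id, int flags,
                        const struct itimerspec *req,
                        struct itimerspec *old)
{
    struct _clockperiod cp;
    struct itimerspec snapped = *req;
    long ns, ticks;

    if (ClockPeriod(CLOCK_REALTIME, NULL, &cp, 0) == -1)
        return -1;

    ns = (long)snapped.it_interval.tv_sec * 1000000000L
       + snapped.it_interval.tv_nsec;
    if (ns > 0) {                        /* 0 means a one-shot timer */
        ticks = (ns + (long)cp.nsec / 2) / (long)cp.nsec;  /* nearest */
        if (ticks < 1)
            ticks = 1;
        ns = ticks * (long)cp.nsec;
        snapped.it_interval.tv_sec  = ns / 1000000000L;
        snapped.it_interval.tv_nsec = ns % 1000000000L;
    }
    return timer_settime(id, flags, &snapped, old);
}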

Currently the POSIX-based timer mechanism tries to fit a request over
many events by accumulating the error for each event; at fairly
regular intervals one extra or one fewer IRQ interval is applied to
match the calculated real time of the app’s request. This method
creates a “beat” as described in the tick-tock articles.

Hm… I guess it might look like that, though that isn’t the implementation.


The above add-subtract action is this implementation. It naturally
accumulates the error between the IRQ intervals and the requested period
until there is a rollover that causes an additional IRQ tick to occur,
creating the “beat”.

Actually, it is an “add-add” operation – there is no subtract. But,
yes, since the values added are different, the difference between the
added values does accumulate.

The application can query the OS’s internal IRQ interval by calling
ClockPeriod(), and make sure their request rounds cleanly to this.


You hope. Just have to make sure that the whole struct is passed
around and not just the nanosecond value.

ClockPeriod() only gives a nanosecond value. (Well, it gives a
fraction of a nanosecond, too, but that’s reserved for future use.)
So, I’m not sure what you mean there.

-David

David Gibbs
QNX Training Services
dagibbs@qnx.com

David Gibbs wrote:

Nowhere does the OS subtract time from the timer expiry.

But, if “one tick” and “timer_increment” are close to each other
in value, but timer_increment is not a multiple of “one tick”, then
you could see behaviour that might make it look like we’re subtracting
bits. We aren’t.

Heh, ok, so I don’t have the sources. It’s still the same accumulating result. I just didn’t think hard enough about the simplicity of it.

Each occurrence of “Add timer increment to timer_expiry” accumulates any excess nanoseconds, because it doesn’t restart the expiry period from the time of the current interrupt. Each time this happens the alignment shifts, until an effective rollover into the next interrupt event occurs.


Unlikely to happen on the kernel call, we aren’t going to change something
that low level and tested for something like this.

Could be done as an added library routine – but, again, seems unlikely
as it could be easily coded by any customer that needs it.

Need to make the docs firmer before leaving it open to the app. As for a lib, that works, as long as it’s part of the standard library set, because it restricts the bug to one line of code and basically defines what the correct method is anyway.

ClockPeriod() only gives a nanosecond value. (Well, it gives a
fraction of a nanosecond, too, but that’s reserved for future use.)
So, I’m not sure what you mean there.

Exactly. Not only is it not a well-defined method, it even has a field reserved for future changes.


Evan

Evan Hillas wrote:

David Gibbs wrote:
ClockPeriod() only gives a nanosecond value. (Well, it gives a
fraction of a nanosecond, too, but that’s reserved for future use.)
So, I’m not sure what you mean there.


Exactly. Not only is it not a well-defined method, it even has a field reserved for future changes.

And, btw, another reminder: it didn’t even work until recently. Those who don’t have the latest kernel installed can’t rely on the OS for this feature.


Evan

David Gibbs wrote:

Quick pseudo-code of what the OS does on a timer interrupt:

Adds one tick to current time.
For each timer
    If current time > timer_expiry
        Notify application
        If timer is repeat
            Add timer increment to timer_expiry
            Insert timer back into list
        Else
            Remove timer from list
    Else
        Break out of loop

I use the following variation for my timers/counters/measurements:

Adds one tick to current time
For each timer
    If current_time - previous_event_time > timer_preset
        Notify application
        If timer is repeat
            Copy current_time into previous_event_time
            Insert timer back into list
        Else
            Remove timer from list
    Else
        Break out of loop

Ah, crap, there are all sorts of variations depending on function details. The one I meant to give was equivalent to the existing QNX method, except that it also handles integer rollover gracefully, i.e.:

Adds one tick to current time
For each timer
    If current_time - previous_event_time > timer_preset
        Notify application
        If timer is repeat
            Add timer_preset to previous_event_time
            Insert timer back into list
        Else
            Remove timer from list
    Else
        Break out of loop
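
The subtract-first comparison is what makes the rollover handling work: with fixed-width unsigned tick counters the difference wraps correctly even when current_time overflows. A quick C sketch (hypothetical 32-bit tick values):

#include <stdint.h>
#include <stdio.h>

/* Wrap-safe check: the subtraction is computed modulo 2^32, so the
 * result stays valid when `now` overflows, whereas comparing absolute
 * values (now > prev + preset) can misfire near the wrap. */
static int timer_due(uint32_t now, uint32_t prev_event, uint32_t preset)
{
    return (uint32_t)(now - prev_event) > preset;
}

int main(void)
{
    /* previous event just below the 32-bit wrap, `now` just past it */
    uint32_t prev = 0xFFFFFFF0u, now = 0x00000020u, preset = 0x20u;

    printf("due: %d\n", timer_due(now, prev, preset));   /* prints 1 */
    return 0;
}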


Cheers,
Evan