Strange timer-processing problem (inaccuracy) under QNX 4.25

We experienced a very strange problem seems to be related to
timer-processing under QNX 4.25D (specifically in Proc32 4.25J !!!).

(I’ve already filled a problem report form on the QDN website, but
there was no feedback, so I try to post it here (with more details),
too…)

  • Problem summary:

Under Proc32 4.25J it can frequently happen, that a periodic timer
using proxy notification will expire after (cycle-interval-time +/-
ticksize), and not after - the required - (cycle-interval-time)!

The (+/-ticksize) differences will be balanced by each other, almost
immediately, so the average expiration time will be the required
value, but the rate of occuring +/- differences is quite high: in
some of the tests it had reached the 20%/20% value!

An important fact, that if - after starting the owner application
of the timer - we explicitly set ticksize to a lower value, and then
back to the original one, the +/- differences disappear, the
affected timer will expire accurately with the required
cycle-interval-time…

[Is it a bug or a feature???]

  • Environment:

  • Hardware (does not seem to be relevant):

Various Intel chipset (TX,ZX) based PCs (+ a notbook) with Intel
Pentium MMX & Celeron processors. (I can provide further details
on request…)

  • Software:

  • QNX 4.25D (Proc32: 4.25J)

  • Photon 1.14A

  • Watcom C/C++ 10.6B

  • Problem details:

There is a timer attached (as a periodic wakeup-timer) in a high priority
(-> 27/28) process, with a 20ms cycle interval, using proxy based
notification. (The ticksize is 2ms.)

[Normally the owner process can complete its periodic activity
within the cycle time of its wakeup timer (20ms).]

The problem is, that it can frequently happen, that the timer will
expire after 18ms or 22ms (cycle-time +/- ticksize) and not after
20ms (the required cycle-time).

The +/- differences will be balanced/corrected by each other, a -(+)
difference generally will be immediately followed by a +(-)
difference, but the rate of abnormal expiration times is quite high.

Testing method: I used ‘monitor’ for testing (priority: 29), and I
compared the timestamps of following lines:

" proxy(()) triggers ()"

[-> where the proxy seems to be triggered…]

and not the lines:

" () recv … from proxy(())"

[-> where the owner of the timer finally seems to receive the msg
of the proxy; it can be delayed…]

Some notes:

  • Running with different cycle times (8 or 10ms) identical results
    will be produced.

  • An other timer with the same cycle-interval-time in an other
    process (running as a part of the same application) shows
    identical symptoms, so we can not say that the problem affects
    only a specific timer of a specific process.

  • Priority of Proc32: 30 (default).

  • The -F option of Proc32 is NOT used, so the frequency of the 8254
    clock is the default.

  • Priority of Fsys.eide was reduced from 22 to 11.

  • Possible fixes:

  • Stepping back to Proc32 4.25I the problem disappears, the affected
    timer expires accurately.

  • Running under Proc32 4.25J, if - after starting all of the
    component processes - we explicitly set ticksize to a lower value
    (e.g. to 1ms) and then back to 2ms, the problem seems to
    disappear! (It’s really strange, but it happens…)

By stopping the whole application and then restarting it, the
problem occurs again, but a repeated ‘ticksize 1; ticksize 2’
command sequence helps again…

[Each of the possible solutions can be verified by the ‘monitor’
utility: no more +/-ticksize differences can be found in the
timestamps of 'proxy … triggers … ’ lines!!!]

[On request I can send detailed test results (->monitor-outputs)…]

Good luck in investigation,
Gyorgy Tamasi (cosy-software@freemail.hu)
COSY Software / Schoenenberger Systeme GmbH

Are you sure that the timer isn’t going off at the correct time but at that
time you were pre-empted by a higher priority process?
This is most likely.

Even so, the timer isn’t guaranteed to go off AT the scheduled time. It is
guaranteed to go off AT OR AFTER the scheduled time. Again, this is because
of other processes at higher priorities.

Try adjusting the priorities of your processes.


Tamasi Gyorgy <gtamasi@freemail.hu> wrote in message
news:Voyager.010108174025.6985D@server…

We experienced a very strange problem seems to be related to
timer-processing under QNX 4.25D (specifically in Proc32 4.25J !!!).

(I’ve already filled a problem report form on the QDN website, but
there was no feedback, so I try to post it here (with more details),
too…)

  • Problem summary:

Under Proc32 4.25J it can frequently happen, that a periodic timer
using proxy notification will expire after (cycle-interval-time +/-
ticksize), and not after - the required - (cycle-interval-time)!

The (+/-ticksize) differences will be balanced by each other, almost
immediately, so the average expiration time will be the required
value, but the rate of occuring +/- differences is quite high: in
some of the tests it had reached the 20%/20% value!

An important fact, that if - after starting the owner application
of the timer - we explicitly set ticksize to a lower value, and then
back to the original one, the +/- differences disappear, the
affected timer will expire accurately with the required
cycle-interval-time…

[Is it a bug or a feature???]

  • Environment:

  • Hardware (does not seem to be relevant):

Various Intel chipset (TX,ZX) based PCs (+ a notbook) with Intel
Pentium MMX & Celeron processors. (I can provide further details
on request…)

  • Software:

  • QNX 4.25D (Proc32: 4.25J)

  • Photon 1.14A

  • Watcom C/C++ 10.6B

  • Problem details:

There is a timer attached (as a periodic wakeup-timer) in a high
priority
(-> 27/28) process, with a 20ms cycle interval, using proxy based
notification. (The ticksize is 2ms.)

[Normally the owner process can complete its periodic activity
within the cycle time of its wakeup timer (20ms).]

The problem is, that it can frequently happen, that the timer will
expire after 18ms or 22ms (cycle-time +/- ticksize) and not after
20ms (the required cycle-time).

The +/- differences will be balanced/corrected by each other, a -(+)
difference generally will be immediately followed by a +(-)
difference, but the rate of abnormal expiration times is quite high.

Testing method: I used ‘monitor’ for testing (priority: 29), and I
compared the timestamps of following lines:

" proxy(()) triggers ()"

[-> where the proxy seems to be triggered…]

and not the lines:

" () recv … from proxy(())"

[-> where the owner of the timer finally seems to receive the msg
of the proxy; it can be delayed…]

Some notes:

  • Running with different cycle times (8 or 10ms) identical results
    will be produced.

  • An other timer with the same cycle-interval-time in an other
    process (running as a part of the same application) shows
    identical symptoms, so we can not say that the problem affects
    only a specific timer of a specific process.

  • Priority of Proc32: 30 (default).

  • The -F option of Proc32 is NOT used, so the frequency of the 8254
    clock is the default.

  • Priority of Fsys.eide was reduced from 22 to 11.

  • Possible fixes:

  • Stepping back to Proc32 4.25I the problem disappears, the affected
    timer expires accurately.

  • Running under Proc32 4.25J, if - after starting all of the
    component processes - we explicitly set ticksize to a lower value
    (e.g. to 1ms) and then back to 2ms, the problem seems to
    disappear! (It’s really strange, but it happens…)

By stopping the whole application and then restarting it, the
problem occurs again, but a repeated ‘ticksize 1; ticksize 2’
command sequence helps again…

[Each of the possible solutions can be verified by the ‘monitor’
utility: no more +/-ticksize differences can be found in the
timestamps of 'proxy … triggers … ’ lines!!!]

[On request I can send detailed test results (->monitor-outputs)…]

Good luck in investigation,
Gyorgy Tamasi (> cosy-software@freemail.hu> )
COSY Software / Schoenenberger Systeme GmbH

Previously, Bill at Sierra Design wrote in comp.os.qnx:

Are you sure that the timer isn’t going off at the correct time but at that
time you were pre-empted by a higher priority process?
This is most likely.

Even so, the timer isn’t guaranteed to go off AT the scheduled time. It is
guaranteed to go off AT OR AFTER the scheduled time. Again, this is because
of other processes at higher priorities.

Try adjusting the priorities of your processes.

The strange thing is, that the problem had not ever happened under a Proc32
older than 4.25J [published in the QNX4.25C patch] (and the affected software
runs since years on different locations), and it happens constantly under
Proc32 4.25J: a significant percent of timer-proxy triggering comes at cycle
+/- ticksize (200/200 of a sample with ~1000 timer expirations). (And what
about the (ticksize 1; ticksize 2) solution??? Why is it a solution??? What is
the difference between Proc32 4.25I & 4.25J???)

I think we should concentrate on the lines in the monitor-log:

" proxy(()) triggers ()"

These are the monitor-log entries, where I see the cycle +/- ticksize
differences in timestamps compared with the ‘expected’ timestamps (in other
words: “this is my strange problem”; and this does not exist under Proc32
<=4.25I, or after executing the ticksize-setting fix). (And this entry is
different, than the entry coming later, where the owner process really detects
the expiration of its timer by receiving the msg of the proxy!)

Just as an example, here is some subsequent timestamps for the entry mentioned
above:

[in nanoseconds (calculated based on RDTSC-cycles of monitor-log); relative
times between entries; expected cycle: 8ms; abnormal entries with +/- marks]

8002732
6010508 -
9975614 +
8011035
7988795
6002857 -
7988923
10015715 +
7988811
8012627
7985902
5996243 -
10009333 +
7996662
8006173
8004937
5983963 -
9999990 +
7993121
8004821
5992210 -
7994293
10007836 +
7995369

Which system component is the author of this entry, and what does it mean?
If it means (‘atomically’), that “Proc32 detected the expiration of the timer,
and it triggers the notification-proxy assigned to the timer, and it tries to
send its canned message to the owner process” (so Proc32 is the author of the
entry!), then we should say, that it does not make sense to analyze the
priority of the application components, because my highest priority process is
Proc32 itself (prio: 30).

If the entry means something different, I would gladly hear the description…

One important note: in my test ‘monitor’ was running without logging any
interrupts. I will try to run it with requesting logs for some interrupts (0,
8?), too, hoping that more info will be produced…

Tamasi Gyorgy <> gtamasi@freemail.hu> > wrote in message
(skip)

  • Problem summary:

Under Proc32 4.25J it can frequently happen, that a periodic timer
using proxy notification will expire after (cycle-interval-time +/-
ticksize), and not after - the required - (cycle-interval-time)!

The (+/-ticksize) differences will be balanced by each other, almost
immediately, so the average expiration time will be the required
value, but the rate of occuring +/- differences is quite high: in
some of the tests it had reached the 20%/20% value!

An important fact, that if - after starting the owner application
of the timer - we explicitly set ticksize to a lower value, and then
back to the original one, the +/- differences disappear, the
affected timer will expire accurately with the required
cycle-interval-time…
(skip)

  • Problem details:

There is a timer attached (as a periodic wakeup-timer) in a high
priority
(-> 27/28) process, with a 20ms cycle interval, using proxy based
notification. (The ticksize is 2ms.)

[Normally the owner process can complete its periodic activity
within the cycle time of its wakeup timer (20ms).]

The problem is, that it can frequently happen, that the timer will
expire after 18ms or 22ms (cycle-time +/- ticksize) and not after
20ms (the required cycle-time).

The +/- differences will be balanced/corrected by each other, a -(+)
difference generally will be immediately followed by a +(-)
difference, but the rate of abnormal expiration times is quite high.

Testing method: I used ‘monitor’ for testing (priority: 29), and I
compared the timestamps of following lines:

" proxy(()) triggers ()"

[-> where the proxy seems to be triggered…]

and not the lines:

" () recv … from proxy(())"

[-> where the owner of the timer finally seems to receive the msg
of the proxy; it can be delayed…]

Some notes:

  • Running with different cycle times (8 or 10ms) identical results
    will be produced.

  • An other timer with the same cycle-interval-time in an other
    process (running as a part of the same application) shows
    identical symptoms, so we can not say that the problem affects
    only a specific timer of a specific process.

  • Priority of Proc32: 30 (default).

  • The -F option of Proc32 is NOT used, so the frequency of the 8254
    clock is the default.

  • Priority of Fsys.eide was reduced from 22 to 11.

  • Possible fixes:

  • Stepping back to Proc32 4.25I the problem disappears, the affected
    timer expires accurately.

  • Running under Proc32 4.25J, if - after starting all of the
    component processes - we explicitly set ticksize to a lower value
    (e.g. to 1ms) and then back to 2ms, the problem seems to
    disappear! (It’s really strange, but it happens…)

By stopping the whole application and then restarting it, the
problem occurs again, but a repeated ‘ticksize 1; ticksize 2’
command sequence helps again…

[Each of the possible solutions can be verified by the ‘monitor’
utility: no more +/-ticksize differences can be found in the
timestamps of 'proxy … triggers … ’ lines!!!]

[On request I can send detailed test results (->monitor-outputs)…]

Good luck in investigation,
Gyorgy Tamasi (> cosy-software@freemail.hu> )
COSY Software / Schoenenberger Systeme GmbH

The discussion of this thread will be continued in qdn.public.qnx4 (@inn.qnx.com),
with the same thread-title…

Sorry about parallel cross posting, but feeling lost in space peoples
generally tries to use as much communication channels as possible…

The solution already is not so far (I promise)…