Strange timer-processing problem (inaccuracy) under QNX 4.25

We have run into a very strange problem that seems to be related to
timer processing under QNX 4.25D (specifically in Proc32 4.25J!).

(I have already filed a problem report on the QDN website, but
there was no feedback, so I am posting it here too, with more
details.)

  • Problem summary:

Under Proc32 4.25J, a periodic timer using proxy notification
frequently expires after (cycle-interval-time +/- ticksize) instead
of the required (cycle-interval-time).

The +/- ticksize differences cancel each other out almost
immediately, so the average expiration time is the required
value, but the rate of +/- differences is quite high: in
some of the tests it reached 20%/20%.

An important fact: if, after starting the application that owns
the timer, we explicitly set the ticksize to a lower value and then
back to the original one, the +/- differences disappear and the
affected timer expires accurately with the required
cycle-interval-time.

[Is it a bug or a feature???]

  • Environment:

  • Hardware (does not seem to be relevant):

Various PCs (plus a notebook) based on Intel chipsets (TX, ZX), with
Intel Pentium MMX and Celeron processors. (I can provide further
details on request.)

  • Software:

  • QNX 4.25D (Proc32: 4.25J)

  • Photon 1.14A

  • Watcom C/C++ 10.6B

  • Problem details:

A timer is attached (as a periodic wakeup timer) in a high-priority
(27/28) process, with a 20ms cycle interval, using proxy-based
notification. (The ticksize is 2ms.)

[Normally the owner process completes its periodic activity
within the cycle time of its wakeup timer (20ms).]

The problem is that the timer frequently expires after 18ms or 22ms
(cycle-time +/- ticksize) instead of after 20ms (the required
cycle-time).

The +/- differences balance each other: a - (or +) difference is
generally followed immediately by a + (or -) difference, but the
rate of abnormal expiration times is quite high.

Testing method: I used ‘monitor’ for testing (priority: 29), and I
compared the timestamps of the following lines:

" proxy(()) triggers ()"

[-> where the proxy seems to be triggered…]

and not the lines:

" () recv … from proxy(())"

[-> where the owner of the timer finally seems to receive the msg
of the proxy; it can be delayed…]
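The comparison of trigger timestamps described above can be sketched in portable C. This is only an illustration of the bookkeeping, not the tooling actually used: the function name `classify_periods`, the struct `jitter_tally` and the sample timestamps are all made up, and the real input would be the times taken from the ‘proxy … triggers …’ lines of a monitor log.

```c
#include <assert.h>
#include <stddef.h>

/* Tally of how the measured periods compare with the requested one. */
struct jitter_tally {
    size_t on_time;   /* delta == period            */
    size_t early;     /* delta == period - ticksize */
    size_t late;      /* delta == period + ticksize */
    size_t other;     /* anything else              */
};

/*
 * Classify the gaps between consecutive trigger timestamps.
 * All values use the same unit (milliseconds in the example below).
 * HYPOTHETICAL helper: 'stamps' stands in for times read from a log.
 */
static struct jitter_tally classify_periods(const long *stamps, size_t count,
                                            long period, long ticksize)
{
    struct jitter_tally tally = { 0, 0, 0, 0 };
    size_t i;

    assert(stamps != NULL && period > 0 && ticksize > 0);

    for (i = 1; i < count; i++) {
        long delta = stamps[i] - stamps[i - 1];

        if (delta == period)
            tally.on_time++;
        else if (delta == period - ticksize)
            tally.early++;
        else if (delta == period + ticksize)
            tally.late++;
        else
            tally.other++;
    }
    return tally;
}
```

With the made-up sequence 0, 20, 38, 60, 80, 102, 120 (ms), a period of 20 and a ticksize of 2, the tally is two on-time, two early and two late periods, which is the kind of paired -/+ pattern described in this post.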

Some notes:

  • Running with different cycle times (8 or 10ms) produces identical
    results.

  • Another timer with the same cycle-interval-time in another
    process (running as part of the same application) shows
    identical symptoms, so we cannot say that the problem affects
    only a specific timer of a specific process.

  • Priority of Proc32: 30 (default).

  • The -F option of Proc32 is NOT used, so the frequency of the 8254
    clock is the default.

  • Priority of Fsys.eide was reduced from 22 to 11.

  • Possible fixes:

  • After stepping back to Proc32 4.25I the problem disappears; the
    affected timer expires accurately.

  • Running under Proc32 4.25J, if (after starting all of the
    component processes) we explicitly set the ticksize to a lower
    value (e.g. 1ms) and then back to 2ms, the problem seems to
    disappear! (It’s really strange, but it happens.)

After stopping the whole application and restarting it, the
problem occurs again, but repeating the ‘ticksize 1; ticksize 2’
command sequence helps again.

[Each of the possible fixes can be verified with the ‘monitor’
utility: no more +/- ticksize differences can be found in the
timestamps of the ‘proxy … triggers …’ lines.]

[On request I can send detailed test results (monitor outputs, etc.).]

Good luck with the investigation,
Gyorgy Tamasi (cosy-software@freemail.hu)
COSY Software / Schoenenberger Systeme GmbH

Not sure if it helps, and I don’t know the specific versions involved, but
here is an anecdote:

At the QNX2000 conference last year somebody noted (and a QSSL representative
accepted) that there were problems when the ticksize, a timer and the
hardware clock were not all integral multiples of each other. You might expect
that the timer would expire at the nearest ticksize multiple, but in fact that
is not the case. Apparently (and I have not checked this myself, and it may
depend on which version you try) you get either a series of short periods
followed by a missed tick, or a series of long periods followed by 2 ticks.
Over time, the expected number of ticks arrives, but the timing is not what you
expect.

I suspect what you have discovered is either a version of the above problem, or
an attempt at fixing it.
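To make the "not integral multiples" point concrete, here is a small arithmetic sketch. It assumes the standard PC 8254 input clock of 1193182 Hz and assumes that the tick divisor is the requested time rounded to the nearest whole number of input clocks; the actual rule used by Proc32 is not stated anywhere in this thread and may differ. The function name `actual_tick_ns` is made up for the illustration.

```c
#include <assert.h>

/* Input clock of the PC's 8254 timer chip, in Hz (standard PC value). */
#define PIT_HZ 1193182LL

/*
 * Actual tick length in nanoseconds for a requested tick length.
 * ASSUMPTION: the divisor is the requested time rounded to the nearest
 * whole count of 8254 input clocks; the real Proc32 rule may differ.
 */
static long long actual_tick_ns(long long requested_ns)
{
    long long divisor;

    assert(requested_ns > 0);

    /* requested time expressed in input clocks, rounded to nearest */
    divisor = (requested_ns * PIT_HZ + 500000000LL) / 1000000000LL;

    /* back to nanoseconds, truncated to a whole number */
    return divisor * 1000000000LL / PIT_HZ;
}
```

Under these assumptions `actual_tick_ns(1000000)` gives 999847, so a "1ms" tick is really 1193 input clocks, and a timer programmed in round milliseconds is not a whole number of ticks.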

Hmm, it seems my post didn’t make it (unless I’m confused by the cross-posting).
Please read the following two articles:

http://qdn.qnx.com/articles/oct2300/quantization.html
http://qdn.qnx.com/articles/oct3100/concept_of_time.html

They are written for NTO but apply to QNX4 as well.

“Tamasi Gyorgy” <gtamasi@freemail.hu> wrote in message
news:Voyager.010109010839.28567B@server…

I would concentrate on the following facts:

  • Proc32 4.25J produces the problem with a 2ms ticksize, but after executing
    a ‘ticksize 1; ticksize 2’ sequence, the problem disappears! Why?

  • Proc32 4.25I DOES NOT produce the problem. Why?

(The ‘Why?’ questions are addressed to an in-house Proc32 expert at
QSSL. Is there any source-level difference that could be connected to this
specific problem? No specific Proc32 change is mentioned between QNX
4.25B and 4.25C; Proc32 4.25J appeared in the QNX 4.25C patch.)

[The original article has been posted on comp.os.qnx, too.]

Previously, Donald Backstrom wrote in qdn.public.qnx4:

Not sure if it helps, and I don’t know the specific versions involved, but
here is an anecdote:

At the QNX2000 conference last year somebody noted (and a QSSL representative
accepted) that there were problems when the ticksize, a timer and the
hardware clock were not all integral multiples of each other. You might expect
that the timer would expire at the nearest ticksize multiple, but in fact that
is not the case. Apparently (and I have not checked this myself, and it may
depend on which version you try) you get either a series of short periods
followed by a missed tick, or a series of long periods followed by 2 ticks.

I’ve posted a short example of subsequent timestamps (related to timer proxy
triggering in the monitor log) on comp.os.qnx.

Over time, the expected number of ticks arrives, but the timing is not what you
expect.

I suspect what you have discovered is either a version of the above problem, or
an attempt at fixing it.

The problem (in the form I described in the original post) had NEVER occurred
before Proc32 4.25J (file date: Sep 9 1999; release date of the QNX 4.25C patch
[containing Proc32 4.25J]: Feb 14 2000), so maybe ‘the attempt at fixing an
unknown problem was unsuccessful’ (IMHO) :)


“Tamasi Gyorgy” <gtamasi@freemail.hu> wrote in message
news:Voyager.010110013313.8674C@server…

There is an important detail I intentionally left out earlier (to keep the
already complex original posting simpler). Maybe the brain/water ratio in my
skull is somewhat higher than my silly posting suggests: the affected timers
mentioned in my posting are initialized using an exact multiple of the system
ticksize (as a nanosecond value, queried by clock_getres()).
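As an illustration of what "an exact multiple of the system ticksize" means in practice, the following sketch snaps a wanted interval to the nearest whole number of ticks. It is not the code from the application discussed here; the helper name `interval_on_tick_grid` is invented, and `tick_ns` is assumed to be the value that clock_getres() reports (999847 ns for the default 1ms ticksize, as quoted in this thread).

```c
#include <assert.h>

/*
 * Nearest whole multiple of the tick to the wanted interval
 * (never less than one tick). Both values are in nanoseconds.
 * HYPOTHETICAL helper, not taken from the application in question.
 */
static long long interval_on_tick_grid(long long wanted_ns, long long tick_ns)
{
    long long ticks;

    assert(wanted_ns > 0 && tick_ns > 0);

    ticks = (wanted_ns + tick_ns / 2) / tick_ns;   /* round to nearest */
    if (ticks < 1)
        ticks = 1;
    return ticks * tick_ns;
}
```

For example, a wanted 20ms cycle on a 999847 ns tick becomes 20 ticks, i.e. 19996940 ns, rather than 20000000 ns.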

The most important benefit of reading the mentioned articles was that I took
your demo program from the ‘Concept of Time’ article and modified it to make
it runnable under QNX4, too (you can find it attached to this mail). Now
we can reproduce the problem I described, in detail, without the
too-complex-to-deal-with application context of the original posting.

So now the situation is reproducible (and “really dramatic”
[CNN, Gyorgy Tamasi, Baghdad]).

  • My version now uses the same ticksize and timer-cycle values by default
    as your version. (I’ve also tested it with the ticksize/timer values
    mentioned in the original posting, and my original test results can now be
    reproduced exactly.)

  • When you run the demo, redirect its stdout to a file, because especially
    with low ticksize values (such as the default 1ms) it can produce a lot of
    output (with abnormal timestamps), and it seems that the low speed of a
    terminal as stdout can have a serious impact on the accuracy of the
    timestamps.

  • My version also logs the too-short timer cycles (relative to the
    previous timer tick).

If you compile and run my demo version under QNX 4.25D/Proc32 4.25J, you will
see the following, in summary:

  • Compiled and run with a 1000000ns (1ms) timer cycle, everything works as
    we expect. We get some abnormal expirations, but this is exactly what we
    expect (and is quite different from the problem I tried to describe
    earlier).

  • Set the timer cycle to 999847ns (the exact OS ticksize), recompile, and see
    the wonder: the demo produces tons of abnormal timestamps. Then, after
    ~5000 ticks, my demo version sets the ticksize to 0.5ms (by calling
    clock_setres() itself) and, after some more ticks, back to 1ms (logging
    both operations on stdout), and the tons of abnormal timestamps
    disappear! (I really do not know why.)

I can only confirm your observation, but unfortunately I have no explanation.
This doesn’t make sense to me. I’d be very tempted to call this a bug,
or at best an undocumented feature :wink: Hopefully QNX will drop in!

  • Under Proc32 4.25I my version also behaves as we expect, and as your
    version would behave under Neutrino: everything works according to the
    docs, books & advertisements.

  • The last gift (maybe this is what will bring us closer to the real
    solution): with the timer cycle set to exact-ticksize-in-nanoseconds + 1ns
    (e.g. 999847 + 1ns with the default 1ms ticksize), the problem
    disappears under Proc32 4.25J! No more abnormal timestamps are
    produced. [‘And this is evidence, not only a promise’, in case you are
    the kind of guy who requires it.]
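The expected behaviour in the cases listed above can be checked against a very simple model. The sketch below is an idealised simulation, not a description of how Proc32 actually works: it assumes that kernel time advances by exactly one tick per clock interrupt, that a timer fires on the first tick at or after its expiry, and that the next expiry is the previous expiry plus the interval. The function name `count_off_nominal_gaps` is invented for the illustration.

```c
#include <assert.h>

/*
 * Idealised periodic-timer model (ASSUMED, not taken from Proc32):
 * time advances by tick_ns per tick, the timer fires on the first tick
 * at or after its expiry, and the next expiry is the previous expiry
 * plus interval_ns.
 *
 * Returns how many fire-to-fire gaps, measured in ticks, differ from
 * the nominal gap (interval/tick rounded to nearest). The gap between
 * arming the timer and its first expiry is not counted.
 */
static long count_off_nominal_gaps(long long tick_ns, long long interval_ns,
                                   long fires)
{
    long long now = 0;
    long long expiry = interval_ns;
    long long tick_no = 0;
    long long last_fire_tick = 0;
    long long nominal = (interval_ns + tick_ns / 2) / tick_ns;
    long fired = 0;
    long off_nominal = 0;

    assert(tick_ns > 0 && interval_ns >= tick_ns && fires >= 0);

    while (fired < fires) {
        now += tick_ns;                 /* one clock interrupt */
        tick_no++;
        if (now >= expiry) {            /* timer is due on this tick */
            if (fired > 0 && tick_no - last_fire_tick != nominal)
                off_nominal++;
            last_fire_tick = tick_no;
            expiry += interval_ns;      /* periodic reload */
            fired++;
        }
    }
    return off_nominal;
}
```

On a 999847 ns tick, this model gives no off-nominal gaps for a 999847 ns or a 999848 ns interval over the first 20000 expirations, and 3 for a 1000000 ns interval (the expected quantization slips). In other words, the model accounts for the expected abnormal expirations in the 1ms case only; it does not reproduce the behaviour reported for the exact-ticksize case under Proc32 4.25J.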

If you can reproduce my results (as I hope) and find them
interesting/exciting/nerve-racking, please direct the attention of a
responsible expert at QSSL to them (or suggest someone for me to
contact).

Good luck & thanks for the response; any further feedback is welcome,
Gyorgy

PS (OK, overdramatized): Some hours ago I felt like a novice (but accidentally
successful) beta tester (although I’m not a novice beta tester, nor a novice,
nor a beta tester) who, in his very special role, finally comes into close
contact with a shark, deep in the private swimming pool of the Company.
Experienced experts say: ‘swim, swim; just keep swimming, as you can read in
the docs (here are some pointers); you could reorganize your movement’ (he has
already lost one of his legs and/or arms). Yes, normally we cannot find sharks
in the private swimming pool of the Company, and suggesting that you have found
one is the best way to win an exclusive ticket to the Yellow House (a Hungarian
term for an asylum), but sometimes sharks do arrive, and this seems to be such
a time. Ladies & Gentlemen: take care of your arms & legs.

By the way in your code:

// _osinfo is a quite large structure, so we will use alloca()
// instead of declaring another local variable
struct _osinfo *pOSInfo;
volatile struct _timesel __far *pOSTime;

pOSInfo = (struct _osinfo *) alloca( sizeof( *pOSInfo ) );
assert( pOSInfo != NULL );

Just in case this is confusing to others: alloca() is no different from
declaring a local variable. alloca() allocates space on the stack. The only
difference between alloca() and a local variable is that with alloca() you get
to check whether there is enough space on the stack, while with a local
variable your program may crash or exit with a stack overflow error.
But that doesn’t change the fact that the stack must be big enough
in both cases.

Bingo. (Big Brother is watching me…) It’s really a ‘mistyping’…

Finally (under heavy torture) I have to confess that I just cut
the code fragment in question from the function listed below:


int clock_GetLocalRTTime (struct timespec *apTime)
{
#if USE_QNX_SHARED_TIMESEL

    long sec, nsec;
    static int _inited = 0;
    static volatile struct _timesel __far *_pOSTime;

    if (!_inited)
    {
        // _osinfo is a quite large structure, so we will use alloca()
        // instead of declaring a local variable
        struct _osinfo *pOSInfo;

        if ((pOSInfo = alloca(sizeof(*pOSInfo))) == NULL)
            return -1;
        if (qnx_osinfo(0, pOSInfo) != 0)
            return -1;
        _pOSTime = (struct _timesel __far *)MK_FP(pOSInfo->timesel, 0);
        _inited = 1;
    }

    // get current <sec,nsec> value (loop guards against interrupts)
    do
    {
        sec = _pOSTime->seconds;
        nsec = _pOSTime->nsec;
    } while (sec != _pOSTime->seconds || nsec != _pOSTime->nsec);

    apTime->tv_sec = sec;
    apTime->tv_nsec = nsec;

    return 0;

#else // if USE_QNX_SHARED_TIMESEL

    return clock_gettime(CLOCK_REALTIME, apTime);

#endif // if !USE_QNX_SHARED_TIMESEL
}
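
The retry loop in the listing above is a general pattern for reading a two-word timestamp that an interrupt handler may update between the two reads. A minimal portable sketch of the same idea follows. The names `shared_time` and `read_consistent()` are hypothetical and used only for illustration, and the struct merely stands in for the kernel’s `_timesel` area; it is not the QNX layout.

```c
#include <time.h>

/* Hypothetical stand-in for a <seconds, nanoseconds> pair that an
 * interrupt handler updates asynchronously. */
struct shared_time {
    volatile long seconds;
    volatile long nsec;
};

/* Read both fields, then re-read them and retry until neither has
 * changed, so that the returned pair comes from a single update. */
static void read_consistent(const struct shared_time *src, struct timespec *out)
{
    long sec, nsec;

    do {
        sec  = src->seconds;
        nsec = src->nsec;
    } while (sec != src->seconds || nsec != src->nsec);

    out->tv_sec  = sec;
    out->tv_nsec = nsec;
}
```

This only guards against an update that lands between the two reads on the same CPU; it is a sketch of the pattern, not a general multiprocessor-safe clock read.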

I’m sure, now you can feel the difference.
(…ecno ylno dellac eb lliw ()acolla)

However, I would rather not repost this thread as ‘The history of the
On-a-useless-application-of-alloca() flame war’ (…even if it would surely
lift our thread to the top of the thread list again)…

So back to the original point: could you name a Proc32 expert at QSSL whom
I could contact directly regarding my (our) timer problem? Or have you
already contacted one?

Regards,
Gyorgy

(snip)

“Tamasi Gyorgy” <gtamasi@freemail.hu> wrote in message
news:Voyager.010111001556.15576B@server…

So back to the original point: could you name a Proc32-expert at QSSL to
contact directly regarding my (our) timer problem?

In general, QSSL staff won’t reply to direct email.
If they haven’t responded here, it’s because the QNX4 kernel
people probably haven’t read it, or for
some reason they chose to ignore it.

I’ve done all I can, sorry if it’s not enough.

Or you’ve already contacted?


Regards,
Gyorgy

(snip)

“Mario Charest” <mcharest@void_zinformatic.com> wrote in message
news:93j62m$lfj$1@nntp.qnx.com

“Tamasi Gyorgy” <gtamasi@freemail.hu> wrote in message
news:Voyager.010111001556.15576B@server…
Previously, Mario Charest wrote in qdn.public.qnx4:

By the way in your code:

[cut]

So back to the original point: could you name a Proc32-expert at QSSL to
contact directly regarding my (our) timer problem?

In general, QSSL staff won’t reply to direct email.
If they haven’t responded here, it’s because the QNX4 kernel
people probably haven’t read it, or for
some reason they chose to ignore it.

I’ve done all I can, sorry if it’s not enough.

UPDATE: Somebody at QSSL is looking into this.

Or you’ve already contacted?


Regards,
Gyorgy

(snip)
i’ll try to offer some thoughts (as a qnx person :)


there are a couple of things to watch for.
first, the rules (as you already know them)

the microkernel schedules all processes
based on the following order from highest to lowest:

  • highest priority interrupt routine runs to completion
    (based on -i argument to Proc32, default=irq3)
  • other interrupts finish based on their priority
    (interrupts are fully nested and prioritized)
  • the highest priority process that is READY to run will run

a repetitive timer is fired from the kernel itself. it is important to
differentiate between the lower level kernel component, which is interrupt
driven (software and hardware) and the higher level Proc pieces that handle
messages.
e.g. a timer_create is a message pass to Proc who sets up the timer table, then
the kernel itself will trigger the proxy when needed.

So, to do realtime several things are needed.

  1. if you can’t afford to be preempted by Proc or any other process, then you
    need to put some of your critical code in an interrupt handler. then
    make sure that your interrupt is highest priority (-i)
  2. if you don’t mind getting preempted by other int handlers then you can run
    your routines at process time… if you do this it is usually recommended
    to run your critical process at a priority above Proc. drop Proc in your
    build image down to 26 (with -P option), then run your app at 27 or 28.


    timers in the kernel are at the granularity of the 8254 and reflected with
    ticksize -e. but you don’t want to be on the boundary condition of the
    tick.

if you do polling at high resolutions i recommend dropping the ticksize to
a 1/4 of your desired poll period if you can.

the kernel timer code does the following on an irq 0:

fabricate 50ms timers (-1)
do time accounting

walk timer_table and check for: (curtime = current time, timer = your timer)
    if valid timer
        if (( timer.seconds < curtime.seconds ) ||
            ( timer.seconds == curtime.seconds &&
              timer.nsec < curtime.nsec ))
            trigger proxy

so you really want to avoid boundary conditions where you are bang on the
nsec value. i usually set my timers to be slightly less than the exact
value from ‘ticksize -e’.

but remember that the kernel timer resolution is only the ticksize. so if
you miss the above loop then your timer will only fire on the next irq0.
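
The strict comparison in the pseudo-code above can be sketched in portable C to show why a timer that sits exactly on a tick boundary fires one tick later than one that sits just below it. This is a toy model under stated assumptions, not the Proc32 source: `abs_time`, `timer_expired()` and `first_fire_tick()` are hypothetical names, the clock starts at zero, and it advances by exactly `tick_ns` (less than one second) on every irq 0.

```c
#include <stdint.h>

/* Simplified absolute time, mirroring the sec/nsec pair in the pseudo-code. */
struct abs_time {
    int64_t seconds;
    int64_t nsec;
};

/* Strict "less than" test from the pseudo-code: a timer whose expiry
 * equals the current time is NOT yet considered expired. */
static int timer_expired(struct abs_time timer, struct abs_time curtime)
{
    return (timer.seconds < curtime.seconds) ||
           (timer.seconds == curtime.seconds && timer.nsec < curtime.nsec);
}

/* Count clock ticks from time zero until a timer armed at time zero
 * with the given interval first passes the check. */
static int first_fire_tick(int64_t interval_ns, int64_t tick_ns)
{
    struct abs_time timer = { interval_ns / 1000000000,
                              interval_ns % 1000000000 };
    struct abs_time now = { 0, 0 };
    int tick = 0;

    do {
        /* One irq 0: advance the model clock by one tick. */
        now.nsec += tick_ns;
        if (now.nsec >= 1000000000) {
            now.seconds += 1;
            now.nsec -= 1000000000;
        }
        tick++;
    } while (!timer_expired(timer, now));

    return tick;
}
```

In this model, `first_fire_tick(4 * 1999695, 1999695)` returns 5, because at the fourth tick the expiry equals the current time and the strict test fails, while `first_fire_tick(4 * 1999695 - 1, 1999695)` returns 4.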

and also… be aware that SMM on a pentium is a nasty thing. check the web
on SMM and ask your bios/hardware vendor about disabling this… the side
effect of having system management mode enabled is that the bios firmware
upon receipt of an SMI will take over the CPU for milliseconds at a time.
most newer pentiums use SMM for things like polling usb/pcmcia controllers
and doing power management etc.

now why would different Proc’s behave differently?

and why would changing ticksize modify the results?

when you create a timer, the kernel makes a sec/nsec structure that reflects
your timer in absolute terms based on current time. in the math for doing
this i think you are seeing some rounding issues… so that sometimes your
timer will be below the nsec of the curtime, whereas other times you were
above and you wouldn’t get triggered till later.

bottom line is, don’t sit on the boundary condition.

i hope this helps!!



Randy Martin randy@qnx.com
Manager of FAE Group, North America
QNX Software Systems www.qnx.com
175 Terence Matthews Crescent, Kanata, Ontario, Canada K2M 1W8
Tel: 613-591-0931 Fax: 613-591-3579

Previously, Mario Charest wrote in qdn.public.qnx4:

“Tamasi Gyorgy” <gtamasi@freemail.hu> wrote in message
news:Voyager.010111001556.15576B@server…

[snip]

So back to the original point: could you name a Proc32-expert at QSSL to
contact directly regarding my (our) timer problem?

In general, QSSL staff won’t reply to direct email.
If they haven’t responded here, it’s because the QNX4 kernel
people probably haven’t read it, or for
some reason they chose to ignore it.

(Hm… but sometimes(!) we - you & me & others - have got replies from
QSSL staff to direct emails…)

I would like to make it clear: I’m not the guy who would like to
win the “Bug of the Year” race in the QNX section, nor one who would
regularly bomb QSSL with direct emails with the subject “Is it a
kernel bug, or have I just mistyped something?!?”. And I’m not being
offensive; I would rather like to be helpful. My only (humble)
goal is to get a (however short) confirmation from a proper QSSL
expert that “yes, your problem seems to be real, and
theoretically it can be closely connected with a so-called
‘undocumented feature’ in Proc32 4.25J specifically”, so that we
have the option that in a possible future release of QNX4/Proc32
(if there will be one at all?) this ‘undocumented feature’ will be
either ‘documented’ or removed. (Or is that a not-humble-enough
goal?)

That’s all.

I’ve done all I can, sorry if it’s not enough.

Or you’ve already contacted?

If you would like to suggest that I can treat your confirmation
as if it were a confirmation from ‘that proper QSSL expert’
(because you will give, or have already given, the ‘QNX4
kernel people’ a pointer to just read this thread), I can rest in
peace…

Previously, Mario Charest wrote in qdn.public.qnx4:

“Mario Charest” <mcharest@void_zinformatic.com> wrote in message
news:93j62m$lfj$1@nntp.qnx.com…

“Tamasi Gyorgy” <gtamasi@freemail.hu> wrote in message
news:Voyager.010111001556.15576B@server…
Previously, Mario Charest wrote in qdn.public.qnx4:

By the way in your code:

[cut]

So back to the original point: could you name a Proc32-expert at QSSL to
contact directly regarding my (our) timer problem?

In general, QSSL staff won’t reply to direct email.
If they haven’t responded here, it’s because the QNX4 kernel
people probably haven’t read it, or for
some reason they chose to ignore it.

I’ve done all I can, sorry if it’s not enough.


UPDATE: Somebody at QSSL is looking into this.

THANKS.

(Sorry about the out-of-sync replies…)

Or you’ve already contacted?


Regards,
Gyorgy

(snip)
“Randy Martin” <randy@qnx.com> wrote in message
news:93ko09$jkc$1@nntp.qnx.com

i’ll try to offer some thoughts (as a qnx person :)

[cut]

now why would different Proc’s behave differently?

and why would changing ticksize modify the results?

when you create a timer, the kernel makes a sec/nsec structure that reflects
your timer in absolute terms based on current time. in the math for doing
this i think you are seeing some rounding issues… so that sometimes your
timer will be below the nsec of the curtime, whereas other times you were
above and you wouldn’t get triggered till later.

bottom line is, don’t sit on the boundary condition.

i hope this helps!!

While I agree with everything you said, Randy, I still don’t understand why
setting the ticksize to 1 ms, then 0.5 ms, then back to 1 ms would change how
a timer behaves once the ticksize is back at 1 ms. This is HIGHLY repeatable
and not due to SMM or any other time-related side effect. IMHO, when
the ticksize is set back to 1 ms, the timer should behave the same.


Mario Charest <mcharest@void_zinformatic.com> wrote:

While I agree with everything you said, Randy, I still don’t understand why
setting the ticksize to 1 ms, then 0.5 ms, then back to 1 ms would change how
a timer behaves once the ticksize is back at 1 ms. This is HIGHLY repeatable
and not due to SMM or any other time-related side effect. IMHO, when
the ticksize is set back to 1 ms, the timer should behave the same.

timers are calculated when they are created and given absolute times of
expiry, not a relative time. the kernel calculates this absolute time
for you, so if it calculates it and gets it bang on the real time rather than
slightly less (see previously posted pseudo-code) then you could be in
trouble.
and the kernel recalculates the new time when it rearms a repetitive timer.
so if there are any math rounding issues with how it does this (and it uses
the current ticksize to do this) then you could still be right on or just
after the curtime when it next fires.

bottom line is… don’t sit right at the boundary condition. sit underneath
it by a bit so that you will always fire when you expect.

or if you need to, do some work in an int handler attached to irq0. but keep
it short and simple.


Previously, Randy Martin wrote in qdn.public.qnx4:

i’ll try to offer some thoughts (as a qnx person :)

Welcome.

there are a couple of things to watch for.
first, the rules (as you already know them)

the microkernel schedules all processes
based on the following order from highest to lowest:

  • highest priority interrupt routine runs to completion
    (based on -i argument to Proc32, default=irq3)
  • other interrupts finish based on their priority
    (interrupts are fully nested and prioritized)
  • the highest priority process that is READY to run will run

a repetitive timer is fired from the kernel itself. it is important to
differentiate between the lower level kernel component, which is interrupt
driven (software and hardware) and the higher level Proc pieces that handle
messages.
e.g. a timer_create is a message pass to Proc who sets up the timer table, then
the kernel itself will trigger the proxy when needed.


Does the line

" proxy(()) triggers ()"

in a monitor log identify this event, if the proxy is connected to the
affected timer?

So, to do realtime several things are needed.

  1. if you can’t afford to be preempted by Proc or any other process, then you
    need to put some of your critical code in an interrupt handler. then
    make sure that your interrupt is highest priority (-i)
  2. if you don’t mind getting preempted by other int handlers then you can run
    your routines at process time… if you do this it is usually recommended
    to run your critical process at a priority above Proc. drop Proc in your
    build image down to 26 (with -P option), then run your app at 27 or 28.


    timers in the kernel are at the granularity of the 8254 and reflected with
    ticksize -e. but you don’t want to be on the boundary condition of the
    tick.

if you do polling at high resolutions i recommend dropping the ticksize to
a 1/4 of your desired poll period if you can.

By ‘poll period’, do you mean the timer-cycle interval here?

If yes, please do not forget that in my original posting i
described the problem with the following time constraints:

  • ticksize: 2ms (1999695ns)
  • timer-cycle interval (‘poll-period’?): (4, 8, or 10) *
    (ticksize -e) (7998780ns, etc.)

If you modify the demo to use this timing, you will still see
the difference between Proc32 4.25I & 4.25J…

the kernel timer code does the following on an irq 0:

fabricate 50ms timers (-1)
do time accounting

walk timer_table and check for: (curtime = current time, timer = your timer)
    if valid timer
        if (( timer.seconds < curtime.seconds ) ||
            ( timer.seconds == curtime.seconds &&
              timer.nsec < curtime.nsec ))
            trigger proxy

so you really want to avoid boundary conditions where you are bang on the
nsec value. i usually set my timers to be slightly less than the exact
value from ‘ticksize -e’.

If you increase or decrease an N*(ticksize -e) cycle time by
1ns, everything works correctly even under 4.25J, but if the
cycle is an exact multiple of ticksize -e, 4.25J (just like
me:-) goes crazy… (‘Just test it…’)
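
The workaround described above, nudging the cycle time off the exact tick multiple, can be sketched as a small helper that builds the interval. This is only an illustration: `make_cycle()` is a hypothetical name, the tick value 1999695 ns is the `ticksize -e` figure quoted in this thread, and the bias of one nanosecond is the offset reported here to work, not a documented guarantee.

```c
#include <time.h>

/* Build a cycle time of n_ticks * tick_ns + bias_ns nanoseconds as a
 * normalized timespec. bias_ns is the small offset (for example -1)
 * that keeps the cycle off the exact tick multiple. The total is
 * assumed to be non-negative. */
static struct timespec make_cycle(long n_ticks, long tick_ns, long bias_ns)
{
    long long total = (long long)n_ticks * tick_ns + bias_ns;
    struct timespec ts;

    ts.tv_sec  = (time_t)(total / 1000000000LL);
    ts.tv_nsec = (long)(total % 1000000000LL);
    return ts;
}
```

For example, `make_cycle(4, 1999695, -1)` gives 0 s and 7998779 ns; the result could then be copied into the `it_value` and `it_interval` members of a `struct itimerspec` before the timer is armed.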

but remember that the kernel timer resolution is only the ticksize. so if
you miss the above loop then your timer will only fire on the next irq0.

and also… be aware that SMM on a pentium is a nasty thing. check the web
on SMM and ask your bios/hardware vendor about disabling this… the side
effect of having system management mode enabled is that the bios firmware
upon receipt of an SMI will take over the CPU for milliseconds at a time.
most newer pentiums use SMM for things like polling usb/pcmcia controllers
and doing power management etc.

(I’m just starting to paint my large demo-table: SMM IS EVIL!.. :)

(However I’ve no pcmcia controller and my usb & apm is disabled…)

now why would different Proc’s behave differently?

and why would changing ticksize modify the results?

when you create a timer, the kernel makes a sec/nsec structure that reflects
your timer in absolute terms based on current time. in the math for doing
this i think you are seeing some rounding issues… so that sometimes your
timer will be below the nsec of the curtime, whereas other times you were
above and you wouldn’t get triggered till later.

bottom line is, don’t sit on the boundary condition.

i hope this helps!!

I’m proud, that i can agree with all of your thoughts, in
general. (The stress is on the term ‘in general’ here.) The
contradiction is, that currently the problem is very detailed &
specific, and seems to resist - your & our - general thoughts,
sorry.

[PARENTAL ALERT: EXPLICIT MATERIAL FOLLOWS >:-)]

  • Did you run the demo under Proc32 4.25J? If yes, what was the
    result: did you see periodical abnormal expirations?
  • Did you run the demo under Proc32 4.25I? If yes, what was the
    result: did you see periodical abnormal expirations?
  • You can change the base ticksize & timer-cycle values, as you
    want (keeping them consistent, especially with the hardcoded
    value of ‘ticksize -e’, even if you use a multiple of it, as a
    cycle-time): do you still see differences between behaviour
    under Proc32 4.25I & 4.25J (because i constantly see)?

If you also see that 4.25I behaves differently (and better?!)
than 4.25J, and you still state that this is a feature of 4.25J,
simply caused by a ‘rounding issue’, then i would like to
state that i do not really like this feature, and i would still
prefer the behaviour of 4.25I until you can show me a situation
where 4.25I produces abnormal expirations while 4.25J does not (or, in
other words, where the ‘rounding issues’ of 4.25I ‘imply more serious
consequences’ than the ‘rounding issues’ of 4.25J; not like now)…
So far i have had no success in finding a situation like this…

Sorry (in advance), but i would raise the level of ‘explicitness’
even further (because it seems to be obvious now): is it possible
to run ‘diff’ on the source trees of Proc32 4.25I & 4.25J? If i
were you, i would try it, because i have no better answer than that you
should see the reason for this ‘strange symptom(*)’ (or feature,
on the other hand) in that list (however weird it is), since all
of the tests show that 4.25I behaves differently from 4.25J in a very
specific situation…

Some historical background: i have known about this problem since
~August 2000, when we installed the 4.25 C & D patch (with
Proc32 4.25J) on 2 boxes in our office (not earlier, not later).
After some testing i (intuitively?) found some solutions
(stepping back to Proc32 4.25I, and the ticksize-setting trick).
Please believe me: the problem is consistently reproducible, at
least in my specific, but not special, environment. And now, having
the simple demo, you & your colleagues can also test whether it
is really reproducible. We have a concrete problem in a
concrete environment; there should be concrete answers…

Good luck (and once more: please forgive me that i was so
‘explicit’ now… :)
Gyorgy

(*) Hm… yet another (pc) synonym for the evil term, ‘b*g’?

[snip]


Previously, Randy Martin wrote in qdn.public.qnx4:

Mario Charest <mcharest@void_zinformatic.com> wrote:

While I agree with everything you said, Randy, I still don’t understand why
setting the ticksize to 1 ms, then 0.5 ms, then back to 1 ms would change how
a timer behaves once the ticksize is back at 1 ms. This is HIGHLY repeatable
and not due to SMM or any other time-related side effect. IMHO, when
the ticksize is set back to 1 ms, the timer should behave the same.

timers are calculated when they are created and given absolute times of
expiry, not a relative time. the kernel calculates this absolute time
for you, so if it calculates it and gets it bang on the real time rather than
slightly less (see previously posted pseudo-code) then you could be in
trouble.
and the kernel recalculates the new time when it rearms a repetitive timer.
so if there are any math rounding issues with how it does this (and it uses
the current ticksize to do this) then you could still be right on or just
after the curtime when it next fires.

bottom line is… don’t sit right at the boundary condition. sit underneath
it by a bit so that you will always fire when you expect.

(:) It can be applied even in other situations of life; a good
example would be answering customer problems… [ok, it was a tired
joke, just ignore it]… :)

or if you need to, do some work in an int handler attached to irq0. but keep
it short and simple.

Oh, i was out of sync again (another application of the ‘bottom
line’?), but no problem, it seems that you can just apply my
previous posting…

(for all of us:) Have a better tomorrow,
Gyorgy

[snip]

I just can’t rest in peace…

Let me list (mostly repeat) some empirical facts (without providing any
theoretical support/background), assuming that I want to describe our
current situation to an alien from the planet Mars who can follow
formal logic, knows the relation between the products QNX4, Neutrino and
QNX RTP, and knows the release history of the QNX 4.25 patches… (This doesn’t
seem to be a strict assumption, does it? Sorry, but the style may seem to be
highly adapted to the alien…)

Test case: we have a periodic timer with a cycle interval that is an exact
multiple of the value of ticksize -e (if the cycle interval is such a
multiple +/- 1ns, we do not have any ‘problem’).

Let’s say that the ticksize is 1999695ns and the cycle interval of the timer
is 4*ticksize.

  • QNX4/Proc32 4.25J <<<

This Proc32 seems to generate timer expirations that arrive at unexpected
times, and at an unexpectedly high rate.

As an example, here is a group of SUBSEQUENT(!!!) expirations [in
nanoseconds (calculated from the RDTSC cycles of the monitor log); relative
times between entries; expected cycle: 8ms; abnormal entries are marked
with +/-]:

8002732
6010508 -
9975614 +
8011035
7988795
6002857 -
7988923
10015715 +
7988811
8012627
7985902
5996243 -
10009333 +
7996662
8006173
8004937
5983963 -
9999990 +
7993121
8004821
5992210 -
7994293
10007836 +
7995369

If you say that this is caused by a “rounding issue”, I have to say that
yes, it is possible, but then the rounding algorithm must be erroneous in
this special test case, because the “rounding problems” come too frequently
(in this short sample of SUBSEQUENT(!) timer ticks) to accept the
applied rounding algorithm as the best possible…
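
To put numbers on that sample, the intervals listed above can be tallied with a short C sketch. The names `jitter_stats` and `classify_intervals()` are hypothetical, and the classification rule (more than half a tick away from the expected cycle counts as abnormal) is an assumption chosen for illustration; the expected cycle is taken as 4 × 1999695 ns.

```c
#include <stddef.h>

/* Tally of how many measured intervals were early, on time, or late. */
struct jitter_stats {
    int early;
    int normal;
    int late;
    long mean_ns;
};

/* Classify each interval against the expected cycle: anything more than
 * half a tick away from it is counted as an early or late expiration. */
static struct jitter_stats classify_intervals(const long *ns, size_t n,
                                              long expected_ns, long tick_ns)
{
    struct jitter_stats s = { 0, 0, 0, 0 };
    long long sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        long diff = ns[i] - expected_ns;
        if (diff < -tick_ns / 2)
            s.early++;
        else if (diff > tick_ns / 2)
            s.late++;
        else
            s.normal++;
        sum += ns[i];
    }
    if (n > 0)
        s.mean_ns = (long)(sum / (long long)n);
    return s;
}

/* The 24 consecutive intervals quoted above, in nanoseconds. */
static const long samples[] = {
    8002732, 6010508, 9975614, 8011035, 7988795, 6002857,
    7988923, 10015715, 7988811, 8012627, 7985902, 5996243,
    10009333, 7996662, 8006173, 8004937, 5983963, 9999990,
    7993121, 8004821, 5992210, 7994293, 10007836, 7995369
};
```

Calling `classify_intervals(samples, 24, 7998780, 1999695)` counts 5 early, 5 late and 14 normal intervals, with a mean close to the nominal 8 ms cycle, which matches the observation that the +/- deviations cancel out on average.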

  • QNX4/Proc32 4.25I <<<

The symptom simply DOES NOT EXIST.

  • Neutrino <<<

I don’t know exactly what version Mario uses, but I’m almost
sure that he has already tested this symptom on the newest pre-release,
pre-beta, etc. version of QNX RTP, too.

He published an article with a demo program, which is intended to show a
specific common pitfall of timer usage (namely, exactly the side effects of
normal rounding issues).

Following an indirect way of thinking: we can be sure that he tested
his program, and his surprise at running my modified version of his
program under QNX4/Proc32 4.25J suggests that he had not detected
similar symptoms earlier, when he tested his original
demo under a version of Neutrino.

  • Any consequence??? <<<

Just continuing to follow our indirect way: does this all mean that
currently the newest Neutrino kernel & the newest QNX4 kernel (4.25J)
handle timers - in general - differently? If yes, which is the
“outdated”/older timer-handling algorithm? If it is the one in
QNX4/Proc32 4.25J, do we just have to wait until the ‘newest’ Neutrino
algorithm is adapted to QNX4/Proc32? But then how can it happen that
the outdated QNX4/Proc32 4.25I behaves identically to the most
up-to-date Neutrino kernel, but differently from its direct successor,
4.25J? Isn’t that a contradiction?

Or if the algorithm in QNX4/Proc32 4.25J is newer than the one applied
in Neutrino, does it mean that on some sunny day the algorithm of
QNX4/Proc32 4.25J will be adapted to Neutrino, and then the Neutrino
kernel will produce the same symptoms as QNX4/Proc32 4.25J produces
now… Another article will be born on QDN or in the programmer’s guide,
saying that “it is safe to use ‘ticksize +/- 1’ns as a timer-cycle
interval, but take care with ‘ticksize’ exactly, because ‘some rounding
issues’ or ‘some singularity’ can come into the picture at a relatively
higher statistical rate”??? Is it really true that we all are just
waiting for that sunny day?

If i were that alien (with our specific requirements), i think i would
say after all: “i really do not knooow”…[:-{…mostly because i would be
puzzled by the contradictions…

But i’m not that alien - or in other words, keeping political
correctness: that alien is not me.

So if he/she finally sorts everything out (just like, finally, you &
me), we can come to a common conclusion: maybe our basic assumption,
that QNX4/Proc32 4.25J works absolutely correctly, was wrong…

  • Gyorgy

Randy Martin wrote:

Mario Charest <mcharest@void_zinformatic.com> wrote:

While I agree with everything you said, Randy, I still don’t understand why
setting the TICKSIZE to 1, then .5, then 1 ms would change how a timer
behaves when the ticksize is set back to 1ms. This is HIGHLY repeatable
and not due to SMM or any other time related side effect. IMHO when
the ticksize is set back to 1ms, the timer should behave the same.

timers are calculated when they are created and given absolute times of
expiry, not a relative time. the kernel calculates this absolute time
for you, so if it calculates it and gets it bang on the real time rather than
slightly less (see previous posted pseudo-code) then you could be in
trouble. and the kernel recalculates the new time when it rearms a
repetitive timer. so if there are any math rounding issues with how it
does this (and it uses current ticksize to do this) then you could still
be right on or just after the curtime when it next fires.

bottom line is… don’t sit right at the boundary condition. sit underneath
it by a bit so that you will always fire when you expect.

or if you need to, do some work in an int handler attached to irq0. but keep
it short and simple.

[snip]

i’ll try to trim the posting a bit …

yes, i did try both your code and my code (that i use for analysis) on various
versions of Proc, starting with i and moving to the latest ‘k’, which is
in internal beta right now. i guess i assumed that you knew i would do
this.

running your code i do not see the ‘elapse’ problems that you see there.
and i’ve been through the cvs logs to see if there is a difference in the
timer handling code and there is none.
the caveats as mentioned earlier by me apply to all versions of Proc.

there are other issues here. interrupt load, cpu holdoff from other
peripherals (like video etc.)

my tests all run from text console. i do not run them in graphics mode.

let’s try the following:

use 425J
use text mode only
Proc at prio 26


i posted my test code to /usr/free … it was part of a series of notes that
i put together on using qnx4 for realtime work. that code in there uses
Trace calls for analysis so it shouldn’t be blocked on any system service
(a Trace call is a kernel call)

usr/free/qnx4/os/samples/misc/ called qnx4rtime.tgz

please try either your original test again and/or my sample code in the
above scenario.

Previously, Randy Martin wrote in qdn.public.qnx4:

i’ll try to trim the posting a bit …

(was successful.)

yes, i did try both your code and my code (that i use for analysis) on various
versions of Proc, starting with i and moving to the latest ‘k’, which is
in internal beta right now. i guess i assumed that you knew i would do
this.

running your code i do not see the ‘elapse’ problems that you see there.

(Hm…) And do you believe that Mario - as an independent observer -
has also seen it?

and i’ve been through the cvs logs to see if there is a difference in the
timer handling code and there is none.

(((This can be a side effect of another change, too.)))

the caveats as mentioned earlier by me apply to all versions of Proc.

there are other issues here. interrupt load, cpu holdoff from other
peripherals (like video etc.)

my tests all run from text console. i do not run them in graphics mode.

let’s try the following:

use 425J
use text mode only
Proc at prio 26

i was running my tests with Proc at (default) prio 30.

(maybe it is helpful: we experienced the symptom on intel
processors (Pentium MMX at 250MHz/TX chipset, CeleronA at ~410 &
466MHz/ZX chipset); i will try to test it on a newer AMD on
monday, too, because some unconfirmed info says, that it can be
un-affected by the symptom.)

i posted my test code to /usr/free … it was part of a series of notes that
i put together on using qnx4 for realtime work. that code in there uses
Trace calls for analysis so it shouldn’t be blocked on any system service
(a Trace call is a kernel call)

usr/free/qnx4/os/samples/misc/ called qnx4rtime.tgz

hm, i don’t know how to put this: currently (09:00 PM,
Budapest) there is no file ‘qnx4rtime.tgz’ in the referenced
directory on QUICS, and it is not mentioned in the ls-lR or
find-ls listings of /usr/free. (dot)

please try either your original test again and/or my sample code in the
above scenario.

i’ll try my test in your scenario.

would it be possible to temporarily post somewhere a (compiled)
executable version of my test code, the one running in your
environment (and producing no symptoms) - e.g. together with
qnx4rtime.tgz?