Serious execution time jitter of thread

Hello,

I have observed serious execution time jitter in a critical process of my
motion control application. I have been able to track this problem down
to a simple test case. Please refer to:
http://www.jochenmaass.de/public/source.html

This test process consists of two threads. The main thread creates a
channel and connects to it in order to receive a pulse from a timer.
Then this timer is created and set up to continuously deliver pulses
at a frequency of 1 kHz.
Next, the second thread is created and blocks on MsgReceive().
When a pulse from the timer is received, it is unblocked and does some
calculations. When these calculations are finished, it blocks again.
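
For illustration, a rough sketch of that structure follows (this is not the
exact code from the page above; the pulse code and pulse priority are
placeholders):

/* Rough sketch only -- the real code is on the linked page. */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <time.h>
#include <sys/neutrino.h>
#include <sys/netmgr.h>

#define MY_PULSE_CODE _PULSE_CODE_MINAVAIL   /* placeholder pulse code */

static int chid;

static void *worker( void *arg )
{
    struct _pulse pulse;
    int rcvid;

    for ( ;; ) {
        /* Block until the next timer pulse arrives. */
        rcvid = MsgReceive( chid, &pulse, sizeof( pulse ), NULL );
        if ( rcvid == -1 ) {
            perror( "MsgReceive" );
            break;
        }
        if ( rcvid == 0 ) {
            /* Pulse received: the cyclic calculations go here. */
        }
    }
    return NULL;
}

int main( void )
{
    struct sigevent   event;
    struct itimerspec itime;
    timer_t           timer_id;
    pthread_t         tid;
    int               coid;

    /* Main thread: create the channel and connect to it. */
    chid = ChannelCreate( 0 );
    coid = ConnectAttach( ND_LOCAL_NODE, 0, chid, _NTO_SIDE_CHANNEL, 0 );

    /* Deliver a pulse on every timer expiry (priority 50 is a placeholder). */
    SIGEV_PULSE_INIT( &event, coid, 50, MY_PULSE_CODE, 0 );
    timer_create( CLOCK_REALTIME, &event, &timer_id );

    /* 1 kHz: first expiry and interval both 1 ms. */
    itime.it_value.tv_sec     = 0;
    itime.it_value.tv_nsec    = 1000000;
    itime.it_interval.tv_sec  = 0;
    itime.it_interval.tv_nsec = 1000000;
    timer_settime( timer_id, 0, &itime, NULL );

    /* Second thread blocks on the channel. */
    pthread_create( &tid, NULL, worker, NULL );
    pthread_join( tid, NULL );
    return EXIT_SUCCESS;
}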

As expected, the thread does its calculations every millisecond.
However, as you can see from the timeline provided on the website, my
thread sometimes loses the CPU to the process manager procnto for about
400 µs. This happens continuously and periodically every few seconds
and is far too much time for my kind of application.

I have made sure of the following so far:

  • The priority of my process is higher than that of any other process
    (except procnto).
  • This occurs with both procnto and procnto-instr.
  • The problem is experienced on QNX 6.2.1 and QNX 6.3.0.
  • IDE bus mastering (DMA) is disabled.
  • No further self-written or third-party applications are running.

So my questions are:

  1. Did I do something wrong in the code, or is QNX expected to perform
     that way?
  2. Is there an option to prevent procnto from keeping the CPU for such a
     long period of time?

Kind regards,
Jochen


Can you turn on IPC tracing and include the event types? This will help
us see what is actually going on. It looks like the kernel is talking to
tracelogger; if this is the case, it is the tracing itself that is
causing the delay. Do you see the problem when you are not tracing? What
priority is tracelogger running at?

Regards,

Joe

As noted by Joe, you are being preempted by tracelogger. Normally
tracelogger doesn’t run at such a high priority; are you running it at
a high priority? Note that it will bump its writer thread up higher than
the main thread.

Colin



cburgess@qnx.com

One theory:
You could set the timer to an exact multiple of the base timer
cycle (996 ns?) (if x86 is the target).
Rob Krten’s book has a nice explanation:
http://www.parse.com/products/books/book_v1/sample.html (QNX4)
I’m sure the same principle applies to QNX6.

Basically the two timers are not perfectly equal.
At the beat frequency you will get a glitch
or jitter.
This is perfectly normal and predictable :-)

If you have to have exactly 1.0000000 kHz, you
will have to use an external timebase.

Hello Joe,

thanks for your reply. Unfortunately, this problem occurs regardless of
whether tracelogger is running or not. The tracelogger in the test case
was started with its default priority.

Nevertheless, I’ll add the IPC information to the image on the website;
thanks for this hint. The only IPC my process takes part in, though, is
receiving a pulse.

Regards,
Jochen



Hello Colin,

thanks for your reply.

As noted by Joe, you are being preempted by tracelogger. Normally
tracelogger doesn’t run at such a high priority; are you running it at
a high priority? Note that it will bump its writer thread up higher than
the main thread.

That’s true. IMHO this is the bright green bar with its approximately
30 µs duration. However, please note that the tracelogger itself is not
serviced either, becoming reply-blocked for approximately 300 µs.
Moreover, unfortunately this problem persists when tracelogger is not
running.
I’ll add the corresponding IPC calls to the website on Monday. It is
probably a good idea to further investigate the problem (with your
appreciated help) along your (and Joe’s) recommendations, taking these
calls into account.

Thank you and regards,
Jochen

Hello Alex,

thanks for your reply and the recommendation of the article. It was good
reading.
IMHO the 1 ms vs. 996 ns relationship only causes a beat. This leads to
periodically omitted pulses from the timer and is not paid attention to,
since the system’s timer will be replaced by a PCI-card hosted one later.
However, my problem is not getting the CPU for about 400 µs while the
process is already in the ready state. Accordingly, it must have
received the pulse before.

Thanks and regards,
Jochen



Jochen Maaß wrote:

since the system’s timer will be replaced by a PCI-card hosted one later.
However, my problem is not getting the CPU for about 400 µs while the
process is already in the ready state.

400 µs jitter on an OS timer is believable. It’s not like you are directly attached to IRQ#0 itself and, besides, QNX6 has IRQ#0 set to the lowest priority. It will be good to see what QSS has to say about this, but I won’t be getting too excited either way.

If you want to better simulate an external timer tick, then use the RTC’s alarm. It can be set up for rates up to 8192 Hz and fires away on its very own IRQ#8.

Example code provided in this group under the title “RTC Access (time, source)”.
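
Roughly along these lines (a sketch only, assuming the standard CMOS RTC registers; the post referenced above has the real code, and note the RTC only does power-of-two rates, so 1024 Hz is the closest to 1 kHz):

#include <stdint.h>
#include <sys/neutrino.h>
#include <sys/mman.h>
#include <hw/inout.h>

int main( void )
{
    struct sigevent ev;
    uintptr_t       port;
    int             iid;

    /* I/O privileges are needed for port access and InterruptAttachEvent(). */
    ThreadCtl( _NTO_TCTL_IO, 0 );
    port = mmap_device_io( 2, 0x70 );   /* CMOS/RTC index + data ports */

    /* Register A: rate select 6 -> 32768 >> (6-1) = 1024 Hz periodic rate. */
    out8( port, 0x0A );
    out8( port + 1, ( in8( port + 1 ) & 0xF0 ) | 0x06 );

    /* Register B: set PIE (bit 6) to enable the periodic interrupt. */
    out8( port, 0x0B );
    out8( port + 1, in8( port + 1 ) | 0x40 );

    /* Wake this thread with SIGEV_INTR whenever IRQ 8 fires. */
    SIGEV_INTR_INIT( &ev );
    iid = InterruptAttachEvent( 8, &ev, _NTO_INTR_FLAGS_TRK_MSK );

    for ( ;; ) {
        InterruptWait( 0, NULL );

        /* Register C must be read to acknowledge the RTC interrupt. */
        out8( port, 0x0C );
        in8( port + 1 );

        /* ... time-critical work goes here ... */

        InterruptUnmask( 8, iid );   /* re-enable; InterruptAttachEvent() masks it */
    }
    return 0;
}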


Evan

400 µs jitter on an OS timer is believable. It’s not like you are
directly attached to IRQ#0 itself and, besides, QNX6 has IRQ#0 set to
the lowest priority.

Hello Evan,

thank you for your reply; your suggestion may be the key to solving the
problem. Is it correct that, if I attach to IRQ#8, my process inherits
the high priority from the associated ISR? That would stop cyclic
procnto tasks from taking precedence over my process.

Where can I get more information about the IRQs’ priorities and about
which IRQs the kernel (procnto, respectively) is attached to?

Thanks and regards
Jochen

Hi,

just another possibility: is the SMI very active on your system?

Regards

Armin



just another possibility: is the SMI very active on your system?

Hello Armin,

I apologize, but I have to ask: what does SMI mean?

Thanks in advance and regards,

Jochen

Jochen Maaß wrote:

just another possibility: is the SMI very active on your system?


Hello Armin,

I apologize, but I have to ask: what does SMI mean?

I believe they’re call-outs to the BIOS, set up by the BIOS and supported in hardware by the CPU and probably the northbridge too. Consider them an NMI with a hard-wired processor mode switch. The WinTel community thought it was a brainy thing to do.

Typically, it emulates legacy hardware no longer present in the chipset, and all those nifty little trinkets that laptops have slapped on them, and things like automatic power-saving mode switches, will also be slipped in there.

These are nasty things in a real-time system as, I suspect, they perform an operation similar to a “Suspend To RAM” for each event.

Any PC chipset that touts itself as low power is a good candidate for making heavy use of SMI.


Evan

Hello, Evan!

Thank you, that was really helpful. I’ll check this, since the BIOS
settings are still in their off-the-shelf state.

Regards,

Jochen



Jochen Maaß wrote:

Where can I get more information about the IRQs’ priorities and to which
IRQs the kernel (procnto, respectively) is attached to?

Good question … ;-)


Evan

Jochen Maaß wrote:

just another possibility: is the SMI very active on your system?


Hello Armin,

I apologize, but I have to ask: what does SMI mean?

The SMI is the System Management Interrupt … it is the top-level,
non-interruptible interrupt for power management and other tasks.
When the SMI handler is working, the CPU is completely out of the control
of the RTOS.

Please try to switch off the power management functions as far as
possible (not for the CPU).


Regards

Armin



Jochen Maaß wrote:

Hello Colin,

thanks for your reply.

As noted by Joe, you are being preempted by tracelogger. Normally
tracelogger doesn’t run at such a high priority; are you running it
at a high priority? Note that it will bump its writer thread up higher
than the main thread.


That’s true. IMHO this is the bright green bar with its
approximately 30 µs duration. However, please note that the tracelogger
itself is not serviced either, becoming reply-blocked for approximately
300 µs.

Yes, it is reply blocked on procnto (writing to /dev/shmem)



cburgess@qnx.com

On Sun, 03 Apr 2005 13:19:19 +1200, Evan Hillas <evanh@clear.net.nz> wrote:

400us jitter on an OS timer is believable. It’s not like you are

This is QNX; 1 ms with minimal jitter must be achievable on a reasonably
fast, otherwise idle system.

Maybe you had Windows in mind…


Using Opera’s revolutionary e-mail client: http://www.opera.com/m2/

On Sat, 02 Apr 2005 17:02:35 +0200, Jochen Maaß <j.maass@tu-bs.de> wrote:

Hello Alex,

thanks for your reply and the recommendation of the article. It was good
reading.
IMHO the 1 ms vs. 996 ns relationship only causes a beat. This leads to
periodically omitted pulses from the timer and is not paid attention to,
since the system’s timer will be replaced by a PCI-card hosted one later.
However, my problem is not getting the CPU for about 400 µs while the
process is already in the ready state. Accordingly, it must have
received the pulse before.

Humour me and try your code modified as follows:

Add this:

struct timespec res;
if ( clock_getres( CLOCK_REALTIME, &res ) == -1 ) {
    perror( "clock get resolution" );
}
printf( "Resolution is %ld nanoseconds.\n", res.tv_nsec );

and change

itime.it_interval.tv_nsec = 1000000;

to:

itime.it_interval.tv_nsec = res.tv_nsec;

On a PC, res.tv_nsec is 999847 ns, which is about the same as your 1 ms.

See if you get the same result ;-)

Specifying 1 ms, you will miss a 1 ms cycle every time the accumulated
difference between your period and the internal timer’s approximation of
that period exceeds the timer resolution. In this case the difference is
1000000 ns - 999847 ns = 153 ns per cycle, so a cycle is missed roughly
every 999847 ns / 153 ns ≈ 6536 cycles, i.e. about every 6.5 s.

Or using the beat-frequency method:
1/0.000999847 s ≈ 1000.153 Hz, which beats against 1 kHz at 0.153 Hz,
i.e. a period of about 6.5 s.
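
To make the arithmetic concrete, a quick throwaway snippet (a sketch, using the numbers above):

/* How often a 1 ms software period slips a full tick of a 999847 ns clock. */
#include <stdio.h>

int main( void )
{
    double requested_ns = 1000000.0;               /* period the code asks for    */
    double actual_ns    = 999847.0;                /* what clock_getres() reports */
    double error_ns     = requested_ns - actual_ns;          /* 153 ns per cycle  */
    double skip_every_s = ( actual_ns / error_ns ) * requested_ns * 1e-9;

    printf( "One cycle is skipped roughly every %.2f s\n", skip_every_s );
    return 0;
}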

Evan Hillas wrote:

Jochen Maaß wrote:

since the system’s timer will be replaced by a PCI-card hosted one later.
However, my problem is not getting the CPU for about 400 µs while the
process is already in the ready state.


400 µs jitter on an OS timer is believable. It’s not like you are
directly attached to IRQ#0 itself and, besides, QNX6 has IRQ#0 set to
the lowest priority. It will be good to see what QSS has to say about
this, but I won’t be getting too excited either way.

If you have 6.3, you can change this by passing -I0 to startup-bios to set
IRQ#0 as top priority in the 8259 PIC.

The side effect is that everything hung on the slave 8259 (i.e. IRQ 8-15)
is now third in line, since priorities are set 0, 1, 2 (cascade to slave),
3, 4, 5, 6, 7 rather than the default 3, 4, 5, 6, 7, 0, 1, 2.
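
For illustration, the option goes on the startup line of the OS image
buildfile, roughly like this (only a skeleton; the exact form and the
remaining options depend on your existing buildfile and system):

[virtual=x86,bios +compress] .bootstrap = {
    startup-bios -I0
    PATH=/proc/boot procnto-instr
}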


Cheers,
Adam

QNX Software Systems
[ amallory@qnx.com ]

With a PC, I always felt limited by the software available.
On Unix, I am limited only by my knowledge.
–Peter J. Schoenster <pschon@baste.magibox.net>

Alex/Systems 104 wrote:

On Sun, 03 Apr 2005 13:19:19 +1200, Evan Hillas <evanh@clear.net.nz> wrote:

400us jitter on an OS timer is believable. It’s not like you are


This is QNX; 1 ms with minimal jitter must be achievable on a reasonably
fast, otherwise idle system.

I don’t know the speed of Jochen’s CPU, but I know I got at least 40 µs peak jitter on my 500 MHz K6-2 using InterruptAttach(); the average was more like 5 µs. I can imagine a slower PC using an OS timer would approach the 400 µs mark no trouble.

The real cause seems to be just the instrumented kernel doing its job. An ISR would most likely have priority over such logging, so having the critical code in an ISR would be a temporary work-around, or even a final solution, depending on how comfortable the project is with such coding.
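
Very roughly, that ISR approach looks like this (a sketch only; the IRQ
number and the work in the handler are placeholders, and the handler has
to stay short and avoid most kernel calls):

#include <stdio.h>
#include <sys/neutrino.h>

#define CYCLE_IRQ 8              /* placeholder: whichever interrupt drives the cycle */

static struct sigevent   ev;
static volatile unsigned ticks;

/* Runs at interrupt level, ahead of every thread (tracelogger included). */
static const struct sigevent *isr( void *area, int id )
{
    ticks++;                     /* only the truly critical, very short work here */
    return &ev;                  /* wake the thread blocked in InterruptWait()    */
}

int main( void )
{
    ThreadCtl( _NTO_TCTL_IO, 0 );
    SIGEV_INTR_INIT( &ev );
    InterruptAttach( CYCLE_IRQ, isr, NULL, 0, _NTO_INTR_FLAGS_TRK_MSK );

    for ( ;; ) {
        InterruptWait( 0, NULL );
        /* Non-critical follow-up work runs here at thread level. */
        printf( "tick %u\n", ticks );
    }
    return 0;
}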


Evan