Serious execution time jitter of thread

Jochen_Maass · April 5, 2005, 2:44pm

Hello Evan,

I don’t know speed of Jochen’s CPU but I know I got at least 40us peak
jitter on my 500MHz K6-2 using InterruptAttach(), the average was more
like 5us. I can imagine a slower PC using an OS timer would approach
the 400us mark no trouble.

The process under test is running on an AMD64-3000+ (!).

An ISR would most likely have priority over such logging so having the
critical code in an ISR would be a temporary work-around or even final
solution depending on how comfortable the project is with such coding.

We are working on that, good recommendation. Tests are in progress …

Thanks and regards,

Jochen

Mario_Charest1 · April 5, 2005, 2:50pm

“Jochen Maaß” <j.maass@tu-bs.de> wrote in message
news:d2pkoi$7te$3@inn.qnx.com…

Hello, Evan!

Thank you, that was really helpful. I’ll check this up due to the fact
that the BIOS settings are still in the off-the-shelf state.

Very often this is not a BIOS setting, it unavailable. Turning power
management off is usually not enough.

Regards,

Jochen

I apologize since I have to ask: What does SMI mean?

I believe they’re call-outs to the BIOS setup by the BIOS and supported
in hardware by the CPU and prolly the northbridge too. Consider them an
NMI with a hard wired processor mode switch. The WinTel community
thought it was a brainy thing to do.

Typically, it emulates legacy hardware no longer present in the chipset
and all those nifty little trinkets that laptops have slapped on them and
things like automatic power saving mode switches will also be slipped in
there.

These are nastie things in a real-time system as, I suspect, they perform
an operation similar to a “Suspend To RAM” for each event.

Any PC chipset that touts itself as low power is a good candidate for
making heavy use of SMI.

Evan

Mario_Charest1 · April 5, 2005, 5:08pm

“Jochen Maaß” <j.maass@tu-bs.de> wrote in message
news:d2u80o$lpl$1@inn.qnx.com…

Hello Evan,

I don’t know speed of Jochen’s CPU but I know I got at least 40us peak
jitter on my 500MHz K6-2 using InterruptAttach(), the average was more
like 5us. I can imagine a slower PC using an OS timer would approach the
400us mark no trouble.

The process under test is running on an AMD64-3000+ (!).

Then it most likely has SMI, unless it’s some special embedded board.

An ISR would most likely have priority over such logging so having the
critical code in an ISR would be a temporary work-around or even final
solution depending on how comfortable the project is with such coding.

We are working on that, good recommendation. Tests are in progress …

Thanks and regards,

Jochen

Alex_Systems_1041 · April 5, 2005, 5:25pm

On Tue, 05 Apr 2005 09:33:00 +1200, Evan Hillas <evanh@clear.net.nz> wrote:

I don’t know speed of Jochen’s CPU but I know I got at least 40us peak
jitter on my 500MHz K6-2 using InterruptAttach(), the average was more
like 5us. I can imagine a slower PC using an OS timer would approach
the 400us mark no trouble.

The real cause seems to be just the instrumented kernel doing it’s job.
An ISR would most likely have priority over such logging so having the
critical code in an ISR would be a temporary work-around or even final
solution depending on how comfortable the project is with such coding.

Well, put it this way:
I tried the code (without the instrumented kernel) on a
PC104 Pentium I 166 MHz.

Added some code to toggle a printer port data line
every time the pulse was received, put a logic analyser on it.

Pulse train of 500 Hz is seen. (1ms high, 1ms low)
(With only a few microseconds jitter either way, by the way,
just as expected).

Set triggering to detect a low level for >1 ms.

With the original code, analyzer triggers every few seonds.

With the modified code, triggers … never.

There may be other things happening on the specific
hardware and other processes running e.g. tracelogger,
but with the pulse delivery mechanism on an unloaded
properly configured system, there is nothing that can
cause 400 us jitter.

I would first remove this obvious reason for the pulse being
“lost”, & see if the OS still goes away for 400us.

If it is SMI, even hooking to an IRQ will not help things…
Instrumented kernel will have been written very carefully
to minimise affecting the sysem under test, otherwise it will
be of limited usefulness.

I just don’t want this thread to stand unchallenged
that the OS can just go away for 400 us for no reason
with a simple 2 thread program as shown.

Been burned by this before…

Evan_Hillas1 · April 5, 2005, 9:28pm

Alex/Systems 104 wrote:

I just don’t want this thread to stand unchallenged
that the OS can just go away for 400 us for no reason
with a simple 2 thread program as shown.

Been burned by this before…

Cool. Sounds promising. Lets see what Jochen finds …

Evan

Evan_Hillas1 · April 5, 2005, 9:37pm

Jochen Maaß wrote:

An ISR would most likely have priority over such logging so having the
critical code in an ISR would be a temporary work-around or even final
solution depending on how comfortable the project is with such coding.

We are working on that, good recommendation. Tests are in progress …

A quicker approach to tracing this might be to temporarily not use the instrumented kernel and, inline with your existing code, take snaps of the processor cycle counter using ClockCycles(). Run that for a while then exit dumping all the snaps to a file and go through the file looking at the diffs between them all to see if anything looks bad.

Evan

Jochen_Maass · April 6, 2005, 7:53am

Hello Evan,

A quicker approach to tracing this might be to temporarily not use the
instrumented kernel and, inline with your existing code, take snaps of
the processor cycle counter using ClockCycles(). Run that for a while
then exit dumping all the snaps to a file and go through the file
looking at the diffs between them all to see if anything looks bad.

by doing exactly that, we became aware of the problem.

Thanks and regards,
Jochen

Jochen_Maass · April 6, 2005, 7:58am

Hello Alex,

thank you very much for your effort and time! Our tests are still
running, I’ll post the results here soon.

Regards,
Jochen

Alex/Systems 104 schrieb:

On Tue, 05 Apr 2005 09:33:00 +1200, Evan Hillas <> evanh@clear.net.nz> > wrote:

I don’t know speed of Jochen’s CPU but I know I got at least 40us
peak jitter on my 500MHz K6-2 using InterruptAttach(), the average
was more like 5us. I can imagine a slower PC using an OS timer would
approach the 400us mark no trouble.

The real cause seems to be just the instrumented kernel doing it’s
job. An ISR would most likely have priority over such logging so
having the critical code in an ISR would be a temporary work-around
or even final solution depending on how comfortable the project is
with such coding.

Well, put it this way:
I tried the code (without the instrumented kernel) on a
PC104 Pentium I 166 MHz.

Added some code to toggle a printer port data line
every time the pulse was received, put a logic analyser on it.

Pulse train of 500 Hz is seen. (1ms high, 1ms low)
(With only a few microseconds jitter either way, by the way,
just as expected).

Set triggering to detect a low level for >1 ms.

With the original code, analyzer triggers every few seonds.

With the modified code, triggers … never.

There may be other things happening on the specific
hardware and other processes running e.g. tracelogger,
but with the pulse delivery mechanism on an unloaded
properly configured system, there is nothing that can
cause 400 us jitter.

I would first remove this obvious reason for the pulse being
“lost”, & see if the OS still goes away for 400us.

If it is SMI, even hooking to an IRQ will not help things…
Instrumented kernel will have been written very carefully
to minimise affecting the sysem under test, otherwise it will
be of limited usefulness.

I just don’t want this thread to stand unchallenged
that the OS can just go away for 400 us for no reason
with a simple 2 thread program as shown.

Been burned by this before…

Jochen_Maass · April 6, 2005, 11:15am

Hello everybody,

sorry for beeing that quiet during the last days. We put up several test
cases and the ones without instrumented kernel had been running for 12h
each, to become convinced of the results.

The problem disappeared as soon as we disabled the support for USB
keyboard and mouse from the BIOS. Apparently the SMI service routine is
privileged to the OS.

Now the worst case execution time is only about 6us longer than the
average time. This definitely meets my requirements. I also took Alex’s
recommendation into account. It didn’t solve the problem, but now
nevertheless the timer is beautifully aligned to the system’s clock
cycle and my process doesn’t get preempted by the tracelogger any longer.

I’d like to thank everybody here for granted help and efforts,
especially Alex for running my test case on his own hardware. This
indeed is a great community. Maybe I can contribute to solve other’s
problems as well when I’ve become more experienced with QNX.

Regards,
Jochen

Alex_Systems_1041 · April 6, 2005, 12:51pm

On Wed, 06 Apr 2005 13:15:19 +0200, Jochen Maaß <j.maass@tu-bs.de> wrote:
…

And we all learned something in the process!

Evan_Hillas1 · April 6, 2005, 9:19pm

Alex/Systems 104 wrote:

On Wed, 06 Apr 2005 13:15:19 +0200, Jochen Maaß <> j.maass@tu-bs.de> > wrote:
…

And we all learned something in the process!

And great news that instrumented kernel is keeping it’s head low.

Evan