Long interrupt latency

The QNX6 kernel is a fully preemptive kernel … so you should expect
better context switching.

The average interrupt latency (200 MHz CPU!) is 1.7 us. :-)

I think that someone doing hard realtime doesn’t care about average latency.
They want to know MAXIMUM latency.

Yes, but maximum latency is also going to include many factors outside of
the control of QNX. The only numbers QNX can really provide are best-case
and worst-case-by-situation. As an example, SMM on an x86 can block out any
OS for long periods of time - nothing you can do about it - and this will
make the worst-case look really bad (probably on the order of ms).

Personally I like to know best-case since it can’t get any faster than that
(good to know) and also various numbers in various, detailed, situations.
The details are key for these values. They are what will help someone looking
at a system figure out if that system can meet their needs. As an example,
if you are making an embedded system with a very high interrupt rate
(when processing) but you know that is the only thing running in the system,
then best-case is going to be key.

chris


Chris McKillop <cdm@qnx.com> “The faster I go, the behinder I get.”
Software Engineer, QSSL – Lewis Carroll –
http://qnx.wox.org/

“Rennie Allen” <rallen@csical.com> wrote in message
news:3C8D1AE5.9050602@csical.com

Mario Charest wrote:

“Rennie Allen” <rallen@csical.com> wrote in message
news:3C8D0688.609@csical.com…

I see your point, but I don’t agree. If I have a calculator that only
supports + and -, and I want to add support for * and /, I don’t see why
supporting * and / would slow down + and -.


    This is a poor analogy.

The scheduler is entered via an event of some sort (kernel call,
interrupt) and then does some processing to figure out what is going to
be scheduled. If you make those decisions more complex, then you
lengthen the path (this is not akin to creating a totally new, and
completely independent path - as in your analogy).

One obvious mechanism that the QNX6 kernel supports, that the QNX4
kernel didn’t, is a more sophisticated “proxy” mechanism (i.e. events).
It doesn’t take much imagination to see how the scheduling work
that follows an interrupt which dispatches a QNX6 pulse would take more
time than the scheduling work that follows an interrupt dispatching a
QNX4 proxy (e.g. the fact that a pulse payload can change every time
means that at the very least, the compression algorithm must be more
complex - or not exist at all in favor of queueing everything).

Ha, finally facts! Proxies have a payload that can change, that I can buy…
However I fail to see how this has anything to do with interrupt latency.

With good design (which I am sure exists in the QNX6 kernel) you can do
a lot to mitigate the effect of this new code, and you may even be able
to make your average latency as good as the old scheduler, but the worst
case latency is going to be impacted since (by definition) it involves
executing all of the new code.

Rennie

“Mario Charest” <goto@nothingness.com> wrote in message
news:a6kqh9$93r$1@inn.qnx.com

“Rennie Allen” <rallen@csical.com> wrote in message
news:3C8D1AE5.9050602@csical.com…
Mario Charest wrote:

“Rennie Allen” <rallen@csical.com> wrote in message
news:3C8D0688.609@csical.com…

I see your point, but I don’t agree. If I have a calculator that only
supports + and -, and I want to add support for * and /, I don’t see why
supporting * and / would slow down + and -.


    This is a poor analogy.

The scheduler is entered via an event of some sort (kernel call,
interrupt) and then does some processing to figure out what is going to
be scheduled. If you make those decisions more complex, then you
lengthen the path (this is not akin to creating a totally new, and
completely independent path - as in your analogy).

One obvious mechanism that the QNX6 kernel supports, that the QNX4
kernel didn’t, is a more sophisticated “proxy” mechanism (i.e. events).
It doesn’t take much imagination to see how the scheduling work
that follows an interrupt which dispatches a QNX6 pulse would take more
time than the scheduling work that follows an interrupt dispatching a
QNX4 proxy (e.g. the fact that a pulse payload can change every time
means that at the very least, the compression algorithm must be more
complex - or not exist at all in favor of queueing everything).

Ha, finally facts! Proxies have a payload that can change, that I can buy…
However I fail to see how this has anything to do with interrupt latency.

I can jump in here with some personal experience. We were trying to debug a
process that was running multiple concurrent timers. If we left it at a
breakpoint too long, it appeared to hang the system. The QSSL explanation
was that each timer expiration generates an event, that the undelivered
events needed to be kept in multiple sorted lists, and that when the
multiple lists became too full the kernel processing used all the CPU
cycles.


Marty Doane
Siemens Dematic

Marty Doane <marty.doane@rapistan.com> wrote:

I can jump in here with some personal experience. We were trying to debug a
process that was running multiple concurrent timers. If we left it at a
breakpoint too long, it appeared to hang the system. The QSSL explanation
was that each timer expiration generates an event, that the undelivered
events needed to be kept in multiple sorted lists, and that when the
multiple lists became too full the kernel processing used all the CPU
cycles.

The worst case is multiple concurrent timers at the same priority.
If the events are at a different priority, similar events can be
“condensed” in the queue – up to 255 identical pulses can be
condensed in the same queue entry. But, with different pulses at
the same priority, to guarantee delivery in the order generated, they
have to each have an individual queue entry.
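
A rough sketch of that condensing idea (the data structures here are made up for illustration only; this is not the actual kernel code): a queue entry carries a saturating count, so identical pulses share one entry, while any pulse that differs at the same priority forces a new entry.

```c
/* Hypothetical sketch of pulse-queue condensing -- NOT the real kernel code.
 * Identical pulses (same code, value and priority) bump a counter that
 * saturates at 255; a differing pulse at the same priority must get its own
 * entry so delivery order is preserved.
 */
#include <stdlib.h>

struct pulse_entry {
    int                 code;
    int                 value;
    int                 priority;
    unsigned char       count;          /* 1..255 condensed pulses          */
    struct pulse_entry *next;
};

struct pulse_queue {
    struct pulse_entry *head, *tail;
};

static void enqueue_pulse(struct pulse_queue *q, int code, int value, int prio)
{
    struct pulse_entry *t = q->tail;

    if (t && t->code == code && t->value == value &&
        t->priority == prio && t->count < 255) {
        t->count++;                     /* condensed: no new entry needed   */
        return;
    }

    struct pulse_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return;                         /* sketch: ignore allocation failure */
    e->code = code; e->value = value; e->priority = prio;
    e->count = 1;   e->next  = NULL;

    if (q->tail) q->tail->next = e;     /* every differing pulse grows the  */
    else         q->head       = e;     /* queue by one entry               */
    q->tail = e;
}
```

With two timers firing different pulse codes at the same priority, the condensing branch never triggers, so the queue just keeps growing until somebody drains it.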

I seem to remember the system where I heard about this (possibly yours)
was using close to 1ms frequency for those pulses. So, you are essentially
(with 2 pulses) generating 2000 queue entries per second. Then, for each
new pulse (at 2000 per second) you’re adding it to a queue that is 2000 entries
long, 4000 entries long, etc.

If you don’t drain the queue… you can pretty quickly have a queue that is
10,000s or even 100,000s of entries deep. (5 seconds gets you 10,000, a
minute pushes you over 100,000 entries). Yup, this can be hard on your
system.

With pulses that can be compressed, the issue can still happen – but it
takes 255 times as long.
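
For anyone who wants to reproduce the scenario, a minimal QNX Neutrino sketch along these lines should do it (the pulse codes, the 1 ms period and the missing error checking are just for illustration): two 1 ms timers deliver distinct pulses to one channel, so stopping the drain loop - say, at a debugger breakpoint - lets the queue grow by roughly 2000 entries per second.

```c
/* Sketch (QNX Neutrino): two 1 ms timers delivering distinct pulses to one
 * channel.  If the receive loop stops draining (e.g. you sit at a breakpoint),
 * the kernel keeps queueing ~2000 entries per second for this process.
 */
#include <stdio.h>
#include <time.h>
#include <sys/neutrino.h>
#include <sys/netmgr.h>
#include <sys/siginfo.h>

#define PULSE_CODE_A (_PULSE_CODE_MINAVAIL + 0)
#define PULSE_CODE_B (_PULSE_CODE_MINAVAIL + 1)

static void make_timer(int coid, int code)
{
    struct sigevent   ev;
    struct itimerspec its;
    timer_t           tid;

    SIGEV_PULSE_INIT(&ev, coid, SIGEV_PULSE_PRIO_INHERIT, code, 0);
    timer_create(CLOCK_REALTIME, &ev, &tid);

    its.it_value.tv_sec  = 0;
    its.it_value.tv_nsec = 1000000;              /* 1 ms                    */
    its.it_interval      = its.it_value;         /* repeating               */
    timer_settime(tid, 0, &its, NULL);
}

int main(void)
{
    int chid = ChannelCreate(0);
    int coid = ConnectAttach(ND_LOCAL_NODE, 0, chid, _NTO_SIDE_CHANNEL, 0);

    make_timer(coid, PULSE_CODE_A);   /* two different pulse codes at the   */
    make_timer(coid, PULSE_CODE_B);   /* same priority cannot be condensed  */

    struct _pulse p;
    unsigned long received = 0;
    for (;;) {
        MsgReceivePulse(chid, &p, sizeof p, NULL);  /* drain, or the queue grows */
        if (++received % 2000 == 0)
            printf("%lu pulses drained\n", received);
    }
}
```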

-David

QNX Training Services
http://www.qnx.com/support/training/
Please followup in this newsgroup if you have further questions.

Mario Charest wrote:

“Rennie Allen” <rallen@csical.com> wrote in message
news:3C8D1AE5.9050602@csical.com…

Mario Charest wrote:



Ha, finally facts! Proxies have a payload that can change, that I can buy…
However I fail to see how this has anything to do with interrupt latency.

You’re right of course, this would affect scheduling latency, not interrupt
latency; but since all good rt programmers design all of their interrupt
handlers as schedulable entities, this number is at least as significant
to those doing interrupt handlers as interrupt latency is, correct?
;-) This is just one example of how doing more in the kernel can be
detrimental to latency (this isn’t necessarily “bad” - I quite prefer
pulses over proxies, I just don’t expect to get them for free).

Rennie

Chris McKillop wrote:


Yes, but maximum latency is also going to include many factors outside of
the control of QNX. The only numbers QNX can really provide are best-case
and worst-case-by-situation. As an example, SMM on an x86 can block out any
OS for long periods of time - nothing you can do about it - and this will
make the worst-case look really bad (probably on the order of ms).

This is true, and is why it is impossible to create meaningful numbers
unless you do the testing yourself on your own hardware. The bottom
line is, if you have a hard-rt app that has a deadline shorter than the
worst case SMI latency (which as you state is very long), then you
simply can’t run your application on this hardware. Of course, just to
keep things on course with the original poster’s question, SMM is not a
factor in his case since he was achieving better numbers on QNX4 with
identical hardware.


Personally I like to know best-case since it can’t get any faster than that
(good to know) and also various numbers in various, detailed, situations.
The details are key for these values. They are what will help someone looking
at a system figure out if that system can meet their needs.

I don’t expect QSSL to produce numbers with every combination of
hardware that’s supported by QNX, since you would have to test hundreds
of combinations, many of which might never be used by anyone (and after
all, we users have to do something to earn our keep :-) )

btw: Chris, if you have access to email, could you drop me a line and
let me know where QNX Night is happening on Thursday ?

Rennie

“David Gibbs” <dagibbs@qnx.com> wrote in message
news:a6latr$8p9$1@nntp.qnx.com

Marty Doane <marty.doane@rapistan.com> wrote:

I can jump in here with some personal experience. We were trying to debug a
process that was running multiple concurrent timers. If we left it at a
breakpoint too long, it appeared to hang the system. The QSSL explanation
was that each timer expiration generates an event, that the undelivered
events needed to be kept in multiple sorted lists, and that when the
multiple lists became too full the kernel processing used all the CPU
cycles.

The worst case is multiple concurrent timers at the same priority.
If the events are at a different priority, similar events can be
“condensed” in the queue – up to 255 identical pulses can be
condensed in the same queue entry. But, with different pulses at
the same priority, to guarantee delivery in the order generated, they
have to each have an individual queue entry.

I seem to remember the system where I heard about this (possibly yours)
was using close to 1ms frequency for those pulses. So, you are essentially
(with 2 pulses) generating 2000 queue entries per second. Then, for each
new pulse (at 2000 per second) you’re adding it to a queue that is 2000 entries
long, 4000 entries long, etc.

If you don’t drain the queue… you can pretty quickly have a queue that is
10,000s or even 100,000s of entries deep. (5 seconds gets you 10,000, a
minute pushes you over 100,000 entries). Yup, this can be hard on your
system.

With pulses that can be compressed, the issue can still happen – but it
takes 255 times as long.

-David

QNX Training Services
http://www.qnx.com/support/training/
Please followup in this newsgroup if you have further questions.

Yes, I think it was my situation that you remember. We’ve learned not to do
that, of course, but this seemed to be quite a good example of how a
difference between QNX 4 and QNX 6 could affect the interrupt/scheduling
latency.

Marty Doane
Siemens Dematic

David Gibbs <dagibbs@qnx.com> wrote:

Marty Doane <marty.doane@rapistan.com> wrote:

I seem to remember the system where I heard about this (possibly yours)
was using close to 1ms frequency for those pulses. So, you are essentially
(with 2 pulses) generating 2000 queue entries per second. Then, for each
new pulse (at 2000 per second) you’re adding it to a queue that is 2000 entries
long, 4000 entries long, etc.

If you don’t drain the queue… you can pretty quickly have a queue that is
10,000s or even 100,000s of entries deep. (5 seconds gets you 10,000, a
minute pushes you over 100,000 entries). Yup, this can be hard on your
system.

I’ve issued a PR 10963 describing this unfortunate behaviour.

-David

QNX Training Services
http://www.qnx.com/support/training/
Please followup in this newsgroup if you have further questions.

Rennie Allen wrote:

Mario Charest wrote:

“Rennie Allen” <rallen@csical.com> wrote in message

I am not trying to worry you. I think you can definitely do much better
than 120usec, but QNX4 is a simpler kernel, designed for x86 only; to
expect the much more sophisticated Neutrino kernel to perform as well as
QNX4 is a little unrealistic.


Why?

Simply, if the kernel does more work, the longest path through the
kernel is certainly going to be longer.

Any execution of a longer path can be preempted …

Armin

Please forgive if this winds up being a duplicate reply - I replied and didn’t
see it for 12 hours.

I am measuring latency from the time the interrupt request line is asserted
on the ISA bus until the time my interrupt service routine is entered. I
thought the only priority that would affect this is the priority of the
thread containing the interrupt service routine.


“Chris McKillop” <cdm@qnx.com> wrote in message
news:a6f4k1$ik4$1@nntp.qnx.com

Art Hays <avhays@nih.gov> wrote:
The interrupt thread is running at pri 25. This is the highest on my
system.


And what priority do you have the pulse set to in the sigevent structure?

chris


Chris McKillop <cdm@qnx.com> “The faster I go, the behinder I get.”
Software Engineer, QSSL – Lewis Carroll –
http://qnx.wox.org/

Art Hays <avhays@nih.gov> wrote:

Please forgive if this winds up being a duplicate reply - I replied and didn’t
see it for 12 hours.

I am measuring latency from the time the interrupt request line is asserted
on the ISA bus until the time my interrupt service routine is entered. I
thought the only priority that would affect this is the priority of the
thread containing the interrupt service routine.

And when you say “interrupt service routine” I assume you mean the callback
function you pass into InterruptAttach()? Are you using a scope to measure
this timing?

chris

Chris McKillop <cdm@qnx.com> “The faster I go, the behinder I get.”
Software Engineer, QSSL – Lewis Carroll –
http://qnx.wox.org/

“Chris McKillop” <cdm@qnx.com> wrote in message
news:a6mvm3$g9o$1@nntp.qnx.com

Art Hays <avhays@nih.gov> wrote:
Please forgive if this winds up being a duplicate reply - I replied and didn’t
see it for 12 hours.

I am measuring latency from the time the interrupt request line is asserted
on the ISA bus until the time my interrupt service routine is entered. I
thought the only priority that would affect this is the priority of the
thread containing the interrupt service routine.


And when you say “interrupt service routine” I assume you mean the callback
function you pass into InterruptAttach()? Are you using a scope to measure
this timing?

Yes, that’s correct. I set a bit first thing in the interrupt service
routine that I can see with a scope. Actually, this isn’t even necessary to
tell the latency - IRQ10 on the bus (this is the interrupt line that my a/d
is using) stays asserted until the interrupt is acknowledged, so you can
determine the latency from the length of assertion of IRQ10. I am looking
at both signals. I have a ‘tiff’ picture of the scope display that I could
email to you if you would like to see the actual signals.

Since the latency is so long, and no one has posted that there is a problem
with the rage driver, I’m beginning to think I’m doing something wrong. I
just don’t know what yet.

Art
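
For reference, here is a minimal sketch of that kind of measurement setup, assuming an ISA card on IRQ 10 and a spare digital-output port at 0x300 (both made-up values - substitute your own hardware): the very first thing the InterruptAttach() callback does is raise a bit the scope can see.

```c
/* Sketch: attach a tiny ISR with InterruptAttach() and raise a debug bit as
 * the very first action, so a scope can measure IRQ-assert-to-ISR-entry
 * latency directly.  IRQ number, debug port and bit are example values.
 */
#include <stdint.h>
#include <sys/neutrino.h>
#include <sys/siginfo.h>
#include <sys/mman.h>
#include <hw/inout.h>

#define AD_IRQ      10          /* example: the a/d board on IRQ 10          */
#define DEBUG_PORT  0x300       /* example: spare digital-output port        */
#define DEBUG_BIT   0x01

static uintptr_t       dbg_port;
static struct sigevent isr_event;

static const struct sigevent *ad_isr(void *area, int id)
{
    out8(dbg_port, DEBUG_BIT);  /* first instruction: visible on the scope   */
    /* ... acknowledge/clear the a/d interrupt source here ...               */
    out8(dbg_port, 0);
    return &isr_event;          /* wake the handling thread with a pulse     */
}

int main(void)
{
    ThreadCtl(_NTO_TCTL_IO, 0);                    /* I/O privilege for out8 */
    dbg_port = mmap_device_io(1, DEBUG_PORT);

    int chid = ChannelCreate(0);
    int coid = ConnectAttach(0, 0, chid, _NTO_SIDE_CHANNEL, 0);
    SIGEV_PULSE_INIT(&isr_event, coid, SIGEV_PULSE_PRIO_INHERIT,
                     _PULSE_CODE_MINAVAIL, 0);

    int iid = InterruptAttach(AD_IRQ, ad_isr, NULL, 0, _NTO_INTR_FLAGS_TRK_MSK);
    (void)iid;                  /* keep the id around for InterruptDetach()  */

    struct _pulse p;
    for (;;)
        MsgReceivePulse(chid, &p, sizeof p, NULL);  /* process samples here  */
}
```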

“Art Hays” <avhays@nih.gov> wrote in news:a6n225$qen$1@inn.qnx.com:

Yes, that’s correct. I set a bit first thing in the interrupt service
routine that I can see with a scope. Actually, this isn’t even
necessary to tell the latency - IRQ10 on the bus (this is the interrupt
line that my a/d is using) stays asserted until the interrupt is
acknowledged, so you can determine the latency from the length of
assertion of IRQ10. I am looking at both signals. I have a ‘tiff’
picture of the scope display that I could email to you if you would
like to see the actual signals.

Since the latency is so long, and no one has posted that there is a
problem with the rage driver, I’m beginning to think I’m doing
something wrong. I just don’t know what yet.

I’ve seen cases of video cards attempting to become bus masters, either by
design or in combination with the MB chipset, and locking out the bus for a
while (over and over). What does the latency look like for other
interrupts - do you see the same latency affecting everyone, or just your
ISR?


Cheers,
Adam

QNX Software Systems Ltd.
[ amallory@qnx.com ]

With a PC, I always felt limited by the software available.
On Unix, I am limited only by my knowledge.
–Peter J. Schoenster <pschon@baste.magibox.net>

Art Hays <avhays@nih.gov> wrote:

Please forgive if this winds up being a duplicate reply - I replied and didn’t
see it for 12 hours.

I am measuring latency from the time the interrupt request line is asserted
on the ISA bus until the time my interrupt service routine is entered. I
thought the only priority that would affect this is the priority of the
thread containing the interrupt service routine.

Actually, the priority of the thread that attached the irq service
routine is irrelevant at that point. (And, threads don’t contain
code, the irq service routine is code, it is contained by the process
that also contains the thread that attached the irq service routine.)

Now, interrupts do have priorities – this is a hardware side issue, and
depends on the hardware you are on. ISA suggests x86, so I THINK the
default priority ordering for interrupts is irq 0 highest, down to 7
as lowest, with all the 2nd bank coming in at irq 2, with 8 being
highest down to 15 being lowest. That is: 0,1,2 [8-15],3,4-7.

Anybody doing heavy bus activity (bus-master DMA for instance) can delay
the recognition or handling of an irq, any higher-priority irq can delay,
any process that disables interrupts can, any process that masks that
particular interrupt can, the kernel can.

Some of those problems are under control of QSSL, others are not.

Where the priority of the isr thread, or of the pulse you return (if you
return a pulse), starts to affect things is in scheduling latency – that
is, the time from the end of the irq routine to the time when your thread
starts to deal with it.
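
A small sketch of where those two latencies separate, using InterruptAttachEvent() so no user ISR code runs at all (the IRQ and priority numbers are just examples): interrupt latency is bounded by the hardware and the kernel, while the priority carried in the sigevent is what bounds the scheduling latency of the handling thread.

```c
/* Sketch: the priority in the sigevent governs scheduling latency -- how
 * soon the handling thread runs after the interrupt -- not interrupt latency.
 * With a default channel the receiver floats up to the pulse priority.
 */
#include <sys/neutrino.h>
#include <sys/siginfo.h>

#define MY_IRQ      10      /* example IRQ                                   */
#define PULSE_PRIO  25      /* scheduling priority carried by the pulse      */

int main(void)
{
    struct sigevent ev;
    struct _pulse   p;

    ThreadCtl(_NTO_TCTL_IO, 0);

    int chid = ChannelCreate(0);   /* no _NTO_CHF_FIXED_PRIORITY: float to   */
    int coid = ConnectAttach(0, 0, chid, _NTO_SIDE_CHANNEL, 0); /* pulse prio */

    /* The 25 here bounds scheduling latency, not interrupt latency.         */
    SIGEV_PULSE_INIT(&ev, coid, PULSE_PRIO, _PULSE_CODE_MINAVAIL, 0);

    int iid = InterruptAttachEvent(MY_IRQ, &ev, _NTO_INTR_FLAGS_TRK_MSK);

    for (;;) {
        MsgReceivePulse(chid, &p, sizeof p, NULL);
        /* service the device, then unmask (AttachEvent masks the IRQ)       */
        InterruptUnmask(MY_IRQ, iid);
    }
}
```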

-David

QNX Training Services
http://www.qnx.com/support/training/
Please followup in this newsgroup if you have further questions.

“David Gibbs” <dagibbs@qnx.com> wrote in message news:a6o53m$dtq$2@nntp.qnx.com

Art Hays <avhays@nih.gov> wrote:
Please forgive if this winds up being a duplicate reply - I replied and didn’t
see it for 12 hours.

I am measuring latency from the time the interrupt request line is asserted
on the ISA bus until the time my interrupt service routine is entered. I
thought the only priority that would affect this is the priority of the
thread containing the interrupt service routine.

Actually, the priority of the thread that attached the irq service
routine is irrelevant at that point. (And, threads don’t contain
code, the irq service routine is code, it is contained by the process
that also contains the thread that attached the irq service routine.)

Thanks for illuminating this.

Now, interrupts do have priorities – this is a hardware side issue, and
depends on the hardware you are on. ISA suggests x86, so I THINK the
default priority ordering for interrupts is irq 0 highest, down to 7
as lowest, with all the 2nd bank coming in at irq 2, with 8 being
highest down to 15 being lowest. That is: 0,1,2 [8-15],3,4-7.

Anybody doing heavy bus activity (bus-master DMA for instance) can delay
the recognition or handling of an irq, any higher-priority irq can delay,
any process that disables interrupts can, any process that masks that
particular interrupt can, the kernel can.

Which of the above do you think is happening here? My concern is that it’s number 3, “any process
that disables interrupts”, and the process that’s doing it is the rage driver. I’m still looking for
something I’m doing wrong on my end, however. I am going to write a simple program that just
starts the a/d and catches its interrupt in a short ISR. I will see if this behaves any differently than my
full application.
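
One possible shape for that simple program (a sketch only; the IRQ number and the nominal a/d period are assumptions): attach to the interrupt, wake on a SIGEV_INTR, and timestamp every wakeup with ClockCycles(), so any unusually long gap between interrupts stands out even without the scope.

```c
/* Sketch: minimal a/d interrupt catcher that logs late wakeups.  With an a/d
 * interrupting at a fixed rate, a gap well above the nominal period points at
 * something (e.g. the video driver) locking out interrupts or the bus.
 */
#include <stdio.h>
#include <stdint.h>
#include <sys/neutrino.h>
#include <sys/siginfo.h>
#include <sys/syspage.h>

#define AD_IRQ            10        /* example IRQ                           */
#define NOMINAL_PERIOD_US 1000.0    /* example: a/d interrupting every 1 ms  */

static const struct sigevent *ad_isr(void *area, int id)
{
    /* ... clear the a/d interrupt source here ... */
    return (const struct sigevent *)area;   /* SIGEV_INTR passed via area    */
}

int main(void)
{
    static struct sigevent ev;
    SIGEV_INTR_INIT(&ev);

    ThreadCtl(_NTO_TCTL_IO, 0);
    InterruptAttach(AD_IRQ, ad_isr, &ev, sizeof ev, _NTO_INTR_FLAGS_TRK_MSK);

    double   us_per_cycle = 1e6 / (double)SYSPAGE_ENTRY(qtime)->cycles_per_sec;
    uint64_t prev = ClockCycles();

    for (;;) {
        InterruptWait(0, NULL);              /* woken by the SIGEV_INTR      */
        uint64_t now    = ClockCycles();
        double   gap_us = (double)(now - prev) * us_per_cycle;
        prev = now;
        if (gap_us > 2.0 * NOMINAL_PERIOD_US)   /* crude outlier threshold   */
            printf("late wakeup: %.1f us between interrupts\n", gap_us);
    }
}
```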

Some of those problems are under control of QSSL, others are not.

Where the priority of the isr thread, or of the pulse you return (if you
return a pulse), starts to affect things is in scheduling latency – that
is, the time from the end of the irq routine to the time when your thread
starts to deal with it.

-David

QNX Training Services
http://www.qnx.com/support/training/
Please followup in this newsgroup if you have further questions.

Art Hays wrote:


Which of the above do you think is happening here? My concern is that it’s number 3, “any process
that disables interrupts”, and the process that’s doing it is the rage driver. I’m still looking for
something I’m doing wrong on my end, however. I am going to write a simple program that just
starts the a/d and catches its interrupt in a short ISR. I will see if this behaves any differently than my
full application.

One way to tell if the rage driver is responsible is to take out the
video card entirely, and run your app through a telnet session or modem
session (this isn’t a solution, but it would prove whether the rage
driver is responsible).

Rennie

“Rennie Allen” <rallen@csical.com> wrote in message news:3C8FC128.2000007@csical.com

Art Hays wrote:


Which of the above do you think is happening here? My concern is that it’s number 3,
“any process that disables interrupts”, and the process that’s doing it is the rage driver.
I’m still looking for something I’m doing wrong on my end, however. I am going to write
a simple program that just starts the a/d and catches its interrupt in a short ISR. I will
see if this behaves any differently than my full application.


One way to tell if the rage driver is responsible is to take out the
video card entirely, and run your app through a telnet session or modem
session (this isn’t a solution, but it would prove whether the rage
driver is responsible).

Since I don’t see the long latency at all unless I scroll a window in the help viewer,
I don’t see how I could cause the problem unless there was an operational display.

Rennie

Art Hays wrote:

Since I don’t see the long latency at all unless I scroll a window in the help viewer,
I don’t see how I could cause the problem unless there was an operational display.

Then I think you have your answer…


Rennie

Rennie Allen wrote:

Art Hays wrote:


Since I don’t see the long latency at all unless I scroll a window in the help viewer,
I don’t see how I could cause the problem unless there was an operational display.

Then I think you have your answer…

… and the remaining question is why the Rage 128 driver kills the
realtime performance of QNX6.
Is it the driver? Is it a burst transmission of data on the PCI/AGP bus??
What are the reasons???


Armin

“Armin Steinhoff” <a-steinhoff@web_.de> wrote in message
news:3C908C56.73D4EE03@web_.de…

Rennie Allen wrote:

Art Hays wrote:


Since I don’t see the long latency at all unless I scroll a window in
the help viewer, I don’t see how I could cause the problem unless there
was an operational display.

Then I think you have your answer…

… and the remaining question is why the Rage 128 driver kills the
realtime performance of QNX6.
Is it the driver? Is it a burst transmission of data on the PCI/AGP bus??
What are the reasons???

Yeah, especially since there is no such problem under QNX4.

Armin