TCPIP Performance

Igor Kovalenko <kovalenko@attbi.com> wrote:

“Brian Stecher” <bstecher@qnx.com> wrote in message
news:b1e2f7$s9t$1@nntp.qnx.com…
Nope. Event delivery & scheduling decisions caused by the event will
happen before code from any user level thread is executed - non-SMP.
In the SMP case things are a little more complicated, but the event
delivery and rescheduling will certainly take place before the next
clock tick.

What if an RR thread has its timeslice about to expire? To hold the above
statement true (and not postpone the scheduling) you have to be able to
complete delivery of ALL queued pulses with priority >= that thread’s within a
timeframe less than 1 TICK long. I am very curious how that is done. From a
purely logical perspective, how do you maintain a queue of asynchronous
events without running the producer and consumer asynchronously (that is, at
a potentially different pace)?

The queue CAN be indefinitely long, right? In fact it could be full of
pulses with even higher priority already before the pulse in question is
sent. And if you were delivering ALL eligible pulses before the next TICK,
that COULD potentially take an indefinitely long time? Then there has to be a
bound on how many events can be delivered ‘per pass’. If there is, then
there’s got to be a potential latency, at least in the worst case. For some
people I know, the word ‘queue’ is just another word for ‘latency’. I don’t
like poking in the dark like this, but the docs are rather scarce on the
subject.

The docs are scarce because this is an implementation detail that can
change from release to release. Ya gotta leave a little mystery in
life or where’s the fun :slight_smile:.

Anyway, right now, if there’s an interrupt event that the kernel needs to deliver
and it recognizes that the kernel data structures may not be in a consistent
state, it places the event on an “interrupt pending queue” (if the
kernel state is known to be consistent the event is delivered right
away). At various known “good” points during kernel execution the state
of that queue is checked (e.g. just before we transfer control back
to a user thread) and, if need be, the queue is drained and the
events delivered. That’s what I meant in my original message. In
the case of a pulse, if there is a thread receive-blocked on the
channel, the pulse will immediately be delivered and the thread
readied (and made the active thread if the priority is high enough).
If there is no thread ready to receive the pulse, it’s placed on
the channel’s send queue as per normal procedure - possibly boosting
the priority of threads in the process that created the channel.

The entries in the interrupt pending queue come from a preallocated
free pool (we can’t allocate them on demand since the kernel data
structures - including the heap, aren’t available to us when we
need them). The initial size of the free pool is 200 entries, but
that is grown if the system notices heavy use of the queue. If
you run out of entries, the infamous “Out of interrupt events”
message comes out and the event is dropped :frowning:.

Normally the interrupt comes in, it’s either delivered right away,
or put on the pending queue and then the kernel call is preempted
(assuming a high enough priority on the pending item), which will
quickly get to the point where the queue is drained. If we can’t
preempt the kernel call right away, we’ll shortly get to a point
where we can and then the queue will be drained. Only a high, continuous
load of interrupts (e.g. a buggy interrupt handler not clearing
the hardware condition) will cause us to be unable to drain the
queue and eventually run out of free entries for it.
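
To make the user-side view of this concrete, here is a minimal, untested sketch (assuming Neutrino’s InterruptAttachEvent()/MsgReceivePulse() API; the interrupt level, pulse priority and pulse code are placeholders) of a thread sitting receive-blocked on a channel, waiting for the pulse the kernel delivers for an interrupt:

#include <sys/neutrino.h>
#include <sys/siginfo.h>

#define IRQ_NUM    5                        /* placeholder interrupt level */
#define PULSE_CODE (_PULSE_CODE_MINAVAIL)

int main(void) {
    struct sigevent ev;
    struct _pulse   pulse;
    int             chid, coid, id;

    ThreadCtl(_NTO_TCTL_IO, 0);             /* I/O privileges needed to attach */

    chid = ChannelCreate(0);
    coid = ConnectAttach(0, 0, chid, _NTO_SIDE_CHANNEL, 0);

    /* Ask the kernel to deliver a pulse for each interrupt on this level */
    SIGEV_PULSE_INIT(&ev, coid, 15, PULSE_CODE, 0);
    id = InterruptAttachEvent(IRQ_NUM, &ev, _NTO_INTR_FLAGS_TRK_MSK);

    for (;;) {
        /* receive-blocked here; the pulse readies this thread as described above */
        MsgReceivePulse(chid, &pulse, sizeof(pulse), NULL);
        /* ... service the device ... */
        InterruptUnmask(IRQ_NUM, id);       /* level was masked by the kernel */
    }
    return 0;
}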


Brian Stecher (bstecher@qnx.com) QNX Software Systems, Ltd.
phone: +1 (613) 591-0931 (voice) 175 Terence Matthews Cr.
+1 (613) 591-3579 (fax) Kanata, Ontario, Canada K2M 1W8

Brian Stecher wrote:

Igor Kovalenko <kovalenko@attbi.com> wrote:

“Brian Stecher” <bstecher@qnx.com> wrote in message
news:b1e2f7$s9t$1@nntp.qnx.com…

Nope. Event delivery & scheduling decisions caused by the event will
happen before code from any user level thread is executed - non-SMP.
In the SMP case things are a little more complicated, but the event
delivery and rescheduling will certainly take place before the next
clock tick.


What if a RR thread has its timeslice about to expire? To hold the above
statement true (and not postpone the scheduling) you have to be able to
complete delivery of ALL queued pulses with priority >= that thread within a
timeframe less than 1 TICK long. I am very curious, how that is done. From a
purely logical perspective, how do you maintain a queue of asynchronous
events without running the producer and consumer asynchronously (that is, at
potentially different pace)?


The queue CAN be indefinitely long, right? In fact it could be full of
pulses with even higher priority already before the pulse in question is
sent. And if you were delivering ALL eligible pulses before the next TICK,
that COULD potentially take indefinitely long time? Then there has to be a
bound on how many events can be delivered ‘per pass’. If there is, then
there got to be a potential latency, at least in a worst case. For some
people I know the word ‘queue’ is just another word for ‘latency’. I don’t
like poking in the dark like this, but docs are rather scarce on the
subject.


The docs are scarce because this is an implementation detail that can
change from release to release. Ya gotta leave a little mystery in
life or where’s the fun :slight_smile:.

Hehe, I am not afraid of that. Working with QNX we must have all the fun
in the world :stuck_out_tongue:

Anyway, right now, if there’s an interrupt event that the kernel needs to deliver
and it recognizes that the kernel data structures may not be in a consistent
state, it places the event on an “interrupt pending queue” (if the
kernel state is known to be consistent the event is delivered right
away). At various known “good” points during kernel execution the state
of that queue is checked (e.g. just before we transfer control back
to a user thread) and, if need be, the queue is drained and the
events delivered. That’s what I meant in my original message. In
the case of a pulse, if there is a thread receive-blocked on the
channel, the pulse will immediately be delivered and the thread
readied (and made the active thread if the priority is high enough).
If there is no thread ready to receive the pulse, it’s placed on
the channel’s send queue as per normal procedure - possibly boosting
the priority of threads in the process that created the channel.

The entries in the interrupt pending queue come from a preallocated
free pool (we can’t allocate them on demand since the kernel data
structures - including the heap, aren’t available to us when we
need them). The initial size of the free pool is 200 entries, but
that is grown if the system notices heavy use of the queue. If
you run out of entries, the infamous “Out of interrupt events”
message comes out and the event is dropped :frowning:.

Normally the interrupt comes in, it’s either delivered right away,
or put on the pending queue and then the kernel call is preempted
(assuming a high enough priority on the pending item), which will
quickly get to the point where the queue is drained. If we can’t
preempt the kernel call right away, we’ll shortly get to a point
where we can and then the queue will be drained. Only a high, continuous
load of interrupts (e.g. a buggy interrupt handler not clearing
the hardware condition) will cause us to be unable to drain the
queue and eventually run out of free entries for it.

That in essence means you’re not running the producers & consumers
asynchronously. Instead, you’re absolutely favoring consumers in an attempt
to keep the queue low, since QNX can only work right as long as that is
true. As load becomes heavier you’re gonna be throttling the producers
by starving them of CPU cycles. Which may not be a bad idea overall
since it helps to keep the balance, but it would not be a very realtime
behavior, would it?

It also does not scale well and leaves the kernel very vulnerable to
event flooding by unprivileged processes :frowning:

I suppose there are many cases when the above approach works best, but I
imagine there would also be cases when it does not. The point is, QNX
ought to say more about design tradeoffs and ramifications imposed by
the implementation. The SysArch doc puts it all in such a way it sounds
like we’re getting free Dom Perignon and caviar delivered into the bed
by those japanese twins :wink:

Has anyone tried to run simulations or do any kind of quantifications
for the ‘shortly get to the …’ and other similar assumptions? Why 200
entries? How can there not be enough if it can grow? Why can’t it be a
tunable parameter? What is the sustainable and peak bandwidth of
interrupts/signals/pulses?

Personally, thank you very much for shedding some light on the subject, Brian :slight_smile:

Regards,
– igor

Igor Kovalenko <kovalenko@attbi.com> wrote:

That in essense means you’re not running the producers & consumers
asynchronously. Instead, you’re absolutely favoring consumers in attempt
to keep the queue low, since QNX can only work right as long as that is
true. As load becomes heavier you’re gonna be throttling the producers
by starving them of CPU cycles. Which may not be a bad idea overall
since it helps to keep the balance, but it would not be a very realtime
behavior, would it?

I don’t think I understand what you mean by “producers” and “consumers”.
To me, the producer is the piece of the hardware generating the interrupt
and the consumer is the code in the kernel that drains the interrupt
pending queue, delivering the events. You can’t get more asynchronous
than joe random piece of hardware toggling a pin. The only way we
can throttle the producer is by disabling interrupts or masking off
a particular level - stuff we try to avoid to reduce latency. If
we didn’t have the intr pending queue, we’d have to ensure that interrupts
only occurred when the kernel data structures were in a consistent
state => disabling interrupts for much longer periods of time.

It also does not scale well and leaves the kernel very vulnerable to
event flooding by unprivileged processes :frowning:

How can an unprivileged process cause a hardware interrupt flood?

I suppose there are many cases when the above approach works best, but I
imagine there would also be cases when it does not. The point is, QNX is
ought to say more about design tradeoffs and ramifications imposed by
the implementation. The SysArch doc puts it all in such a way it sounds
like we’re getting free Dom Perignon and caviar delivered into the bed
by those japanese twins :wink:

And I’ve got some swamp land in Florida for you :slight_smile:.

There’s always going to be trade offs, both in terms of implementation
and level of detail in the documentation. I would argue that most
people don’t care how we implemented this - what they want to know
are the latency numbers. How long to enter the ISR handler? How long
to schedule a thread?

Has anyone tried to run simulations or do any kind of quantifications
for the ‘shortly get to the …’ and other similar assumptions?

That’s a matter for our performance testing. If we don’t get to it
shortly, there’s a spike in the interrupt latency which we’ll want
to find and fix. One whole aspect of the kernel design is to be able
to quickly respond to events.

Why 200 entries?

I can’t tell you the why of that exact number - it was here before
me and I suspect will be there long after I’m gone :slight_smile:. The trade
off is between being able to handle a burst of high rate interrupts
and memory usage.

How can there not be enough if it can grow?

A high rate of interrupts will cause it to grow - assuming that the CPU
isn’t completely overwhelmed from handling the interrupts.

Why can’t it be a tunable parameter?

It could be if someone needed it to be. We’ve never had anyone
need it to be.


Brian Stecher (bstecher@qnx.com) QNX Software Systems, Ltd.
phone: +1 (613) 591-0931 (voice) 175 Terence Matthews Cr.
+1 (613) 591-3579 (fax) Kanata, Ontario, Canada K2M 1W8

Brian Stecher <bstecher@qnx.com> wrote:

Igor Kovalenko <kovalenko@attbi.com> wrote:
I suppose there are many cases when the above approach works best, but I
imagine there would also be cases when it does not. The point is, QNX is
ought to say more about design tradeoffs and ramifications imposed by
the implementation. The SysArch doc puts it all in such a way it sounds
like we’re getting free Dom Perignon and caviar delivered into the bed
by those japanese twins :wink:

And I’ve got some swamp land in Florida for you :slight_smile:.

There’s always going to be trade offs, both in terms of implementation
and level of detail in the documentation. I would argue that most
people don’t care how we implemented this - what they want to know
are the latency numbers. How long to enter the ISR handler? How long
to schedule a thread?

I think the important issue is that there are ALSO tradeoffs for the
customer (us) in implementing our software one way over another. If we
better understand what’s happening underneath we can better judge how
to write our own code.

Why 200 entries?

Why can’t it be a tunable parameter?

It could be if someone needed it to be. We’ve never had anyone
need it to be.

It should be a tunable parameter. By the time the OS detects that it
may need to grow, it may already be too late. If I’m designing a
whatever that I know will be interrupt intensive, then I should be
able to handle that up front.


Bill Caroselli – Q-TPS Consulting
1-(626) 824-7983
qtps@earthlink.net

Brian Stecher wrote:

Igor Kovalenko <kovalenko@attbi.com> wrote:

That in essense means you’re not running the producers & consumers
asynchronously. Instead, you’re absolutely favoring consumers in attempt
to keep the queue low, since QNX can only work right as long as that is
true. As load becomes heavier you’re gonna be throttling the producers
by starving them of CPU cycles. Which may not be a bad idea overall
since it helps to keep the balance, but it would not be a very realtime
behavior, would it?


I don’t think I understand what you mean by “producers” and “consumers”.
To me, the producer is the piece of the hardware generating the interrupt
and the consumer is the code in the kernel that drains the interrupt
pending queue, delivering the events. You can’t get more asynchronous
than joe random piece of hardware toggling a pin.

By producers I mean anything that produces events. Could be hardware
generating interrupts, could be software sending pulses or queued
signals. And the consumer is not the kernel (to me, anyway). It is the
threads those events are destined for. The kernel is a queue manager.

The only way we
can throttle the producer is by disabling interrupts or masking off
a particular level - stuff we try avoid to reduce latency. If
we didn’t have the intr pending queue, we’d have to ensure that interrupts
only occured when the kernel data structures were in a consistent
state => disabling interrupts for much longer periods of time.

That solves the problem for hardware producers, but not for software.
Since you’re not returning to user mode until the queue is drained, the
longer the queue is, the more time the kernel will spend doing that. If the
queue is nearly full and is being filled at the same pace as drained,
you’re not gonna be doing much else other than draining. Which means
you’re indirectly throttling the [software] producers by consuming most
of the CPU cycles in the kernel. That is the price you pay for not
scheduling the draining independently of filling (which of course would
have its own price, called LATENCY).

It also does not scale well and leaves the kernel very vulnerable to
event flooding by unprivileged processes :frowning:


How can an unprivileged process cause a hardware interrupt flood?

Pulse/signal flood should be equally good, no? One thread creates a
channel and blocks on something else, other thread spins in
MsgSendPulse() …
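
As an untested sketch, this is roughly the scenario being described (priorities, the pulse code and the sleep are arbitrary, and it is deliberately pathological - not something to run on a machine you care about):

#include <pthread.h>
#include <unistd.h>
#include <sys/neutrino.h>

static void *blocker(void *arg) {
    (void)arg;
    sleep(3600);        /* "blocks on something else" - never receives on the channel */
    return NULL;
}

int main(void) {
    pthread_t tid;
    int chid, coid;

    chid = ChannelCreate(0);
    coid = ConnectAttach(0, 0, chid, _NTO_SIDE_CHANNEL, 0);
    pthread_create(&tid, NULL, blocker, NULL);

    for (;;) {
        /* each call queues another pulse that nobody ever drains (errors ignored) */
        MsgSendPulse(coid, 10, _PULSE_CODE_MINAVAIL, 0);
    }
    return 0;
}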

I suppose there are many cases when the above approach works best, but I
imagine there would also be cases when it does not. The point is, QNX is
ought to say more about design tradeoffs and ramifications imposed by
the implementation. The SysArch doc puts it all in such a way it sounds
like we’re getting free Dom Perignon and caviar delivered into the bed
by those japanese twins :wink:


And I’ve got some swamp land in Florida for you :slight_smile:.

So I better become a crocodile hunter :slight_smile:

[…]

Why can’t it be a tunable parameter?


It could be if someone needed it to be. We’ve never had anyone
need it to be.

I see the last question [about the peak/sustained bandwidth] was
inconvenient enough to skip it. I am not complaining at all :wink:

Cheers,
– igor

Igor Kovalenko <kovalenko@attbi.com> wrote:

By producers I mean anything that produces events. Could be hardware
generating interrupts, could be software sending pulses or queued
signals. And the consumer is not the kernel (to me, anyway). It is the
threads those events are destined for. The kernel is a queue manager.

Ah, that’s going a bit further afield. The original question was
about us delaying interrupt event delivery (and associated scheduling
decisions) to the next clock tick.

That solves the problem for hardware producers, but not for software.
Since you’re not returning to user mode until the queue is drained, the
longer the queue is, the more time the kernel will spend doing that. If the
queue is nearly full and is being filled at the same pace as drained,
you’re not gonna be doing much else other than draining. Which means
you’re indirectly throttling the [software] producers by consuming most
of the CPU cycles in the kernel. That is the price you pay for not
scheduling the draining independently of filling (which of course would
have its own price, called LATENCY).

I would argue at that point that you haven’t got a powerful enough
CPU to handle the job that you’re asking it to do. True, if our code
wasn’t there, the CPU might be able to handle the interrupt load but
that’s part of the cost you pay for the benefits that an OS provides
you. We’re extremely aware that every instruction the kernel executes
is viewed as overhead by our customers, and we work to minimize it.

Of course, the most minimal overhead is not to have any code at all,
but I think the feature list of such an OS would be somewhat small
as well :slight_smile:.

Pulse/signal flood should be equally good, no? One thread creates a
channel and blocks on something else, other thread spins in
MsgSendPulse() …

That’s a separate piece of code, which I didn’t think we were
talking about. We have got a problem there if that (different) queue
gets too long and we’re working on the best way of fixing it.

I see the last question [about the peak/sustained bandwidth] was
inconvenient enough to skip it. I am not complaining at all :wink:

I thought long and hard about that question. To a certain extent
it was like asking how high is up? You can’t answer without more
information like CPU speed, memory speed, and a public benchmark.

One thing I did think of as I was coming into work this morning
was the Dedicated System Report on 6.2 that’s available on our
website. They did a test on a 200MHz Pentium that showed Neutrino
could handle a sustained rate of 1 interrupt every 9 microseconds
without missing anything.


Brian Stecher (bstecher@qnx.com) QNX Software Systems, Ltd.
phone: +1 (613) 591-0931 (voice) 175 Terence Matthews Cr.
+1 (613) 591-3579 (fax) Kanata, Ontario, Canada K2M 1W8

“Brian Stecher” <bstecher@qnx.com> wrote in message
news:b1re8e$n97$1@nntp.qnx.com

Pulse/signal flood should be equally good, no? One thread creates a
channel and blocks on something else, other thread spins in
MsgSendPulse() …

That’s a separate piece of code, which I didn’t think we were
talking about. We have got a problem there if that (different) queue
gets too long and we’re working on the best way of fixing it.

Good. Let me know when you find a way :slight_smile:

I see the last question [about the peak/sustained bandwidth] was
inconvenient enough to skip it. I am not complaining at all :wink:

I thought long and hard about to that question. To a certain extent
it was like asking how high is up? You can’t answer without more
information like CPU speed, memory speed, and a public benchmark.

Of course. Nothing stops you from giving a quote for an example
configuration though.

One thing I did think of as I was coming into work this morning
was the Dedicated System Report on 6.2 that’s available on our
website. They did a test on a 200MHz Pentium that showed Neutrino
could handle a sustained rate of 1 interrupt every 9 microseconds
without missing anything.

Yes, I have that report. However they did all interrupt processing (which I
understand amounted essentially to clearing the interrupt) directly in the
ISR, afair. If they used InterruptAttachEvent() the number might have been
different.

We are indeed thinking about this from somewhat different angles. You’re
talking about interrupts mostly, while I am talking about queued events in
general. Even if DS did use InterruptAttachEvent(), it still would not be
the answer to my exact question because the number would reflect overhead of
hardware interrupts. What I am asking is the pure bandwidth number for
pulses (or queued signals).

Regards,
– igor

Brian Stecher wrote:

Igor Kovalenko <kovalenko@attbi.com> wrote:

[ clip …]

One thing I did think of as I was coming into work this morning
was the Dedicated System Report on 6.2 that’s available on our
website. They did a test on a 200MHz Pentium that showed Neutrino
could handle a sustained rate of 1 interrupt every 9 microseconds
without missing anything.

Does it mean that two different interrupts can’t be handled if they are
generated within a time difference less than 9 microseconds???

There are problems with the interrupt handling if both CAN controllers
of the MGT5100 PPC CPU are connected to the same CAN network and
accepting the same frame (at the same time), which leads to the
generation of different interrupts with different hardware priorities.
( Time difference between interrupts in the range of nanoseconds )

In such a case the whole machine locks up!!

The processing of ISRs attached by InterruptAttach() works 100%, but only
ONCE :slight_smile: After the first action the performance goes to ZERO … that
means the ISR will never be processed even if the hardware signals
interrupt conditions.

Armin

Armin Steinhoff <a-steinhoff@web.de> wrote in message
news:b25cph$lml$1@inn.qnx.com

Brian Stecher wrote:
Igor Kovalenko <kovalenko@attbi.com> wrote:

[ clip …]

One thing I did think of as I was coming into work this morning
was the Dedicated System Report on 6.2 that’s available on our
website. They did a test on a 200MHz Pentium that showed Neutrino
could handle a sustained rate of 1 interrupt every 9 microseconds
without missing anything.

Does it mean that two different interrupts can’t be handled if they are
generated within a time difference less than 9 microseconds???

I am not a kernel developer, but I see you conveniently ONLY quoted
part of Brian’s post and left out the important part. Let’s quote Brian’s
post again:

Brian Stecher <bstecher@qnx.com> wrote in message
news:b1re8e$n97$1@nntp.qnx.com…
Igor Kovalenko <kovalenko@attbi.com> wrote:

I see the last question [about the peak/sustained bandwidth] was
inconvenient enough to skip it. I am not complaining at all :wink:

I thought long and hard about to that question. To a certain extent
it was like asking how high is up? You can’t answer without more
information like CPU speed, memory speed, and a public benchmark.

One thing I did think of as I was coming into work this morning
was the Dedicated System Report on 6.2 that’s available on our
website. They did a test on a 200MHz Pentium that showed Neutrino
could handle a sustained rate of 1 interrupt every 9 microseconds
without missing anything.

This said,

a) it depends on “CPU speed, memory speed, and a public benchmark”, and
b) on a “200MHz Pentium”, there is this number: “1 interrupt every 9
microseconds”.

So you should be able to draw your own conclusion of what happens if
“two different interrupts are generated within a time difference less than
9 microseconds”. “Can’t handle” is NOT the right answer.

Igor asked for a number, Brian quoted one. The “9 microseconds” obviously
has nothing to do with your example below.

-xtang

There are problems with the interrupt handling if both CAN controllers
of the MGT5100 PPC CPU are connected to the same CAN network and
accepting the same frame (at the same time), which leads to the
generation of different interrupts with different hardware priorities.
( Time difference between interrupts in the range of nanoseconds )

In such a case the whole machine locks up!!

The processing of ISRs attached by InterruptAttach() works 100%, but only
ONCE :slight_smile: After the first action the performance goes to ZERO … that
means the ISR will never be processed even if the hardware signals
interrupt conditions.

Armin

Xiaodan Tang wrote:

Armin Steinhoff <a-steinhoff@web.de> wrote in message
news:b25cph$lml$1@inn.qnx.com…

Brian Stecher wrote:

Igor Kovalenko <kovalenko@attbi.com> wrote:


[ clip …]

One thing I did think of as I was coming into work this morning
was the Dedicated System Report on 6.2 that’s available on our
website. They did a test on a 200MHz Pentium that showed Neutrino
could handle a sustained rate of 1 interrupt every 9 microseconds
without missing anything.

Does it mean that two different interrupts can’t be handled if they are
generated within a time difference less than 9 microseconds???


I am not a kernel developer, but I see you conveniently ONLY quoted
part of Brian’s post and left out the important part. Let’s quote Brian’s
post again:


Brian Stecher <bstecher@qnx.com> wrote in message

news:b1re8e$n97$1@nntp.qnx.com…

Igor Kovalenko <kovalenko@attbi.com> wrote:

I see the last question [about the peak/sustained bandwidth] was
inconvenient enough to skip it. I am not complaining at all :wink:

I thought long and hard about to that question. To a certain extent
it was like asking how high is up? You can’t answer without more
information like CPU speed, memory speed, and a public benchmark.

One thing I did think of as I was coming into work this morning
was the Dedicated System Report on 6.2 that’s available on our
website. They did a test on a 200MHz Pentium that showed Neutrino
could handle a sustained rate of 1 interrupt every 9 microseconds
without missing anything.


This said,

a) it depends on “CPU speed, memory speed, and a public benchmark”, and
b) on a “200MHz Pentium”, there is this number: “1 interrupt every 9
microseconds”.

So you should be able to draw your own conclusion of what happens if
“two different interrupts are generated within a time difference less than
9 microseconds”. “Can’t handle” is NOT the right answer.

Yes … I have a clear conclusion about what SHOULD happen!

Armin




Igor asked for a number, Brian quoted one. The “9 microseconds” obviously
has nothing to do with your example below.

-xtang


There are problems with the interrupt handling if both CAN controllers
of the MGT5100 PPC CPU are connected to the same CAN network and
accepting the same frame (at the same time), which leads to the
generation of different interrupts with different hardware priorities.
( Time difference between interrupts in the range of nanoseconds )

In such a case the whole machine locks up!!

The processing of ISRs attached by InterruptAttach() works 100%, but only
ONCE :slight_smile: After the first action the performance goes to ZERO … that
means the ISR will never be processed even if the hardware signals
interrupt conditions.

Armin



“Armin Steinhoff” <a-steinhoff@web.de> wrote in message
news:b25cph$lml$1@inn.qnx.com

Brian Stecher wrote:
Igor Kovalenko <kovalenko@attbi.com> wrote:

[ clip …]

One thing I did think of as I was coming into work this morning
was the Dedicated System Report on 6.2 that’s available on our
website. They did a test on a 200MHz Pentium that showed Neutrino
could handle a sustained rate of 1 interrupt every 9 microseconds
without missing anything.

Does it mean that two different interrupts can’t be handled if they are
generated within a time difference less than 9 microseconds???

I could leave the answer to Brian… but here is my take on this. I don’t
think what he said means exactly what you’re implying. There’s a difference
between peak rate and sustained rate, which exists due to the queue (of
pending interrupts). So the answer to your question is ‘yes and no’. No, if
you’re talking about 2 interrupts within 9us now and then, followed by some
period of lower load. Yes, if you’re talking about 2 interrupts within 9us
sustained all the time.

Generally, the answer is no as long as either (1) the CPU is fast enough to
handle that sustained rate or (2) the queue has enough free slots to
accommodate a temporary spike.

There are problems with the interrupt handling if both CAN controllers
of the MGT5100 PPC CPU are connected to the same CAN network and
accepting the same frame (at the same time), which leads to the
generation of different interrupts with different hardware priorities.
( Time difference between interrupts in the range of nanoseconds )

In such a case the whole machine locks up!!

The processing of ISRs attached by InterruptAttach() works 100%, but only
ONCE :slight_smile: After the first action the performance goes to ZERO … that
means the ISR will never be processed even if the hardware signals
interrupt conditions.

Do you have a debug console attached/configured? If you can hook up a serial
port and redirect kernel output (the -K and -D options of startup, I think) you
might be able to see what’s happening…

– igor

Igor Kovalenko <kovalenko@attbi.com> wrote:

Brian Stecher <bstecher@qnx.com> wrote:
One thing I did think of as I was coming into work this morning
was the Dedicated System Report on 6.2 that’s available on our
website. They did a test on a 200MHz Pentium that showed Neutrino
could handle a sustained rate of 1 interrupt every 9 microseconds
without missing anything.

Yes, I have that report. However they did all interrupt processing (which I
understand amounted essentially to clearing the interrupt) directly in the
ISR, afair. If they used InterruptAttachEvent() the number might have been
different.

There was another test that did reschedule a thread. The max latency
they saw was 7.7uS.

We are indeed thinking about this from somewhat different angles. You’re
talking about interrupts mostly, while I am talking about queued events in
general. Even if DS did use InterruptAttachEvent(), it still would not be
the answer to my exact question because the number would reflect overhead of
hardware interrupts. What I am asking is the pure bandwidth number for
pulses (or queued signals).

See test below. On a 350MHz PII, it did 172146 signals in 1 second,
or 5.8uS per signal. Converting it to do pulses is left as an exercise
for the reader :slight_smile:.


#include <signal.h>
#include <setjmp.h>
#include <sys/siginfo.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>

#define TIME 1

sigjmp_buf jmpbuf;
unsigned volatile count;

static void
sig_handler(int sig) {
    ++count;
}

static void
alrm_handler(int sig) {
    siglongjmp(jmpbuf, 1);
}

int
main() {
    int pid = getpid();

    signal(SIGUSR1, sig_handler);
    signal(SIGALRM, alrm_handler);

    if(sigsetjmp(jmpbuf, 0) == 0) {
        alarm(TIME);
        for(;;) {
            kill(pid, SIGUSR1);
        }
    }
    printf("Did %u signals in %d second(s).\n", count, TIME);
    return 0;
}
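
One possible, untested way to do that exercise with the same skeleton: bounce a pulse off our own channel (the pulse priority and code are arbitrary). Note that each iteration counts a send plus a receive, so this measures the round trip rather than raw delivery:

#include <stdio.h>
#include <setjmp.h>
#include <signal.h>
#include <unistd.h>
#include <sys/neutrino.h>

#define TIME 1

sigjmp_buf jmpbuf;
unsigned volatile count;

static void
alrm_handler(int sig) {
    siglongjmp(jmpbuf, 1);
}

int
main() {
    struct _pulse pulse;
    int chid = ChannelCreate(0);
    int coid = ConnectAttach(0, 0, chid, _NTO_SIDE_CHANNEL, 0);

    signal(SIGALRM, alrm_handler);

    if(sigsetjmp(jmpbuf, 0) == 0) {
        alarm(TIME);
        for(;;) {
            /* queue a pulse on our own channel and pick it up again */
            MsgSendPulse(coid, 10, _PULSE_CODE_MINAVAIL, 0);
            MsgReceivePulse(chid, &pulse, sizeof(pulse), NULL);
            ++count;
        }
    }
    printf("Did %u pulse round trips in %d second(s).\n", count, TIME);
    return 0;
}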


Brian Stecher (bstecher@qnx.com) QNX Software Systems, Ltd.
phone: +1 (613) 591-0931 (voice) 175 Terence Matthews Cr.
+1 (613) 591-3579 (fax) Kanata, Ontario, Canada K2M 1W8

Armin Steinhoff <a-steinhoff@web.de> wrote:

Brian Stecher wrote:
Igor Kovalenko <kovalenko@attbi.com> wrote:

[ clip …]

One thing I did think of as I was coming into work this morning
was the Dedicated System Report on 6.2 that’s available on our
website. They did a test on a 200MHz Pentium that showed Neutrino
could handle a sustained rate of 1 interrupt every 9 microseconds
without missing anything.

Does it mean that two different interrupts can’t be handled if they are
generated within a time difference less than 9 microseconds???

No. They issued 10**9 interrupts at various rates and deemed the system
capable of handling them if none of the interrupts were dropped at a
particular rate. At 8uS, 12 interrupts were lost.

I’m not going to type in the whole report - just download it if you’re
interested, but there were other tests where they issued simultaneous
and near simultaneous interrupts on two different levels and found no
issues in handling them.

There are problems with the interrupt handling if both CAN controllers
of the MGT5100 PPC CPU are connected to the same CAN network and
accepting the same frame (at the same time), which leads to the
generation of different interrupts with different hardware priorities.
( Time difference between interrupts in the range of nanoseconds )

In such a case the whole machine locks up!!

From the DS report, you can see that the generic kernel code handles
nested interrupts. I’d be very surprised if PPC specific code in the
kernel for handling interrupts had a problem with it, so there must
be something in the board specific code that’s messing up. I can’t make
any comment on what may be happening there - I don’t know the MGT5100
hardware, I don’t know CAN, and I certainly don’t know what your software
is doing :slight_smile:.


Brian Stecher (bstecher@qnx.com) QNX Software Systems, Ltd.
phone: +1 (613) 591-0931 (voice) 175 Terence Matthews Cr.
+1 (613) 591-3579 (fax) Kanata, Ontario, Canada K2M 1W8

Brian Stecher wrote:

We are indeed thinking about this from somewhat different angles. You’re
talking about interrupts mostly, while I am talking about queued events in
general. Even if DS did use InterruptAttachEvent(), it still would not be
the answer to my exact question because the number would reflect overhead of
hardware interrupts. What I am asking is the pure bandwidth number for
pulses (or queued signals).


See test below. On a 350MHz PII, it did 172146 signals in 1 second,
or 5.8uS per signal. Converting it to do pulses is left as an exercise
for the reader :slight_smile:.

Sure. And converting it to do QUEUED signals is left as an exercise for
the writer :slight_smile:

#include <signal.h>
#include <setjmp.h>
#include <sys/siginfo.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>

#define TIME 1

sigjmp_buf jmpbuf;
unsigned volatile count;

static void
sig_handler(int sig) {
    ++count;
}

static void
alrm_handler(int sig) {
    siglongjmp(jmpbuf, 1);
}

int
main() {
    int pid = getpid();

    signal(SIGUSR1, sig_handler);
    signal(SIGALRM, alrm_handler);

    if(sigsetjmp(jmpbuf, 0) == 0) {
        alarm(TIME);
        for(;;) {
            kill(pid, SIGUSR1);
        }
    }
    printf("Did %u signals in %d second(s).\n", count, TIME);
    return 0;
}
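
An untested sketch of that queued-signal variant: the same skeleton again, but the handler is installed with SA_SIGINFO and the signals are sent with sigqueue() so they go through the POSIX queued-signal path (SIGRTMIN is used simply as a convenient queued signal):

#include <signal.h>
#include <setjmp.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define TIME 1

sigjmp_buf jmpbuf;
unsigned volatile count;

static void
sig_handler(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)info; (void)ctx;
    ++count;
}

static void
alrm_handler(int sig) {
    siglongjmp(jmpbuf, 1);
}

int
main() {
    int pid = getpid();
    struct sigaction act;
    union sigval val;

    memset(&act, 0, sizeof(act));
    act.sa_sigaction = sig_handler;
    act.sa_flags = SA_SIGINFO;           /* SA_SIGINFO => queued, carries a value */
    sigemptyset(&act.sa_mask);
    sigaction(SIGRTMIN, &act, NULL);
    signal(SIGALRM, alrm_handler);

    val.sival_int = 0;

    if(sigsetjmp(jmpbuf, 0) == 0) {
        alarm(TIME);
        for(;;) {
            sigqueue(pid, SIGRTMIN, val);   /* queued-signal path, errors ignored */
        }
    }
    printf("Did %u queued signals in %d second(s).\n", count, TIME);
    return 0;
}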

Armin Steinhoff <a-steinhoff@web.de> wrote:

Brian Stecher wrote:
Igor Kovalenko <kovalenko@attbi.com> wrote:

[ clip …]

generated within a time difference less than 9 microseconds???

There are problems with the interrupt handling if both CAN controllers
of the MGT5100 PPC CPU are connected to the same CAN network and
accepting the same frame (at the same time), which leads to the
generation of different interrupts with different hardware priorities.
( Time difference between interrupts in the range of nanoseconds )

In such a case the whole machine locks up!!

Armin,

This particular issue is really not related to this thread. We’ve
been made aware of this problem this past Friday. We are working
with Motorola SPS to get the source to the BSP to examine the problem.

It’s more than likely a BSP issue rather than an OS issue.


John

Brian Stecher wrote:

Armin Steinhoff <a-steinhoff@web.de> wrote:

Brian Stecher wrote:

Igor Kovalenko <kovalenko@attbi.com> wrote:


[ clip …]

One thing I did think of as I was coming into work this morning
was the Dedicated System Report on 6.2 that’s available on our
website. They did a test on a 200MHz Pentium that showed Neutrino
could handle a sustained rate of 1 interrupt every 9 microseconds
without missing anything.


Does it mean that two different interrupts can’t be handled if they are
generated within a time difference less than 9 microseconds???


No, They issued 10**9 interrupts at various rates and deemed the system
capable of handling them if none of the interrupts were dropped at a
particular rate. At 8uS, 12 interrupts were lost.

I’m not going to type in the whole report - just download it if you’re
interested, but there were other tests where they issued simultaneous
and near simultaneous interrupts on two different levels and found no
issues in handling them.


There are problems with the interrupt handling if both CAN controllers
of the MGT5100 PPC CPU are connected to the same CAN network and
accepting the same frame (at the same time), which leads to the
generation of different interrupts with different hardware priorities.
( Time difference between interrupts in the range of nanoseconds )


In such a case the whole machine locks up!!


From the DS report, you can see that the generic kernel code handles
nested interrupts. I’d be very surprised if PPC specific code in the
kernel for handling interrupts had a problem with it, so there must
be something in the board specific code that’s messing up.

Nice … tell it to your automotive customers :slight_smile:

I can’t make any comment on what may be happening there - I don’t know the MGT5100
hardware,

The MGT5100 is plain silicon … a real SoC.

I don’t know CAN, and I certainly don’t know what your software
is doing :slight_smile:.

It works well … you could use it to test your interrupt subsystem.

There are concurrency problems between arbitrary interrupt-driven
resource managers.

Armin

Hi Everyone,

I’ve finally managed to get back on this subject. Thanks to everyone for
their contributions, it’s been an enlightening few weeks!

Finally I have measured 6.2.1’s interrupt latency! My method, just so
that everyone is clear, is the following (a rough sketch follows the list):

1, Receive interrupt from hardware
2, Do a little processing
3, Record ClockCycles in a global variable
4, Send the pulse to the worker thread
5, Record ClockCycles
6, Calculate the time taken to receive the pulse
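
For reference, a rough, untested sketch of that measurement, assuming an InterruptAttach()-style ISR as described and using ClockCycles()/SYSPAGE_ENTRY(qtime) for the timestamps (the interrupt level, pulse priority and pulse code are placeholders):

#include <stdio.h>
#include <inttypes.h>
#include <sys/neutrino.h>
#include <sys/syspage.h>

#define IRQ_NUM 5                          /* placeholder interrupt level */

static struct sigevent   ev;
static volatile uint64_t t_isr;

/* Steps 1-4: the ISR does its little bit of work, records a timestamp in a
   global, and returns the event so the kernel sends the pulse to the worker */
static const struct sigevent *
isr(void *area, int id) {
    (void)area; (void)id;
    t_isr = ClockCycles();
    return &ev;
}

int main(void) {
    struct _pulse pulse;
    int chid, coid;
    uint64_t t_worker, cps;

    ThreadCtl(_NTO_TCTL_IO, 0);
    chid = ChannelCreate(0);
    coid = ConnectAttach(0, 0, chid, _NTO_SIDE_CHANNEL, 0);
    SIGEV_PULSE_INIT(&ev, coid, 15, _PULSE_CODE_MINAVAIL, 0);
    InterruptAttach(IRQ_NUM, isr, NULL, 0, 0);

    cps = SYSPAGE_ENTRY(qtime)->cycles_per_sec;

    for (;;) {
        /* Steps 5-6: timestamp the pulse arrival and compute the delta */
        MsgReceivePulse(chid, &pulse, sizeof(pulse), NULL);
        t_worker = ClockCycles();
        printf("ISR -> worker: %.1f us\n",
               (double)(t_worker - t_isr) * 1e6 / (double)cps);
    }
    return 0;
}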

Now comes the time where I must eat my hat! I’ve recorded the following
results:

AMD 1500MHz: pulse → worker execution (no load) 3-4us
NEC 300MHz Geode: 20-25us

These are in the ballpark of what I’d expect from QNX. The issue turns
out to be a problem with the hardware and its inability to operate in
the mode that I require.

FYI, if you have a polling algorithm and your hardware takes between 200
and 300 us to read from/write to, this can lead to a packet-to-packet
latency in excess of 1 ms (three or four such accesses per packet already
add up to roughly a millisecond)!

Thanks anyway for the comments/help.


Dave

Dave, we’ve just implemented an interrupt handler on a Geode (Advantech
PCM-3350) and are seeing sporadic interrupt latency of up to 300 us (typical
is around the 20 that you reported). Do you see this? I’ve attributed it to
the Geode SMM, but can’t prove it yet. (This is with 6.2.0)

What is the hardware issue you’ve got?

Thanks,
Marty Doane
Siemens Dematic


“Dave Edwards” <Dave.edwards@abicom-international.com> wrote in message
news:3E5A391B.1080502@abicom-international.com

Hi Everyone,

I’ve finally managed to get back on this subject. Thanks to everyone for
their contributions, it’s been an enlightening few weeks!

Finally I have measured 6.2.1’s interrupt latency! My method, just so
that everyone is clear is the following:

1, Receive interrupt from hardware
2, Do a little processing
3, Record ClockCycles in a global variable
4, Send the pulse to the worker thread
5, Record ClockCycles
6, Calculate the time taken to receive the pulse

Now comes the time where I must eat my hat! I’ve recorded the following
results:

AMD 1500MHz Pulse → Worker execution (no load) 3-4us
NEC 300MHz Geode 20-25uS

These are in the ball park for what I’d expect from QNX. The issue turns
out to be a problem with the hardware and it’s inability to operate in
the mode that I require?

FYI, if you have a polling algorithm and your hardware takes between 200
and 300 us to read from/write to, this can lead to a packet to packet
latency in excess of 1 ms!

Thanks anyway for the comments/help.


Dave

Hi Marty,

I’m also seeing longer latencies on the NEC Board (Wafer5825). These are
occasionally in the region of 300uS.

My AMD system has occasional long latencies, but I think that this is
probably another interrupt event (due to the way that I’m measuring the
times)

One thing that I’ve noticed with the Geode processor is that the VGA
BIOS appears to be implemented in the SMM; this can mean that the act of
printing to the screen will destroy any performance that you may have.

As to my hardware issue, I have a NIC that should really implement
bus-mastering DMA but currently does not. This is causing significant delays
in the network driver, due to the time taken to do PIO.

Hope this helps

Dave


Marty Doane wrote:

Dave, we’ve just implemented an interrupt handler on a Geode (Advantech
PCM-3350) and are seeing sporadic interrupt latency of up to 300 us (typical
is around the 20 that you reported). Do you see this? I’ve attributed it to
the Geode SMM, but can’t prove it yet. (This is with 6.2.0)

What is the hardware issue you’ve got?

Thanks,
Marty Doane
Siemens Dematic


“Dave Edwards” <Dave.edwards@abicom-international.com> wrote in message
news:3E5A391B.1080502@abicom-international.com…

Hi Everyone,

I’ve finally managed to get back on this subject. Thanks to everyone for
their contributions, it’s been an enlightening few weeks!

Finally I have measured 6.2.1’s interrupt latency! My method, just so
that everyone is clear is the following:

1, Receive interrupt from hardware
2, Do a little processing
3, Record ClockCycles in a global variable
4, Send the pulse to the worker thread
5, Record ClockCycles
6, Calculate the time taken to receive the pulse

Now comes the time where I must eat my hat! I’ve recorded the following
results:

AMD 1500MHz Pulse → Worker execution (no load) 3-4us
NEC 300MHz Geode 20-25uS

These are in the ball park for what I’d expect from QNX. The issue turns
out to be a problem with the hardware and it’s inability to operate in
the mode that I require?

FYI, if you have a polling algorithm and your hardware takes between 200
and 300 us to read from/write to, this can lead to a packet to packet
latency in excess of 1 ms!

Thanks anyway for the comments/help.


Dave



Thanks Dave.

Marty Doane
Siemens Dematic

“Dave Edwards” <Dave.edwards@abicom-international.com> wrote in message
news:3E5B4CF3.6060909@abicom-international.com

Hi Marty,

I’m also seeing longer latencies on the NEC Board (Wafer5825). These are
occasionally in the region of 300uS.

My AMD system has occasional long latencies, but I think that this is
probably another interrupt event (due to the way that I’m measuring the
times)

One thing that I’ve noticed with the Geode processor is that the VGA
BIOS appears to be implemented in the SMM; this can mean that the act of
printing to the screen will destroy any performance that you may have.

As to my hardware issue, I have a NIC that should really implement
bus-mastering DMA but currently does not. This is causing significant delays
in the network driver, due to the time taken to do PIO.

Hope this helps

Dave


Marty Doane wrote:
Dave, we’ve just implemented an interrupt handler on a Geode (Advantech
PCM-3350) and are seeing sporadic interrupt latency of up to 300 us
(typical
is around the 20 that you reported). Do you see this? I’ve attributed it
to
the Geode SMM, but can’t prove it yet. (This is with 6.2.0)

What is the hardware issue you’ve got?

Thanks,
Marty Doane
Siemens Dematic


“Dave Edwards” <Dave.edwards@abicom-international.com> wrote in message
news:3E5A391B.1080502@abicom-international.com…

Hi Everyone,

I’ve finally managed to get back on this subject. Thanks to everyone for
their contributions, it’s been an enlightening few weeks!

Finally I have measured 6.2.1’s interrupt latency! My method, just so
that everyone is clear is the following:

1, Receive interrupt from hardware
2, Do a little processing
3, Record ClockCycles in a global variable
4, Send the pulse to the worker thread
5, Record ClockCycles
6, Calculate the time taken to receive the pulse

Now comes the time where I must eat my hat! I’ve recorded the following
results:

AMD 1500MHz Pulse → Worker execution (no load) 3-4us
NEC 300MHz Geode 20-25uS

These are in the ball park for what I’d expect from QNX. The issue turns
out to be a problem with the hardware and it’s inability to operate in
the mode that I require?

FYI, if you have a polling algorithm and your hardware takes between 200
and 300 us to read from/write to, this can lead to a packet to packet
latency in excess of 1 ms!

Thanks anyway for the comments/help.


Dave


