ISR Misses Interrupts

David Kuechenmeister wrote:

My conclusion is that the kernel is somehow too busy to service the hardware
interrupt and so my timing suffers.

Hmmm, since the kernel is pre-emptable, and (presumably) your handler is
at a higher priority than the kernel, I can't see how the kernel could
be involved. The kernel should only make a scheduling decision upon exit
from the handler, and, if you're using InterruptAttachEvent, and if your
event is a pulse, and if the pulse priority is the highest priority,
then the scheduler will context switch (and the context switch is - as
Chris states - very small) directly into your handler.

In order to determine if scheduling latency is playing a role, you
should replace InterruptAttachEvent with InterruptAttach of a handler
that does nothing more than toggle an I/O line high and return the same
event that you would have registered with InterruptAttachEvent. Then, as
the first instruction after the receive in your handler, toggle the I/O
line low. When the problem situation occurs, compare the scheduling
latency to that before the situation occurred (obtained by hooking a
scope to the I/O line); if it hasn't changed, then the kernel is not
involved.
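
Something like this minimal sketch would do it (QNX 6 API; MY_IRQ and
DIO_PORT are placeholders for your interrupt line and whatever I/O port
the scope is hooked to):

#include <stdio.h>
#include <stdint.h>
#include <sys/neutrino.h>
#include <sys/siginfo.h>
#include <hw/inout.h>

#define MY_IRQ    7        /* placeholder: the FPGA's interrupt line */
#define DIO_PORT  0x378    /* placeholder: port the scope watches    */

static struct sigevent isr_event;
static uintptr_t dio;      /* mapped I/O handle for the scope line   */

/* ISR: raise the line and return the event - nothing else. */
static const struct sigevent *toggle_isr(void *arg, int id)
{
    out8(dio, 0x01);
    return &isr_event;
}

int main(void)
{
    int id;

    ThreadCtl(_NTO_TCTL_IO, 0);          /* I/O privileges for out8() */
    dio = mmap_device_io(1, DIO_PORT);
    SIGEV_INTR_INIT(&isr_event);
    id = InterruptAttach(MY_IRQ, toggle_isr, NULL, 0, 0);
    if (id == -1) {
        perror("InterruptAttach");
        return 1;
    }
    for (;;) {
        InterruptWait(0, NULL);          /* the "receive"              */
        out8(dio, 0x00);                 /* first instruction after it */
    }
}

The width of the high pulse on the scope is then exactly the
ISR-to-handler scheduling latency.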

As a data point, a project I worked on in a previous life had a 5 kHz
control loop - not simple data acquisition - on 16 channels
simultaneously, under QNX4 on a 50 MHz 486, so a 166 MHz machine doing only
a single channel at only 600 Hz should be using only a microscopically
small fraction of available CPU.

My question is what causes the kernel to
get in this state. If the processor was overloaded, wouldn’t the problem
show up immediately? When I use “spin”, thanks Igor, the only difference in
cpu consumption I can see is that my interrupt handler takes a little more
cpu time when it is missing interrupts. All I do is send a message from the
handler to schedule my periodic process. Again, why does this work well for
half a day, then go south?

You have checked the priority of the pulse, right?

Rennie

“Chris McKillop” <cdm@qnx.com> wrote in message
news:asjfo1$il6$1@nntp.qnx.com

[SNIP]
[additional deletions]
I mocked up a little test case: http://qnx.wox.org/qnx/sources/fakeint.c .
This is running on my P166 right now and I am using about 0.15% of the CPU.
The test grabs the timer interrupt (0), which fires every ~1ms. It then
signals another process to write data into a consumer process (/dev/null).

So, I am not sure you can easily blame context-switch overhead.
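
For reference, here is a rough sketch of the same idea (not the actual
fakeint.c - that's at the URL above): hook IRQ 0, have the kernel deliver
a high-priority pulse per tick, and push data at /dev/null from thread
level:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sched.h>
#include <sys/neutrino.h>
#include <sys/siginfo.h>

#define PULSE_CODE  (_PULSE_CODE_MINAVAIL + 1)

int main(void)
{
    struct sigevent ev;
    struct _pulse   pulse;
    int chid, coid, id, fd;
    char buf[64] = { 0 };

    ThreadCtl(_NTO_TCTL_IO, 0);            /* needed to attach to an IRQ */
    chid = ChannelCreate(0);
    coid = ConnectAttach(0, 0, chid, _NTO_SIDE_CHANNEL, 0);
    fd   = open("/dev/null", O_WRONLY);

    /* deliver a pulse at the highest priority on every timer interrupt */
    SIGEV_PULSE_INIT(&ev, coid, sched_get_priority_max(SCHED_FIFO),
                     PULSE_CODE, 0);
    id = InterruptAttachEvent(0, &ev, _NTO_INTR_FLAGS_TRK_MSK);
    if (id == -1) {
        perror("InterruptAttachEvent");
        return 1;
    }
    for (;;) {
        MsgReceivePulse(chid, &pulse, sizeof(pulse), NULL);
        write(fd, buf, sizeof(buf));       /* the "consumer"            */
        InterruptUnmask(0, id);            /* re-enable the masked IRQ  */
    }
}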

Were you using spin to measure the cpu usage? My interrupt handler,
analogous to your interrupt_child, is using 2-7 percent.

Maybe it's not context switching, but something odd is certainly going on. I
can do what I want with my interrupt handler, too… for a while. I set up a
test that is very similar to yours, but I used InterruptAttachEvent instead
of InterruptAttach to remove the context switching in and out of the ISR. I
can run it forever; certainly it ran reliably for the 5 days we were off over
Thanksgiving.

When I run the rest of my application, in addition to the interrupt handler,
the whole thing runs well, too. But it seems to be limited to somewhere
between 10 and 24 hours. I can start it in the morning when I come to work,
it is still running properly at the end of the day. Invariably the timing of
my control loops is off the next morning, though. Using a scope and logic
analyzer, I can see that the FPGA generates the pulses properly, but when I
write to a D/O pin I can see that the handler doesn’t run on every
interrupt.

My conclusion is that the kernel is somehow too busy to service the hardware
interrupt and so my timing suffers. My question is what causes the kernel to
get in this state. If the processor was overloaded, wouldn’t the problem
show up immediately? When I use “spin”, thanks Igor, the only difference in
cpu consumption I can see is that my interrupt handler takes a little more
cpu time when it is missing interrupts. All I do is send a message from the
handler to schedule my periodic process. Again, why does this work well for
half a day, then go south?

I’d like to avoid buying a bigger processor, but I can’t find any
programmatic reason that would cause such erratic behavior. If you have some
suggestions, I’d welcome them.

Sincerely,
David Kuechenmeister

Thanks for your interest in this problem. Let me back up a little, as I have
already done some of the things that you suggest.

I was initially using the implementation:

struct sigevent timer_isr_event;
SIGEV_INTR_INIT( &timer_isr_event );
InterruptAttach( Timer_IRQ, timer_isr, NULL, 0, 0 );
setprio( 0, sched_get_priority_max(SCHED_FIFO) - PROC_TIMER_PRI );

(where PROC_TIMER_PRI is 10 and was the highest application priority)

This connected to an ISR that only cleared the interrupt. Later, I added a
call to toggle a bit on an I/O line. I could detect the interrupt pulse and I
displayed it on a logic analyzer along with the toggled bit. The logic
analyzer was set to trigger on "long" pulses from the ISR, i.e. cases where
the toggled bit wasn't toggled when it should have been. Initially, the
transitions of the toggled bit followed the interrupt pulse. After some large
amount of time, usually overnight, the pulse generated by the toggled bit
would span 2 or 3 interrupts in every 10 or 15 interrupts.

I understand, perhaps incorrectly, that the ISR isn't scheduled; the
interrupt just needs to be detected. It's the returned event from the ISR
that makes the handler ready to run. I think hardware interrupts are the
lowest priority on an x86 processor, so if the kernel was busy and had
disabled interrupts, I would never see the hardware interrupt that arrived
during that busy time, would I?

I didn't look at the scheduling latency because I have an FPGA that handles
the I/O at 600 Hz. The fine time differences would be lost because all the
data is output at the same time during a 600 Hz frame. I might look at
writes on the address bus, though.

I tried the InterruptAttachEvent(Timer_IRQ,&timer_isr_event,0) call just to
try a different route to the handler. I figured if the context switches to
the ISR were removed, the timing might be changed enough to see some change
in the overall behavior. It didn’t.

The problem seems to still be missed interrupts, rather than handler
scheduling. I say this because, with InterruptAttachEvent as used
above, isn't the kernel solely responsible for detecting the interrupt
and scheduling the handler? That is, doesn't the kernel perform the
additional function of the ISR that is used with the InterruptAttach() call?

Incidentally, there is quite a bit more processing taking place on this
board. The interrupt handler is just there to kick off a data collection
process. The data is processed pretty extensively after that. Spin puts the
cpu average load at 80 to 90 percent. I guess if the spin contribution to
that figure is removed, the cpu is at about 75 to 85 percent.

Thanks again for your interest and any suggestions.

Sincerely,
David Kuechenmeister



hey David…

Here are some things I have noticed:

o You should be passing in _NTO_INTR_FLAGS_TRK_MSK to InterruptAttachEvent().

o The call to InterruptDetach() should be passing in the iid (returned from
the Attach) and not the interrupt number.

o Is Timer_IRQ the system timer or some other bit of hardware? If it is
some other bit of hardware (which it appears to be), where is your interrupt
ack? I see it in your timer_isr() code but not in your thread-based handler.
You will want to do the out16(hDeviceHandle+IRQ_CLR, 0x0001) before you
unmask the interrupt.

o Your problems with timer_isr() come from your referencing of global
data directly. You are meant to pass everything in via the "void *arg"
pointer. Normally you would set up a structure with the bits you need
and pass that (and its size) into InterruptAttach(), which would then
get passed into your handler. In your case this would be something
like:

struct myarea {
    struct sigevent event;
    int hDeviceHandle;
};
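
For instance (a sketch only - the Timer_IRQ, IRQ_CLR, and port-base values
here are assumptions carried over from the snippets above):

#include <stdint.h>
#include <sys/neutrino.h>
#include <sys/siginfo.h>
#include <hw/inout.h>

#define Timer_IRQ  7        /* assumption: the FPGA interrupt line  */
#define IRQ_CLR    0x04     /* assumption: ack register offset      */

struct myarea {
    struct sigevent event;
    uintptr_t       base;   /* mapped device I/O base               */
};

static const struct sigevent *timer_isr(void *arg, int id)
{
    struct myarea *area = arg;              /* no globals touched   */

    out16(area->base + IRQ_CLR, 0x0001);    /* ack the device       */
    return &area->event;
}

int main(void)
{
    static struct myarea area;
    int iid, i;

    ThreadCtl(_NTO_TCTL_IO, 0);
    SIGEV_INTR_INIT(&area.event);
    area.base = mmap_device_io(8, 0x300);   /* assumption: port 0x300 */

    iid = InterruptAttach(Timer_IRQ, timer_isr, &area, sizeof(area), 0);
    for (i = 0; i < 1000; i++) {
        InterruptWait(0, NULL);
        /* thread-level work goes here */
    }
    InterruptDetach(iid);      /* note: the iid, not the IRQ number */
    return 0;
}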


Based on your comments I think you are a little confused about the
difference between InterruptAttach() and InterruptAttachEvent(). In
the end, you have to do the same work in both cases. It turns out,
however, that in most cases all one needs is to have the sigevent
delivered on interrupt and the rest of the work can be done at the
thread level. So, for ease of use, InterruptAttachEvent() exists to
mask the interrupt and deliver the sigevent specified (and nothing more).
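
In other words, the usual InterruptAttachEvent() pattern is something like
this sketch (note the InterruptUnmask() - the kernel masks the line each
time it delivers the event):

#include <sys/neutrino.h>
#include <sys/siginfo.h>

#define MY_IRQ 7                      /* placeholder interrupt line */

int main(void)
{
    struct sigevent event;
    int id;

    ThreadCtl(_NTO_TCTL_IO, 0);
    SIGEV_INTR_INIT(&event);
    id = InterruptAttachEvent(MY_IRQ, &event, _NTO_INTR_FLAGS_TRK_MSK);

    for (;;) {
        InterruptWait(0, NULL);       /* event delivered, IRQ now masked */
        /* ack the device and do the real work here, at thread level */
        InterruptUnmask(MY_IRQ, id);  /* re-enable the interrupt         */
    }
}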

The only reason you would use InterruptAttach() over the Event() version
is if you have timing-sensitive hardware (i.e. you have to hit the hardware
within X amount of time and can't wait for a thread to run) OR if you can
avoid having a thread run on every interrupt. For example, on some devices
not every interrupt requires you to do anything besides ack the interrupt.
In these cases you don't need to return the sigevent structure and can avoid
having a context switch to the handler thread. If you start having really
high interrupt rates (50 kHz and beyond), being able to control context
switches in this manner becomes critical.
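
A sketch of that case (DEV_STATUS, IRQ_CLR, and the bit values describe a
hypothetical device, not your FPGA):

#include <stdint.h>
#include <sys/siginfo.h>
#include <hw/inout.h>

#define DEV_STATUS  0x02     /* assumption: status register offset */
#define IRQ_CLR     0x04     /* assumption: ack register offset    */
#define DATA_READY  0x0001   /* assumption: "work to do" bit       */

struct myarea {
    struct sigevent event;
    uintptr_t       base;
};

/* Attached with InterruptAttach(); returning NULL suppresses the
 * event, so no thread runs for "nothing to do" interrupts. */
static const struct sigevent *isr(void *arg, int id)
{
    struct myarea *area   = arg;
    uint16_t       status = in16(area->base + DEV_STATUS);

    out16(area->base + IRQ_CLR, 0x0001);    /* always ack the device */

    return (status & DATA_READY) ? &area->event : NULL;
}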

Hope this helps a little…

chris




Thanks for the corrections. The amendments don't seem to have an effect on
my basic problem, though. I have, undoubtedly, avoided other problems by
making these changes.

Maybe you could provide a little more information about a couple of related
topics. Our interrupt controller (PIC) is an 82C59, which can be set to
either edge or level triggering. I'm generating an interrupt pulse from an
FPGA that is 45 usec long. In level-trigger mode, the INT line from the PIC
is only asserted while the interrupt level is high. It is unlikely, but
possible, that internal processing would lock out the INT line during the
time that it is high. That would make me miss an interrupt. Supposedly that
won't happen with the PIC set for edge triggering. What build or init file
would I look in to see how the PIC is programmed by the RTOS as it starts?
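
One run-time check worth trying while hunting for the startup source (a
hedged sketch - it assumes the chipset implements the post-EISA ELCR
registers at ports 0x4D0/0x4D1, where a 1 bit means level-triggered):

#include <stdio.h>
#include <stdint.h>
#include <sys/neutrino.h>
#include <hw/inout.h>

int main(void)
{
    uintptr_t elcr_port;
    unsigned  elcr, irq;

    ThreadCtl(_NTO_TCTL_IO, 0);            /* gain I/O privileges */
    elcr_port = mmap_device_io(2, 0x4d0);  /* master + slave ELCR */
    elcr = in8(elcr_port) | (in8(elcr_port + 1) << 8);

    for (irq = 0; irq < 16; irq++)
        printf("IRQ %2u: %s triggered\n", irq,
               (elcr & (1u << irq)) ? "level" : "edge");
    return 0;
}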


“Chris McKillop” <cdm@qnx.com> wrote in message
news:aslfpb$qhk$1@nntp.qnx.com

hey David…

Here are some things I have noticed:
[fixed]

Based on your comments I think you are a little confused about the
difference between InterruptAttach() and InterruptAttachEvent().

Confused isn’t the half of it. I do, or did anyway, DSP for most of my
career. Programming was just a tool to get it done. Thanks for the
assistance and patience.

…dk

You would want to look at the code to startup-bios (or whatever startup
code you are using for your device). I am still troubled that you are
not ack'ing your hardware's interrupt register when you are using
InterruptAttachEvent(), though you are doing it with InterruptAttach().

chris



David Kuechenmeister wrote:

I understand, perhaps incorrectly, that the ISR isn’t scheduled, the
interrupt just needs to be detected.

More precisely, the kernel has a built-in ISR, capable of returning an
event registered by a user program.

It’s the returned event from the ISR
that makes the handler ready to run. I think hardware interrupts are the
lowest priority on an x86 processor, so if the kernel was busy and had
disabled interrupts, I would never see the hardware interrupt that arrived
during that busy time, would I?

Not if interrupts were disabled, but disabling interrupts is "a bad
thing", and the kernel shouldn't do this for more than a few hundred
nanoseconds (maybe a couple of microseconds worst case on a P166).

I didn't look at the scheduling latency because I have an FPGA that handles
the I/O at 600 Hz. The fine time differences would be lost because all the
data is output at the same time during a 600 Hz frame. I might look at
writes on the address bus, though.

I would definitely check it out, if for nothing more than to satisfy
yourself that it is reasonably small.

I tried the InterruptAttachEvent(Timer_IRQ,&timer_isr_event,0) call just to
try a different route to the handler. I figured if the context switches to
the ISR were removed, the timing might be changed enough to see some change
in the overall behavior. It didn’t.

If all your ISR did was return the event, you shouldn’t see any
difference, since that is exactly what the “built in” ISR does (other
than - perhaps - a very tiny additional latency difference due to the
kernel not having to establish addressability for your ISR).

The problem seems to still be missed interrupts, rather than handler
scheduling. I say this because, with InterruptAttachEvent as used
above, isn't the kernel solely responsible for detecting the interrupt
and scheduling the handler? That is, doesn't the kernel perform the
additional function of the ISR that is used with the InterruptAttach() call?

Exactly, but there is a limit to how many events can be queued up in the
kernel's data space. Ultimately, if the average scheduling latency were
too high for your interrupt rate, you would eventually (over hours/days)
see lost interrupts. Since this is what you are seeing, it's worth a
look (my guess is that scheduling latency is not at fault, but there
always could be some obscure weirdness happening, I suppose).
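
If a scope is awkward to leave running overnight, a cheap software check is
to timestamp every wakeup with ClockCycles() in the handler thread and log
any suspicious gap. A sketch, assuming the 600 Hz rate:

#include <stdio.h>
#include <stdint.h>
#include <sys/neutrino.h>
#include <sys/syspage.h>

/* Call from the handler thread's loop, right after InterruptWait().
 * Logs any inter-interrupt gap longer than 1.5 frames at 600 Hz. */
static void check_gap(void)
{
    static uint64_t last;
    uint64_t now   = ClockCycles();
    uint64_t cps   = SYSPAGE_ENTRY(qtime)->cycles_per_sec;
    uint64_t limit = cps / 400;        /* 2.5 ms = 1.5 x 1.67 ms */

    if (last != 0 && now - last > limit)
        fprintf(stderr, "gap: %llu usec\n",
                (unsigned long long)((now - last) * 1000000 / cps));
    last = now;
}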

Incidentally, there is quite a bit more processing taking place on this
board. The interrupt handler is just there to kick off a data collection
process. The data is processed pretty extensively after that. Spin puts the
cpu average load at 80 to 90 percent. I guess if the spin contribution to
that figure is removed, the cpu is at about 75 to 85 percent.

Yes, but these other processes should have no bearing whatsoever on the
CPU available to your high-priority interrupt handler (all of these
other processes are always running at a lower priority, right?).

I just want to put things in perspective here. Let's say that you were
trying to capture a 1000 Hz interrupt. That's 1 interrupt per
millisecond. The latency difference between an ISR handled via
InterruptAttach vs. InterruptAttachEvent is maybe (and I'm being generous
here) 1 usec (i.e. 1000 times smaller than the period). If you are missing
interrupts, you need to be looking at areas that are adding a lot more time
than the difference between these types of operations.

Also, what's up with subtracting 10 from the max priority? My
suggestion would be to set the priority to max (at least to try and
figure out what's happening). Also, I now gather that you have a
dedicated interrupt thread that uses InterruptWait; is this correct?
(This is not the "traditional" QNX way to do things, but it should be more
efficient - not that I think the extra efficiency is called for in this
case - another one of those timing differences that are of an
insignificant scale to the problem.)

I did make the change to ack the interrupt with InterruptAttachEvent. I
should have made it clear that I made all the changes that you suggested.
I'm sure they solved problems that haven't reared their heads yet. I just
haven't been able to catch all the interrupts from my FPGA. I may start out
missing one in every couple thousand interrupts and degrade to missing two
or three interrupts out of every dozen.

I probably should have mentioned that the FPGA produces a 45 usec pulse, so
whether I ack it or not, it goes away. The FPGA is ignoring the ack for the
moment, anyway. I think I’m going to make the interrupt pulse respond to the
ack and see if that helps. If I find that this PIC is set to be level
triggered, it may make all the difference in the world. I’ll go back to
using the ISR and InterruptAttach in that case, though. I need to free the
FPGA to do the rest of its work as fast as I can. I hope I can service the
interrupt within 100 usec of its assertion. If not, I will start looking at
interrupt latency.

Maybe an example would help put my frustration in the proper context. If
this was a modem, I would be collecting a frame of IQ data on each
interrupt. If I missed even one frame in a couple thousand, I would have an
unacceptable BER. This application isn't as critical, but I still think that
I should be able to service every interrupt. I'm sure the kernel is up to
this, but I am trying to find out what I have done to overload it, such that
it is probably disabling interrupts for a little too long.

Thanks again for your help. Where is the source code for startup-bios? We
are using QNX 6.1 and this is just a standard installation as it comes off
the CDROM.

…dk



“Rennie Allen” <rallen@csical.com> wrote in message
news:3DEF0480.1060105@csical.com

David Kuechenmeister wrote:
[deleted explanation of InterruptAttachEvent]



I didn't look at the scheduling latency because I have an FPGA that
handles the I/O at 600 Hz. The fine time differences would be lost
because all the data is output at the same time during a 600 Hz frame.
I might look at writes on the address bus, though.

I would definitely check it out, if for nothing more than to satisfy
yourself that it is reasonably small.

Thanks Rennie,

I did set up the system to measure timing directly through the parallel
port. I got some interesting numbers, too. I set up two inputs to my logic
analyzer and triggered on the FPGA raising my interrupt. The other line was
set high in the ISR and low in the handler. Yes, I’m back to using
InterruptAttach with all the changes suggested by Chris in another post and
with the handler priority boosted to sched_get_priority_max(SCHED_FIFO).

When the system was nominally idle, with only the ISR and handler thread
running, I measured a latency from the raising of the interrupt to the ISR
of about 5 usec. The difference between the ISR and the handler was about 12
usec. When I started the other processes in my application, the latency to
the ISR increased to about 23 usec, and it took an additional 35 usec until
the handler started running. The system still runs well, and I haven't
"missed" an interrupt yet. Since the system usually takes quite a few
hours to degrade, I'll see what it looks like in the morning.

The delay of about 55 usec until the start of the handler concerns me a
little. It isn’t affecting the operation, but is significantly higher than
when the system is idle. The cpu is working at a pretty high load; is this
just a case of trying to do too much?


Sincerely,
…dk


[remainder deleted]

The morning numbers aren't all that good, either. I can see that the kernel
must be causing me to miss the 600 Hz interrupt handling now. The pattern
is typically as I described below, with the missed frames coming every 12 to
20 interrupts. The difference on the missed interrupts is a delay of about 2
millisecs from the interrupt to the start of the handler. The ISR may be
running, or not, since I only raise the parallel port line in the ISR. The
handler is certainly not being called, even though it is at the maximum
priority on the board.

It looks like there just isn't enough time to do the application processing.
We'll try a faster board, I suppose. What is still puzzling is why this mode
takes so long to develop. The processing load doesn't change over time, so
why should the scheduling?

Regards,
…dk



Can you post a list of the IRQs in use in your system (i.e. NIC card, serial
ports, peripherals, etc.)? Is there any other activity going on in the
system while the test is being performed (i.e. logging, etc.)?

Also, modify your test to do the following (see the sketch below):

- run your application doing the InterruptWait() at a priority band around 15
(not too high, just higher than most of the non-essential processes).
You'll want to ensure you have a method of stopping the test (high-priority
shell to slay, etc.) later ;-)

- make your application multithreaded (if it's not already) with your main
code path in one thread and a busy/wait loop in the other,
e.g.: for (;;);
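
A sketch of that arrangement (the priorities are illustrative, and the
interrupt is assumed to have been attached as in the earlier snippets):

#include <pthread.h>
#include <sched.h>
#include <sys/neutrino.h>

/* Busy thread: eats all spare CPU below the interrupt thread. */
static void *spinner(void *arg)
{
    for (;;)
        ;                              /* busy/wait loop */
    return NULL;                       /* not reached */
}

int main(void)
{
    pthread_t           tid;
    pthread_attr_t      attr;
    struct sched_param  param;

    /* main code path (the InterruptWait loop) at around priority 15 */
    param.sched_priority = 15;
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);

    /* spinner just below it, above the non-essential processes */
    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    param.sched_priority = 12;
    pthread_attr_setschedparam(&attr, &param);
    pthread_create(&tid, &attr, spinner, NULL);

    for (;;)
        InterruptWait(0, NULL);        /* assumes InterruptAttach() was done */
}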

Let us know the results of the test.


Cheers,
Adam

QNX Software Systems Ltd.
[ amallory@qnx.com ]

With a PC, I always felt limited by the software available.
On Unix, I am limited only by my knowledge.
–Peter J. Schoenster <pschon@baste.magibox.net>

I'm going to post the email that was sent; this way everyone benefits from
shared knowledge and we're all on the same page.

—email—

We are using a 166 MHz Pentium MMX processor on a PC-104 board. I think the
only IRQs that are in use are my timing pulse on IRQ 7 and the network
interface. I'm not that familiar with x86 processors, coming from a
background in Motorola and TI DSPs. How can I tell where IRQs are allocated?

With the version of QNX 6 you're using there isn't a specific command you can
run to make that query. Basically you have to look at the hardware you have
and the drivers you're running (or other processes) to get the information.
If you have PCI devices you can use the 'pci' util to query information,
including the IRQ line, for each device.

You could also query the PIC to see what the interrupt mask is and infer
that unmasked interrupts at that point are the ones in use; of course, the
state of the PIC at the time might not truly represent which interrupts are
being handled, nor how many processes are in fact interested in each.
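
For what it's worth, reading the mask is just two port reads (a sketch; a 0
bit in the IMR means the line is unmasked, i.e. probably in use):

#include <stdio.h>
#include <stdint.h>
#include <sys/neutrino.h>
#include <hw/inout.h>

int main(void)
{
    uintptr_t pic1, pic2;
    unsigned  imr, irq;

    ThreadCtl(_NTO_TCTL_IO, 0);
    pic1 = mmap_device_io(1, 0x21);    /* master 8259 IMR */
    pic2 = mmap_device_io(1, 0xa1);    /* slave 8259 IMR  */
    imr  = in8(pic1) | (in8(pic2) << 8);

    for (irq = 0; irq < 16; irq++)
        printf("IRQ %2u: %s\n", irq,
               (imr & (1u << irq)) ? "masked" : "unmasked (possibly in use)");
    return 0;
}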

This isn't as much of a test as it is monitoring the system under a full
load. It's a controller with 600 Hz loops, as well as some data collection
and data output on an ethernet connection. There isn't any logging, other
than error messages to a syslog. I do have fs-nfs2 running, so I can mount a
filesystem from another computer, as well as ntpd, so I can keep the time
set accurately.

You may want to eliminate the ethernet connection to see if that improves
the situation any - if you're on a network with moderate traffic you could be
getting interrupted more often than you'd like, especially if the NIC
uses an IRQ with a higher priority than 7.

Could you find out how the standard QNX 6.1 startup-bios sets up the
interrupt controller? I'm trying to figure out if the PIC is set to be edge
or level triggered. I thought I would be able to find that from the
startup-bios source, but I can't seem to locate that, either.

I believe you can download the BSP source from commercial.qnx.com. The PIC
is set up such that the IRQ priority is the following (highest to lowest):

master: 3,4,5,6,7,0,1,2 slave: 8,9,10,11,12,13,14,15

Note that on x86 the slave PIC is cascaded into IRQ 2 on the master.



Cheers,
Adam

“Adam Mallory” <amallory@qnx.com> wrote in message
news:asqtnd$c9e$1@nntp.qnx.com

I believe you can download the BSP source from commercial.qnx.com. The PIC
is set up such that the IRQ priority is the following (highest to lowest):

Uncle! I can't install these packages since I don't have the qek licenses.
Do I need to make a support call to find out what the default interrupt
trigger is for a standard x86 installation?

master: 3,4,5,6,7,0,1,2 slave: 8,9,10,11,12,13,14,15

Note that on x86 the slave PIC is cascaded into IRQ 2 on the master.


I did provide the list in priority order. Perhaps I didn’t understand what
you’re asking for.


Cheers,
Adam


I was looking for the default interrupt trigger, either edge or level.

Thanks,
…dk

David Kuechenmeister wrote:

The morning numbers aren’t all that good, either. I can see that the kernel
must be causing me to miss the 600 Hz interrupt handling, now. The pattern
is typically as I described below, with the missed frames coming every 12 to
20 interrupts. The difference on the missed interrupts is a delay of about 2
millisecs from the interrupt to the start of the handler.

2 milliseconds! Well, whatever is causing the 2 millisecond delay is
what is causing your problem (this is what I meant by looking for the
problem at the right scale).

Oh crap, I just thought of something; you don't have SMM enabled, do
you? (This should have been the first question anyone asked you.)

Rennie

Wow! A google search on SMM turned up a wealth of information about the
kernel and about QNX interrupt handling, in general. I’ll certainly
investigate this and turn it off, however that is done.

Whoever came up with this scheme at Intel should be shot.

Thanks,
…dk


David Kuechenmeister <david.kuechenmeister@viasat.com> wrote:

Wow! A google search on SMM turned up a wealth of information about the
kernel and about QNX interrupt handling, in general. I’ll certainly
investigate this and turn it off, however that is done.

If you even can!! :-(

Whoever came up with this scheme at Intel should be shot.

Yep.

chris


Chris McKillop <cdm@qnx.com> “The faster I go, the behinder I get.”
Software Engineer, QSSL – Lewis Carroll –
http://qnx.wox.org/

Looks like SMM is represented by Power Management and ACPI in our BIOS.
That's what our WinSystems tech rep told me, anyway. These are both
disabled, so I don't think I'm going to have any SMM problems.

I'll have to go back and work through the suggestions Adam made, now, since
I don't think either of these features was enabled earlier.

I did get the ack done for the FPGA, so we should be able to get an
additional data point for how long it takes the ISR to ack the interrupt. If
I understand the basic problem with SMM, I shouldn’t even see the ISR ack
the interrupt when the system is in SMM mode.

…dk


David Kuechenmeister wrote:

Looks like SMM is represented by Power Management and ACPI in our BIOS.
That's what our WinSystems tech rep told me, anyway. These are both
disabled, so I don't think I'm going to have any SMM problems.

Power management is not all that is done with SMM; for instance,
a common use is to provide legacy support for USB devices. One
chip many years ago emulated an 8237 DMA controller in SMM.

Bottom line is, just because you disabled power management in
the BIOS does not necessarily mean that SMM is disabled.

I'll have to go back and work through the suggestions Adam made, now, since
I don't think either of these features was enabled earlier.

I wouldn't be too quick to eliminate SMM. SMM is about the
only thing I have ever run into that can cause this sort
of delay.

I did get the ack done for the FPGA, so we should be able to get an
additional data point for how long it takes the ISR to ack the interrupt. If
I understand the basic problem with SMM, I shouldn’t even see the ISR ack
the interrupt when the system is in SMM mode.

That’s right, QNX doesn’t exist on the CPU when SMM is
active. Disabling SMM can be a daunting task (as Chris
McKillop alluded to).