QNX4 dies in irq

We have an i82527-based CAN card installed in our system. We are running
a self-written device ‘driver’ which attaches the IRQ of the CAN card to
itself with qnx_hint_attach. The program is not a real driver using the
Device Manager; it’s only a normal program with I/O privileges and
qnx_hint_attach.

We have been experiencing a mysterious system freeze now and then (about
once per 24 hours). We tried running a process at a higher
priority to see whether even it would keep running, but the whole system
is frozen. The kernel does not print any messages to the screen.

Now we have added some debugging to the IRQ handler (by blinking LEDs
with outp), and this is the impression we get:

At the beginning of the IRQ handler we disable interrupts from the CAN
chip to prevent a new IRQ before the previous one has been handled. After
all the processing in the handler, the last thing we do is re-enable
interrupts from the CAN chip. From the LED states it looks like every
freeze happens right after re-enabling interrupts. My theory is that a new
message has arrived on the CAN bus, and the i82527 generates a new IRQ
right after interrupts are enabled. From the QNX point of view, it looks
like the IRQ line goes active before the previous IRQ handler has
finished.
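
The handler sequence described above can be sketched roughly as follows.
This is a host-side simulation, not the actual driver code: CTRL_IE,
led_stage, sim_ctrl, sim_leds and sim_pending are hypothetical names, and
the bit value is an assumption that should be checked against the i82527
datasheet. On the real system the register accesses would go to the chip
and led_stage would be an outp() call.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the interrupt-enable bit in the chip's control
   register; verify against the datasheet before relying on it. */
#define CTRL_IE 0x02u

/* Simulated hardware so the sketch runs on any host. */
static uint8_t sim_ctrl = CTRL_IE; /* stands in for the control register */
static uint8_t sim_leds;           /* stands in for the debug LED port   */
static int     sim_pending;        /* messages waiting inside the chip   */

/* Record which stage of the handler has been reached. */
static void led_stage(uint8_t stage) { sim_leds = stage; }

/* Returns the number of messages handled in this invocation. */
static int can_isr(void)
{
    int handled = 0;

    assert(sim_pending >= 0);

    led_stage(1);
    sim_ctrl &= (uint8_t)~CTRL_IE;  /* chip may no longer raise its IRQ */

    led_stage(2);
    if (sim_pending > 0) {          /* process one message */
        sim_pending--;
        handled++;
    }

    led_stage(3);
    sim_ctrl |= CTRL_IE;            /* last action: re-enable chip IRQ */
    return handled;
}

/* Non-zero when the simulated chip would be driving its IRQ line. */
static int irq_line_active(void)
{
    return (sim_ctrl & CTRL_IE) != 0 && sim_pending > 0;
}
```

With two messages queued, the first call handles one and leaves
irq_line_active() true at the moment the enable bit is written back, which
is the situation the theory above is about.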

So the question is: has anybody else seen such a problem before? Could
there be such a problem in QNX4? Any ideas how to avoid it?

We are running QNX 4.25E.

If replying by e-mail, remove the x-letters from my e-mail address.


M. Tavasti / tavastixx@iki.fi / +358-40-5078254
Remove both x-letters from my e-mail address

“M. Tavasti” wrote:

At the beginning of the IRQ handler we disable interrupts from the CAN
chip to prevent a new IRQ before the previous one has been handled.

Do you mean you ‘qnx_hint_mask()’ your IRQ? QNX already masks your IRQ
(and all IRQs with lower priority…) before calling the ISRs.

After all the processing in the handler, the last thing we do is
re-enable interrupts from the CAN chip.

Do you call ‘qnx_hint_mask()’ here (with action = 1 → enable) on your IRQ?
This carries a big risk: if the IRQ line is driven active while QNX is
calling the ISRs, then when you unmask the IRQ (inside your ISR) the
kernel will IMMEDIATELY get interrupted… I don’t know what that implies…
but I suppose nothing good :^)

QNX already unmasks your IRQ once all the relevant ISRs have been called.
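
A toy model of the masking behaviour described in this reply might look
like the sketch below. It is not kernel code: imr, dispatch and sample_isr
are hypothetical names, and the rule "higher IRQ number means lower
priority" is a simplifying assumption that ignores the cascaded second
interrupt controller found on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated interrupt mask register: bit n set means IRQ n is masked. */
static uint8_t imr;
static uint8_t imr_seen_in_isr;

typedef void (*isr_fn)(void);

/* Example handler that only records the mask it runs under. */
static void sample_isr(void) { imr_seen_in_isr = imr; }

/* Mask `irq` and every lower-priority line, run the handler, then
   restore the previous mask, as the reply says the OS does. */
static void dispatch(unsigned irq, isr_fn handler)
{
    assert(irq < 8);
    uint8_t saved = imr;
    imr |= (uint8_t)(0xFFu << irq); /* irq and all higher-numbered lines */
    handler();
    imr = saved;                    /* unmask after the handler returns  */
}
```

In this model a handler for IRQ 5 runs with lines 5, 6 and 7 masked, so a
new request on its own line stays pending until the handler has returned.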

Hope it helps…

------------------------------------------------------------
Davide Ancri - Prisma Engineering
email = davidea at prisma dash eng dot it
------------------------------------------------------------

Davide Ancri <no.more.spam@nowhere.org> writes:

At the beginning of the IRQ handler we disable interrupts from the CAN
chip to prevent a new IRQ before the previous one has been handled.

Do you mean you ‘qnx_hint_mask()’ your IRQ? QNX already masks your IRQ
(and all IRQs with lower priority…) before calling the ISRs.

No, we write ‘disable interrupt’ to the i82527 (the CAN chip we use).

This carries a big risk: if the IRQ line is driven active while QNX is
calling the ISRs, then when you unmask the IRQ (inside your ISR) the
kernel will IMMEDIATELY get interrupted… I don’t know what that implies…
but I suppose nothing good :^)

We are not using qnx_hint_mask, but is the situation still the same?


M. Tavasti / tavastixx@iki.fi / +358-40-5078254

“M. Tavasti” <tavastixx@iki.fi.invalid> wrote in message
news:m23d0p4t8h.fsf@akvavitix.vuovasti.com

We have an i82527-based CAN card installed in our system. We are running
a self-written device ‘driver’ which attaches the IRQ of the CAN card to
itself with qnx_hint_attach. The program is not a real driver using the
Device Manager; it’s only a normal program with I/O privileges and
qnx_hint_attach.

We have been experiencing a mysterious system freeze now and then (about
once per 24 hours). We tried running a process at a higher
priority to see whether even it would keep running, but the whole system
is frozen. The kernel does not print any messages to the screen.

Now we have added some debugging to the IRQ handler (by blinking LEDs
with outp), and this is the impression we get:

At the beginning of the IRQ handler we disable interrupts from the CAN
chip to prevent a new IRQ before the previous one has been handled. After
all the processing in the handler, the last thing we do is re-enable
interrupts from the CAN chip. From the LED states it looks like every
freeze happens right after re-enabling interrupts. My theory is that a new
message has arrived on the CAN bus, and the i82527 generates a new IRQ
right after interrupts are enabled. From the QNX point of view, it looks
like the IRQ line goes active before the previous IRQ handler has
finished.

So the question is: has anybody else seen such a problem before? Could
there be such a problem in QNX4? Any ideas how to avoid it?

Although not related, you should check whether any other interrupt is
pending before enabling the interrupt again.

while ( chip has interrupt pending ) {
    /* service it */
}

You also don’t need to disable the interrupt inside the handler, as
interrupts at your level and below are masked by the OS.

Is there any loop in your code that could freeze? Are you sure you are
not using any library calls? Have you compiled with -zu and with stack
checking disabled?


We are running QNX 4.25E.


At the beginning of the IRQ handler we disable interrupts from the CAN
chip to prevent a new IRQ before the previous one has been handled.

I’m not sure what you mean by this. Do you mean that you
program the interrupt controller yourself? In QNX4 you should
not be doing this. The kernel has already masked your
interrupt when you are in your interrupt handler. In fact all
lower-priority interrupts are also masked out. Higher-priority
interrupts are allowed to pre-empt your handler, so they are not
masked.

I’m not sure what the effect of unmasking your interrupt line
would be, but I can think of some bad possibilities that might
be caused by race conditions with other interrupts.

Otherwise, it sounds like you might be stuck in an interrupt
handler that always fires again as soon as the handler returns.

Mitchell Schoenbrun --------- maschoen@pobox.com

“Mario Charest” <goto@nothingness.com> writes:

Although not related, you should check whether any other interrupt is
pending before enabling the interrupt again.

while ( chip has interrupt pending ) {
    /* service it */
}

If I understood correctly, you suggest looping in the IRQ handler while
there is no IRQ pending. I don’t think it’s possible to implement this
100% proof, since looping may result in too much time spent in the
handler, and the whole system will crash (with a kernel stack dump).

You also don’t need to disable the interrupt inside the handler, as
interrupts at your level and below are masked by the OS.

Yes, I know they are masked on the OS side, but we are telling the
CAN controller not to raise another interrupt while we are handling the
previous one. Without it, there can be a new IRQ immediately after reading

Is there any loop in your code that could freeze?

No.

Are you sure you are not using any library calls?

Yes, we are sure.

Have you compiled with -zu and with stack checking disabled?

Yes.


Everything works fine most of the time (running 24 hours without problems
under quite high load), but in rare cases there is a total jam.


M. Tavasti / tavastixx@iki.fi / +358-40-5078254

Mitchell Schoenbrun <maschoen@pobox.com> writes:

At the beginning of the IRQ handler we disable interrupts from the CAN
chip to prevent a new IRQ before the previous one has been handled.

I’m not sure what you mean by this. Do you mean that you
program the interrupt controller yourself?

No, neither that nor qnx_hint_mask. We just clear the INT_ENABLE bit in
the i82527 CAN chip (see
http://www.intel.com/design/auto/can/datashts/272250.htm if you want
more details).

Otherwise, it sounds like you might be stuck in an interrupt
handler that always fires again as soon as the handler returns.

No, that’s not the case. We now turn some LEDs on and off in the IRQ
handler, and when the freeze occurs the LEDs stay lit all the time,
showing that we have frozen at the stage where there is nothing other
than writing INT_ENABLE to the i82527.


M. Tavasti / tavastixx@iki.fi / +358-40-5078254

“M. Tavasti” <tavastixx@iki.fi.invalid> wrote in message
news:m2r8o81hl6.fsf@akvavitix.vuovasti.com

“Mario Charest” <goto@nothingness.com> writes:

Although not related, you should check whether any other interrupt is
pending before enabling the interrupt again.

while ( chip has interrupt pending ) {
    /* service it */
}

If I understood correctly, you suggest looping in the IRQ handler while
there is no IRQ pending. I don’t think it’s possible to implement this
100% proof, since looping may result in too much time spent in the
handler, and the whole system will crash (with a kernel stack dump).

No, this technique reduces overhead… To be clear, you don’t loop
while there is no IRQ pending, you loop UNTIL there is no IRQ pending.

Example: There is an IRQ and your ISR is invoked. During the ISR the board
generates another IRQ (which becomes pending). After the ISR exits,
the OS unmasks the interrupt. Then the OS has to call your ISR again,
which is extra overhead. If you had checked whether an interrupt was
pending before leaving the ISR, you would have saved that overhead.

Looping to handle pending interrupts makes no difference, since your ISR
would be called again by the OS in a loop anyway.

Of course, as soon as there are no pending interrupts you have to get out
of the ISR ;-)
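
A minimal sketch of the "loop UNTIL nothing is pending" idea, with a cap
added to address the worry about spending too long in the handler. It runs
against a simulated chip: sim_queue, chip_irq_pending, service_one and the
MAX_PER_ISR budget are all hypothetical, and on real hardware the pending
check would be a read of the chip's interrupt register.

```c
#include <assert.h>

/* Hypothetical upper bound on work done per handler invocation. */
#define MAX_PER_ISR 8

/* Simulated count of interrupt causes still pending in the chip. */
static int sim_queue;

static int  chip_irq_pending(void) { return sim_queue > 0; }
static void service_one(void)      { sim_queue--; }

/* Service pending causes until none are left or the budget is spent.
   Returns the number of causes serviced in this invocation. */
static int can_isr_drain(void)
{
    int n = 0;

    assert(sim_queue >= 0);
    while (chip_irq_pending() && n < MAX_PER_ISR) {
        service_one();
        n++;
    }
    return n;
}
```

If the budget runs out while causes are still pending, the handler simply
returns and is invoked again for the remainder, so the cap trades a little
overhead for a bounded time per invocation.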

You also don’t need to disable the interrupt inside the handler, as
interrupts at your level and below are masked by the OS.

Yes, I know they are masked on the OS side, but we are telling the
CAN controller not to raise another interrupt while we are handling the
previous one. Without it, there can be a new IRQ immediately after reading

Doesn’t matter, since the IRQ is masked (at the interrupt controller
level). Even if the board issues an IRQ, it won’t matter until the OS
unmasks it.

Is there any loop in your code that could freeze?

No.

Are you sure you are not using any library calls?

Yes, we are sure.

Have you compiled with -zu and with stack checking disabled?

Yes.

Everything works fine most of the time (running 24 hours without problems
under quite high load), but in rare cases there is a total jam.

It could be that somehow the code can’t clear the interrupt properly,
or that the board freezes the bus (does it do DMA?).



“M. Tavasti” <tavastixx@iki.fi.invalid> wrote in message
news:m2ofjc1h7l.fsf@akvavitix.vuovasti.com

Mitchell Schoenbrun <maschoen@pobox.com> writes:

At the beginning of the IRQ handler we disable interrupts from the CAN
chip to prevent a new IRQ before the previous one has been handled.

I’m not sure what you mean by this. Do you mean that you
program the interrupt controller yourself?

No, neither that nor qnx_hint_mask. We just clear the INT_ENABLE bit in
the i82527 CAN chip (see
http://www.intel.com/design/auto/can/datashts/272250.htm if you want
more details).

Otherwise, it sounds like you might be stuck in an interrupt
handler that always fires again as soon as the handler returns.

No, that’s not the case. We now turn some LEDs on and off in the IRQ
handler, and when the freeze occurs the LEDs stay lit all the time,
showing that we have frozen at the stage where there is nothing other
than writing INT_ENABLE to the i82527.

Have you checked with a scope to make sure the LEDs aren’t in fact
blinking very fast because of an endless ISR? Depending on the duty
cycle of the blinking, the LEDs may not dim at all, or too little
to be seen by human eyes.
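
To put a number on the duty-cycle point, here is a small worked example.
The function name and the timings are made up for illustration; they are
not measurements from this system.

```c
#include <assert.h>

/* Fraction of time the LED is lit, in tenths of a percent,
   given how long it is on and off within one ISR cycle. */
static unsigned led_duty_permille(unsigned on_us, unsigned off_us)
{
    assert(on_us + off_us > 0);
    return (1000u * on_us) / (on_us + off_us);
}
```

For instance, an LED that is on for 48 microseconds and off for 2
microseconds in every cycle is lit 96% of the time, which would be hard to
tell apart from a permanently lit LED without a scope.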



“Mario Charest” <goto@nothingness.com> writes:

Have you checked with a scope to make sure the LEDs aren’t in fact
blinking very fast because of an endless ISR?

No, we haven’t. Hmm, in theory, if the ISR fired without any valid
IRQ reason, the LEDs might stay lit (or blink) like we have seen, so we
will check it. But what would make QNX call the ISR continuously? I would
say the hardware wouldn’t generate IRQs endlessly, or at least we haven’t
seen that under Linux.

We’ll examine this more; I’ll be back.


M. Tavasti / tavastixx@iki.fi / +358-40-5078254

“Mario Charest” <goto@nothingness.com> writes:

Have you checked with a scope to make sure the LEDs aren’t in fact
blinking very fast because of an endless ISR? Depending on the duty
cycle of the blinking, the LEDs may not dim at all, or too little
to be seen by human eyes.

Now we have checked it several times, and the LEDs are lit 100% of the
time, not blinking. So it is a freeze, not an endless ISR.


M. Tavasti / tavastixx@iki.fi / +358-40-5078254