hardware interrupt IRQ conflict

Wojtek Lerch wrote:

A thread that calls
InterruptMask() and InterruptUnmask() can get pre-empted between those
calls, too. The main difference is that InterruptAttachEvent() may let it
happen more often (sometimes a lot more often), because it makes it possible
for your interrupt to get masked while a high-priority thread is already
running. Whether that’s a big difference depends on whether you can live
with it happening occasionally; but if it can cause people to die when it
happens, it’s probably better if it happens every five seconds rather than
once in five months, because this way your product won’t get shipped until
it’s fixed.

I think a real-time o.s. should guarantee none of its device drivers
keep interrupt totally disabled (cli/sti) or any IRQ masked
(InterruptMask+InterruptAttachEvent/InterruptUnmask) for more than a
little amount of time.

I mean “little amount of time” as something like a few us (thousands of
clock cycles for a modern PC!).

If the customer application has special requirements, then it will
decide to use IRQ masking for its devices, choosing all related
advantages/drowbacks: using IRQ masking in o.s. distribuited network
drivers (only network ones?) does not allow customer software to expect
a deterministic interrupt handling behaviour at all, imho.

Thus, the o.s. cannot be called a “real-time” one.

Davide


/* Ancri Davide - */

Davide Ancri wrote:

Wojtek Lerch wrote:

A thread that calls InterruptMask() and InterruptUnmask() can get
pre-empted between those calls, too. The main difference is that
InterruptAttachEvent() may let it happen more often (sometimes a lot
more often), because it makes it possible for your interrupt to get
masked while a high-priority thread is already running. Whether
that’s a big difference depends on whether you can live with it
happening occasionally; but if it can cause people to die when it
happens, it’s probably better if it happens every five seconds rather
than once in five months, because this way your product won’t get
shipped until it’s fixed.


I think a real-time o.s. should guarantee none of its device drivers
keep interrupt totally disabled (cli/sti) or any IRQ masked
(InterruptMask+InterruptAttachEvent/InterruptUnmask) for more than a
little amount of time.

IMHO … the call InterruptAttachEvent() should be removed from the
library. It is a potential killer of any real-time behaviour!

–Armin

“Armin Steinhoff” <a-steinhoff@web.de> wrote in message
news:d8jfjp$a68$1@inn.qnx.com

Davide Ancri wrote:
Wojtek Lerch wrote:

A thread that calls InterruptMask() and InterruptUnmask() can get
pre-empted between those calls, too. The main difference is that
InterruptAttachEvent() may let it happen more often (sometimes a lot
more often), because it makes it possible for your interrupt to get
masked while a high-priority thread is already running. Whether that’s
a big difference depends on whether you can live with it happening
occasionally; but if it can cause people to die when it happens, it’s
probably better if it happens every five seconds rather than once in
five months, because this way your product won’t get shipped until it’s
fixed.

I think a real-time o.s. should guarantee none of its device drivers
keep interrupt totally disabled (cli/sti) or any IRQ masked
(InterruptMask+InterruptAttachEvent/InterruptUnmask) for more than a
little amount of time.

IMHO … the call InterruptAttachEvent() should be removed from the
library. It is a potential killer of any real-time behaviour!

Just InterruptAttachEvent(), or maybe InterruptMask() and InterruptUnmask(),
too? (You did read what I wrote, right?)

What about InterruptDisable() and InterruptEnable()?

BTW I can think of a lot of other library functions that could potentially
be a killer of real-time behaviour (for instance, kill()… :wink: Or
pthread_setschedparam()…). Should they all be removed?

:wink:

Wojtek Lerch wrote:

Just InterruptAttachEvent(), or maybe InterruptMask() and InterruptUnmask(),
too? (You did read what I wrote, right?)

They do go hand-in-hand so why not? That way there is no temptation to do the same silly behaviour from a driver’s ISR code also. And it saves the unnecessary extra twiddling of the IRQ mask registers when the device is already masked anyway.


What about InterruptDisable() and InterruptEnable()?

Funnily enough, because ID()/IE() prevent preemption altogether, they don’t induce the bug.


Evan

“Evan Hillas” <evanh@clear.net.nz> wrote in message
news:d8dqi6$7ad$1@inn.qnx.com

Evan Hillas wrote:
This prevents any other driver ISR from getting it’s due.


Correction: This prevents all other drivers on that IRQ from getting
their due. There is no possibility of priorities, period.

Correction, This prevents all other driver on that IRQ and of IRQ with lower
priority to get their due. That means when InterruptAttachEvent attach is
used, IRQ of lower priority get a lower priority then that of the thread
serving the event, that seems to defeat the purpose of IRQ which is typicaly
be of higher priority the processes/threads.

Evan

Evan Hillas wrote:

Wojtek Lerch wrote:

Just InterruptAttachEvent(), or maybe InterruptMask() and
InterruptUnmask(), too? (You did read what I wrote, right?)


They do go hand-in-hand so why not? That way there is no temptation to
do the same silly behaviour from a driver’s ISR code also. And it saves
the unnecessary extra twiddling of the IRQ mask registers when the
device is already masked anyway.

But when you’re writing a driver meant to run in a controlled
environment, where you know that the interrupt will not be shared or no
higher-priority threads will have a chance to pre-empt yours for too
long, and want to minimize the interrupt latency, isn’t is possible that
IAE() may be the best way to go?

BTW You wanted an example: according to our docs, the floppy disk
controller can take many milliseconds to clear the interrupt in the
device. How would you handle that without being able to mask the
interrupt?

What about InterruptDisable() and InterruptEnable()?


Funnily enough, because ID()/IE() prevent preemption altogether, they
don’t induce the bug.

No, they induce a different one. They encourage writing drivers that
work perfectly fine on a single processor but have nasty race conditions
on a multiprocessor system.

In article <d8kmfe$7o8$1@inn.qnx.com>, postmaster@127.0.0.1 says…
[…]

Correction, This prevents all other driver on that IRQ and of IRQ with lower
priority to get their due.

How come? I had time today to read all that old discussion that Evan
mentioned at the begining of this thread. It seems you have agreed then
IAE() doesn’t interfere with the lower priority interrupts. “Default”
ISR used by IAE masks interrupt line exactly for the reason to give
possibility to issue EOI to PIC.

Eduard.

In article <MPG.1d16dc23d0994e869896cf@inn.qnx.com>, ed1k@fake.address
says…

I am not sure > :slight_smile: > Probably not, I’m not saying. Thought I can’t think of
a good example.

I did grep over linux/kernel/drivers today in order to find a good
example. Found quite a few, they are mostly related to possible
missbehaviour of hardware, usually reffered to as timeouts. I.e., for
example, you send some data and expect “tx finished” kind of interrupt
in some reasonable timeframe. If it is not the case you may want to
reset the hardware and reinit it. During this procedure you may want to
mask interrupt line and unmask it after it’s done. If this procedure can
take prolong time, cli/sti couple is not even an option.

BTW, disable_irq(int no) in linux waits for currently executing ISR
handler if any to complete. QNX documentation doen’t mention such
behaviour, so I assume InterruptMask() doesn’t wait. So, it’s a really
bad wrapper for critical section… and I honestly would be thankful for
example of good usage of InterruptMask() in a thread.

Since masking IRQ in linux is SMP safe (although after calling
disable_irq() you may want to check timeout conditions again – it’s
possible that while you were blocked during this call, different CPU
handled interrupt intno chain servicing “tx finished” interrupt), it
makes sence to mask interrupt from within kernel thread. Also they have
nonblocking call to mask interrupt which isn’t recommended to use in
threads, but probably is good to use within ISR.

Because InterruptMask()/InterruptUnmask() pair is not SMP safe (are
they?), I don’t see any use of InterruptMask() in QNX driver threads. I
assume they reliable work only if InterruptMask() is issued by ISR, and
InterruptUnmask() by IST. Well, if you know your driver will run on
uniprocessor system only, it’s up to you to use them as there is no race
conditions in this case. However, all my experience tells me that
despite your assumption and independetly of you (or may be because of
your assumption about UP systems only), your driver will run on SMP,
almost flawlessly till some point.

Eduard.

P.S. I just checked documentation – only for InterruptDisable() it’s
mentioned there is SMP safe InterruptLock() alternative. BTW,
InterruptDisable page says “If the thread doesn’t do this, it might
SIGSEGV when InterruptUnlock() is called.” – I never thought I can
disable interrupt in one way and then unlock it by different means
without SIGSEGV or deadlock :slight_smile: I think this is cut’n’paste issue of
documentation team. Had to be corrected.

Wojtek Lerch wrote:

What about InterruptDisable() and InterruptEnable()?

Funnily enough, because ID()/IE() prevent preemption altogether,
they don’t induce the bug.

No, they induce a different one. They encourage writing drivers that
work perfectly fine on a single processor but have nasty race
conditions on a multiprocessor system.

I think the customer of a o.s. should be left free to choose dangerous
operations: I buy a real-time o.s. because I know what I’m doing while
playing around with interrupts, an for the same reason I am the only
person that can choose to mask or disable interruptions for more than a
few microseconds.

What is not acceptable is buying an o.s. called “real-time”, let some
system process run and then discovering that my software cannot always
expect a real-time behaviour of the system.

InterruptAttachEvent() must disappear from any QNX-shipped device driver
possibly running in a IRQ-shared environment, and the same for
Interrupt-Mask/Unmask-Lock/Unlock-Enable/Disable() for “long” periods
(some us).

Davide


/* Ancri Davide - */

Wojtek Lerch wrote:

“Armin Steinhoff” <> a-steinhoff@web.de> > wrote in message
news:d8jfjp$a68$> 1@inn.qnx.com> …
[clip …]
IMHO … the call InterruptAttachEvent() should be removed from the
library. It is a potential killer of any real-time behaviour!


Just InterruptAttachEvent(),

exactly … the concept behind this call is dangerous.

or maybe InterruptMask() and InterruptUnmask(),

too? (You did read what I wrote, right?)

I read all of your postings :slight_smile:

What about InterruptDisable() and InterruptEnable()?

They are useful … if you know what you do.

–Armin

Rennie Allen wrote:

Davide Ancri wrote:


InterruptAttachEvent() must disappear from any QNX-shipped device driver
possibly running in a IRQ-shared environment, and the same for
Interrupt-Mask/Unmask-Lock/Unlock-Enable/Disable() for “long” periods
(some us).


IAE better not disappear from device drivers ! IAE is the only way to
truly have control over all priorities (including the interrupt
handlers of shipped drivers).

When we talk about InterruptAttachEvent(), these handlers are threads
which can be preempted by simple application code running at a higher
priority.

As long as there are no rules in assigning of priorities to interrupt
threads and applications … the behaviour of critical interrupt threads
can be very unpredictable.

–Armin

Davide Ancri wrote:


InterruptAttachEvent() must disappear from any QNX-shipped device driver
possibly running in a IRQ-shared environment, and the same for
Interrupt-Mask/Unmask-Lock/Unlock-Enable/Disable() for “long” periods
(some us).

IAE better not disappear from device drivers ! IAE is the only way to
truly have control over all priorities (including the interrupt
handlers of shipped drivers). My only criticism is of QNX components
that don’t have command line parameters for setting the value of
every priority object that they employ (note: this situation improved
dramatically with sp1).

Rennie

ed1k <ed1k@fake.address> wrote:

In article <> MPG.1d16dc23d0994e869896cf@inn.qnx.com> >, > ed1k@fake.address
says…


Because InterruptMask()/InterruptUnmask() pair is not SMP safe (are
they?),

My understanding is that they are SMP safe.

Eduard.

P.S. I just checked documentation – only for InterruptDisable() it’s
mentioned there is SMP safe InterruptLock() alternative. BTW,
InterruptDisable page says “If the thread doesn’t do this, it might
SIGSEGV when InterruptUnlock() is called.”
– I never thought I can
disable interrupt in one way and then unlock it by different means
without SIGSEGV or deadlock > :slight_smile: > I think this is cut’n’paste issue of
documentation team. Had to be corrected.

The “if the thread doesn’t do this” bit is referring to the ThreadCtl(),
and, yes, there is definitely a cut & paste error there – it’s in
the InterruptLock() page, too. I’ll issue a PR to have it fixed.

-David

David Gibbs
QNX Training Services
dagibbs@qnx.com

Armin Steinhoff wrote:

As long as there are no rules in assigning of priorities to interrupt
threads and applications … the behaviour of critical interrupt threads
can be very unpredictable.

I make the rules

It is my application and I want to be able to control every aspect of it.
If the work is done in the ISR instead of an IHT I have very limited (and
completely non-portable) control of how it gets scheduled. If the work is
done in an IHT (and I have command line priority parameters for stock
drivers) then I have complete (and portable) control of how everything
gets scheduled.

Rennie

Armin Steinhoff wrote:

exactly … the concept behind this call is dangerous.

The concept behind this call is not dangerous. The apparent
(and surprising) general ignorance of how to use it is…

Rennie

David Gibbs <dagibbs@qnx.com> wrote:

ed1k <> ed1k@fake.address> > wrote:
In article <> MPG.1d16dc23d0994e869896cf@inn.qnx.com> >, > ed1k@fake.address
says…


Because InterruptMask()/InterruptUnmask() pair is not SMP safe (are
they?),

My understanding is that they are SMP safe.

Hm… well, on more thought, maybe not.

They talk directly to the PIC, so if the interrupt level is masked,
it is masked for all CPUs in an SMP situation. But, there is a
potential race condition:

PIC raises interrupt X, CPU1 starts handling it
Thread on CPU2 masks interrupt X
handler for interrupt X runs on CPU1 while thread on CPU2 thinks
it is safe to touch hardware.

I don’t know if there is any code to check whether or not this
could happen.

-David

David Gibbs
QNX Training Services
dagibbs@qnx.com

Rennie Allen wrote:

IAE better not disappear from device drivers ! IAE is the only way to
truly have control over all priorities (including the interrupt

The priorities don’t disappear with an ISR, the drivers that want to do the work with a prioritised thread can still get sent the same events using InterruptAttach() as InterruptAttachEvent() can send.


Evan

Rennie Allen wrote:

I make the rules

It is my application and I want to be able to control every aspect of it.
If the work is done in the ISR instead of an IHT I have very limited (and
completely non-portable) control of how it gets scheduled. If the work is
done in an IHT (and I have command line priority parameters for stock
drivers) then I have complete (and portable) control of how everything
gets scheduled.

You’re miss understanding what is being asked for. The work is still done in the same places just that the driver masks/unmasks the device instead of the IRQ. That’s all the difference is.


Evan

Rennie Allen wrote:

Armin Steinhoff wrote:


exactly … the concept behind this call is dangerous.


The concept behind this call is not dangerous. The apparent
(and surprising) general ignorance of how to use it is…

Agreed. And the requirement is that the driver/designer has to be certain there is no other nor will there be another driver sharing the IRQ.


Evan

Wojtek Lerch wrote:

But when you’re writing a driver meant to run in a controlled
environment, where you know that the interrupt will not be shared or no
higher-priority threads will have a chance to pre-empt yours for too
long, and want to minimize the interrupt latency, isn’t is possible that
IAE() may be the best way to go?

The difference is small and maybe non-existent when compared to the amount of fluffing about with status the driver will be doing anyway. But yep, I’m all for optimising.


BTW You wanted an example: according to our docs, the floppy disk
controller can take many milliseconds to clear the interrupt in the
device. How would you handle that without being able to mask the
interrupt?

I guess that’s one device that can’t and never has shared. After all, it is an ISA device of the oldest order. There is always going to be some that can’t do it. That doesn’t mean that all are helpless and it’s fare to say that the problem is worse because of the large number of legacy IRQ slots.


No, they induce a different one. They encourage writing drivers that
work perfectly fine on a single processor but have nasty race conditions
on a multiprocessor system.

Point taken. IAE() can be used in the right setup but one has to be careful about it. And that is not done or understood by most. It’s possible benefit is not worth the stress.


Evan