hardware interrupt IRQ conflict

Rennie_Allen2 · June 10, 2005, 4:24pm

ed1k wrote:

[…]

QNX domain.

[…]

as soon as the thread issues InterruptUnmask(). I don’t know why this
is not the default behaviour for InterruptAttachEvent(), but it can
be done programmatically for now. However, none of the standard QNX
drivers do that, AFAIK. Therefore, a standard QNX driver might
be a dangerous neighbour to share an interrupt line with.

Why programmatically “boost” processing of the IHT when you can simply attach
a high priority to the event that IAE returns? If you want another
“part” of the handler that runs at a different priority, then simply
have the IHT queue a pulse (with a lower priority) to itself (the event
queuing mechanism will queue events in priority order to the IHT, so
if new events come in from IAE they will be put to the front of the
queue) once the interrupt has been unmasked.
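Here is a minimal sketch of the ordering this relies on. It assumes only what is stated above: pulses reach the receiving thread highest priority first, and FIFO among equal priorities. It is a plain-C simulation, not the QNX kernel. The names pulse_q, pq_send(), pq_receive(), PULSE_IRQ, PULSE_DEFERRED and the priorities 21 and 10 are hypothetical. In a real driver the IHT would get these pulses from MsgReceivePulse() and would call InterruptUnmask() before sending itself the low-priority pulse.

```c
/* Simulation of priority-ordered pulse delivery; not the QNX kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PQ_CAP 16

enum pulse_code { PULSE_IRQ = 1, PULSE_DEFERRED = 2 };   /* hypothetical codes */

struct pulse { int priority; enum pulse_code code; };

/* Kept sorted, highest priority at index 0. */
struct pulse_q { struct pulse items[PQ_CAP]; size_t len; };

static void pq_init(struct pulse_q *q) { q->len = 0; }

/* Queue a pulse behind everything of equal or higher priority, so a new
 * high-priority interrupt pulse overtakes pending low-priority work. */
static bool pq_send(struct pulse_q *q, int priority, enum pulse_code code)
{
    if (q->len == PQ_CAP)
        return false;                       /* queue full */
    size_t i = q->len;
    while (i > 0 && q->items[i - 1].priority < priority) {
        q->items[i] = q->items[i - 1];      /* shift lower-priority pulses back */
        i--;
    }
    q->items[i] = (struct pulse){ priority, code };
    q->len++;
    assert(q->len <= PQ_CAP);
    return true;
}

/* Dequeue the front (highest-priority) pulse; false if nothing is queued. */
static bool pq_receive(struct pulse_q *q, struct pulse *out)
{
    if (q->len == 0)
        return false;
    *out = q->items[0];
    for (size_t i = 1; i < q->len; i++)
        q->items[i - 1] = q->items[i];
    q->len--;
    return true;
}
```

The handler loop then becomes a switch on the pulse code. PULSE_IRQ services the device and unmasks, and PULSE_DEFERRED does the slow work.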

Boosting the interrupt handling thread’s priority by passing an argument
to the driver may not be a good solution, because it seems more
logical to unmask the interrupt ASAP and do the rest of the data
processing at a lower priority (although if the rumours are true and
network drivers unmask the interrupt only after all data have been processed,
this really doesn’t make a difference; it’s just bad design).

Some network hardware makes this unavoidable. The hardware will not
de-assert the IRQ until all data have been processed.

Rennie

Colin_Burgess1 · June 10, 2005, 4:33pm

Rennie Allen wrote:

ed1k wrote:

[…]

QNX domain.

[…]

as soon as the thread issues InterruptUnmask(). I don’t know why this
is not the default behaviour for InterruptAttachEvent(), but it can be
done programmatically for now. However, none of the standard QNX drivers
do that, AFAIK. Therefore, a standard QNX driver might
be a dangerous neighbour to share an interrupt line with.

Why programmatically “boost” processing of the IHT when you can simply attach
a high priority to the event that IAE returns? If you want another
“part” of the handler that runs at a different priority, then simply
have the IHT queue a pulse (with a lower priority) to itself (the event
queuing mechanism will queue events in priority order to the IHT, so
if new events come in from IAE they will be put to the front of the
queue) once the interrupt has been unmasked.

There’s no point in enqueuing a low-priority pulse; just drop your
priority then and there. They’re both kernel calls, and the pulse just
adds overhead.

Just make sure that the interrupt pulse priority and the ‘low’ priority
are configurable, whatever you do! :v)
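Here is one way to follow that advice, sketched under assumptions. The two priorities come from hypothetical -i (interrupt pulse) and -l (deferred work) command-line options, parsed with POSIX getopt(). The defaults 21 and 10 and the accepted range 1..255 are illustrative only. They are not QNX's actual limits for any particular configuration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical driver settings: both priorities come from the command line. */
struct prio_cfg { int irq_prio; int low_prio; };

/* Parse one priority; 1..255 is an illustrative range, not a QNX guarantee. */
static bool parse_prio(const char *s, int *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v < 1 || v > 255)
        return false;
    *out = (int)v;
    return true;
}

/* -i <prio>: interrupt pulse priority, -l <prio>: deferred-work priority. */
static bool parse_prio_options(int argc, char *argv[], struct prio_cfg *cfg)
{
    int opt;
    cfg->irq_prio = 21;            /* hypothetical defaults */
    cfg->low_prio = 10;
    optind = 1;
    while ((opt = getopt(argc, argv, "i:l:")) != -1) {
        switch (opt) {
        case 'i':
            if (!parse_prio(optarg, &cfg->irq_prio))
                return false;
            break;
        case 'l':
            if (!parse_prio(optarg, &cfg->low_prio))
                return false;
            break;
        default:
            return false;          /* unknown option or missing argument */
        }
    }
    /* The deferred work must run below the interrupt pulse. */
    return cfg->low_prio < cfg->irq_prio;
}
```

For example, a driver started as "drv -i 30 -l 12" would run its interrupt pulse at 30 and its deferred processing at 12.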

–
cburgess@qnx.com

Rennie_Allen2 · June 10, 2005, 4:58pm

Colin Burgess wrote:

There’s no point in enqueuing a low-priority pulse; just drop your
priority then and there. They’re both kernel calls, and the pulse
just adds overhead.

Hmmm, the idea is to make the low priority processing deferred and
pre-emptable. Assuming that there are new high-priority events from
IAE in the queue, simply changing the priority in the same execution
path leads to priority inversion. Queuing a pulse allows the IHT to
re-visit MsgReceive, and handle more IAE events before undertaking
lower priority processing. Granted, I’m not sure what the practical
issues are here, but I think that changing the priority in a single
path of execution is just plain wrong (since it practically implies
priority inversion).

Just make sure that the interrupt pulse priority and the ‘low’
priority are configurable, whatever you do! :v)

Amen, Brotha!

Rennie

Colin_Burgess1 · June 10, 2005, 5:07pm

Rennie Allen wrote:

Colin Burgess wrote:

There’s no point in enqueuing a low-priority pulse; just drop your
priority then and there. They’re both kernel calls, and the pulse
just adds overhead.

Hmmm, the idea is to make the low priority processing deferred and
pre-emptable. Assuming that there are new high-priority events from
IAE in the queue, simply changing the priority in the same execution
path leads to priority inversion. Queuing a pulse allows the IHT to
re-visit MsgReceive, and handle more IAE events before undertaking
lower priority processing. Granted, I’m not sure what the practical
issues are here, but I think that changing the priority in a single
path of execution is just plain wrong (since it practically implies
priority inversion).

Good point. I’m assuming that you are unmasking the interrupt prior to
sending the low-priority pulse.

In this case you might even just have a secondary handler thread though.

–
cburgess@qnx.com

Rennie_Allen2 · June 10, 2005, 7:30pm

Colin Burgess wrote:

Good point. I’m assuming that you are unmasking the interrupt prior to
sending the low-priority pulse.

Probably, but I don’t think it matters as long as the interrupt is unmasked
before MsgReceive is called again.

In this case you might even just have a secondary handler thread though.

Possibly, but it’s hard to justify a new execution context when the job can
be done with a few bytes of data (a pulse struct) and an additional “case:”
label…

Evan_Hillas1 · June 11, 2005, 1:11am

Wojtek Lerch wrote:

The problem is not that the ISR masks an interrupt; the problem is that
you’re relying on a thread to unmask it. A thread that calls
InterruptMask() and InterruptUnmask() can get pre-empted between those
calls, too. The main difference is that InterruptAttachEvent() may let it
happen more often (sometimes a lot more often), because it makes it possible
for your interrupt to get masked while a high-priority thread is already
running.

On that note, what are the mechanics of InterruptLock()/InterruptUnlock()?

I can guess: When an ISR calls IL() and something else already has the lock then execution is passed directly to that something else until the lock is released with a call to IUL(). Then execution of the ISR continues immediately.

Are interrupts disabled during a lock?

Evan

ed1k2 · June 11, 2005, 2:53am

In article <42A9BEC3.9040203@csical.com>, rallen@csical.com says…

Why programmatically “boost” processing of the IHT when you can simply attach
a high priority to the event that IAE returns? If you want another
“part” of the handler that runs at a different priority, then simply
have the IHT queue a pulse (with a lower priority) to itself (the event
queuing mechanism will queue events in priority order to the IHT, so
if new events come in from IAE they will be put to the front of the
queue) once the interrupt has been unmasked.

Rennie,

When I wrote “programmatically” I meant that some care must be taken in
software. I was actually thinking of cheaper priority boosting, along
the lines Colin suggested later, but I was not sure it would solve the
problem. That’s why I chose the term “programmatically boost”. Your
input below clearly shows it won’t, and your idea is more practical and
seems quite reasonable. But my point is that this is still a software
trick, and none of the standard drivers use it. There is no reference
to this problem anywhere in the documentation… I spent some time
analysing what’s wrong with QNX and why I don’t see anything similar in
Windows or Linux. Perhaps you had your solution ready and it came to you
in no time, but this problem isn’t obvious to many people, and your
method of avoiding it is worth a lot. I still think it could (or
should) be wrapped into library functions like InterruptAttachEvent()
and InterruptWait()… maybe paired with an InterruptWaitEvent()
and some function for a “critical section”.

Boosting the interrupt handling thread’s priority by passing an argument
to the driver may not be a good solution, because it seems more
logical to unmask the interrupt ASAP and do the rest of the data
processing at a lower priority (although if the rumours are true and
network drivers unmask the interrupt only after all data have been processed,
this really doesn’t make a difference; it’s just bad design).

Some network hardware makes this unavoidable. The hardware will not
de-assert the IRQ until all data have been processed.

Though I have worked with the cheapest hardware possible throughout my
engineering career (esp. here, in Canada), I’ve never seen such
network hardware. Usually it de-asserts the interrupt request as soon as
software clears all (unmasked) bits in the interrupt status register (by
writing “1” to the corresponding bits of the status register). After the
status register has been saved for future reference and cleared, reading
the ring buffer may take a long time, and during that time the NIC may
assert the IRQ again because an error occurred, a new packet arrived, or
a DMA transaction finished. I think I was just lucky enough to avoid
such badly designed hardware.
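The write-1-to-clear acknowledge described above can be modelled in a few lines. This is a simulation with hypothetical cause bits (INT_RX_DONE, INT_TX_DONE, INT_RX_ERR), not any particular NIC's register map. The line stays asserted while any enabled cause bit is set. The minimal ISR work is to snapshot the pending causes and clear exactly those bits, leaving the slow ring-buffer reading for later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interrupt-cause bits of a simulated NIC. */
#define INT_RX_DONE (1u << 0)
#define INT_TX_DONE (1u << 1)
#define INT_RX_ERR  (1u << 2)

struct nic_sim {
    uint32_t status;   /* interrupt status register, write-1-to-clear */
    uint32_t enable;   /* interrupt enable (mask) register */
};

/* The NIC drives its IRQ line while any enabled cause is pending. */
static bool nic_irq_asserted(const struct nic_sim *n)
{
    return (n->status & n->enable) != 0;
}

/* Hardware side: an event latches its cause bit. */
static void nic_raise(struct nic_sim *n, uint32_t cause)
{
    assert(cause != 0);
    n->status |= cause;
}

/* Write-1-to-clear: every 1 bit written clears that status bit. */
static void nic_write_status(struct nic_sim *n, uint32_t val)
{
    n->status &= ~val;
}

/* Minimal ISR work: save the pending causes, clear exactly those bits
 * (de-asserting the line), and return the snapshot for later processing. */
static uint32_t nic_ack(struct nic_sim *n)
{
    uint32_t pending = n->status & n->enable;
    nic_write_status(n, pending);
    return pending;
}
```

Because only the snapshotted bits are cleared, a cause latched after the snapshot re-asserts the line instead of being lost.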

Eduard.

Evan_Hillas1 · June 11, 2005, 4:03am

ed1k wrote:

In article <42A9BEC3.9040203@csical.com>, rallen@csical.com says…
Some network hardware makes this unavoidable. The hardware will not
de-assert the IRQ until all data have been processed.

Though I have worked with the cheapest hardware possible throughout my
engineering career (esp. here, in Canada), I’ve never seen such
network hardware.

It used to be a possibility in the ISA days, with the assumption of edge triggering and each device having to have its own IRQ. Now, though, with PCI and co., there is the assumption of level triggering, and along with that, as you have pointed out, comes device-based masking.

If the device can’t be masked, then the IRQ can still be shared by performing all the servicing inside the ISR. This, obviously, is best kept to low-bandwidth devices.

Evan

ed1k2 · June 11, 2005, 4:24am

In article <d8c1ip$rm0$1@inn.qnx.com>, Wojtek_L@yahoo.ca says…
[…]

Don’t Windows and Linux allow threads to mask an interrupt in a way similar
to InterruptMask() in QNX? If they do, then they have the same problem.

If you’re asking about user land, I believe not, though I’m unsure; I
haven’t been there for a long time.

The problem is not that the ISR masks an interrupt; the problem is that
you’re relying on a thread to unmask it.

No, the problem is that the thread doesn’t run at maximal priority until
it unmasks the interrupt. Actually, neither Windows nor Linux provides a
mechanism like InterruptAttachEvent(). If my hardware is interrupt-driven,
I have to provide an ISR (there is no “default” one). The Win32
Driver Model requires me to check in the ISR whether it’s my interrupt and,
if it isn’t, to return immediately. If it is my interrupt, I have to do the
minimal work needed to de-assert the interrupt request on the line and
schedule DPC object(s) for deferred processing. A DPC routine can’t be
pre-empted by any thread, but hardware interrupts can preempt it. In this
model I never see any need to play with the interrupt mask.

A thread that calls
InterruptMask() and InterruptUnmask() can get pre-empted between those
calls, too.

Well, if some program in the system calls InterruptMask() for no reason,
I would consider that piece of software a virus that has to be removed
promptly.

The main difference is that InterruptAttachEvent() may let it
happen more often (sometimes a lot more often), because it makes it possible
for your interrupt to get masked while a high-priority thread is already
running. Whether that’s a big difference depends on whether you can live
with it happening occasionally; but if it can cause people to die when it
happens, it’s probably better if it happens every five seconds rather than
once in five months, because this way your product won’t get shipped until
it’s fixed.

I was not saying that QNX sucks and Windows or Linux rocks. If you took it
that way, that was not my intention. Personally, I hate Microsoft. They
invented the Windows data types, they use fault-prone techniques like
“typedef char* pchar”, I’m tired of typing their nicely named
identifiers, and they sometimes return ERROR_SUCCESS (English is not
my first language; perhaps that is why I had to read the docs to find out
what the heck they meant by that error). Finally, they have “mutant”
instead of mutex, if you’re familiar with NT internals.

In Linux, they very often use the “big kernel lock” and _cli() (whatever
that means).

In QNX, some people were recently annoyed: their system using QNX
native IPC worked well while they tested server and client on a single
node, but when they tried to distribute the system, they got priority
inversion. The problem was solved today by the new 6.3 “priority=” option
to the network drivers. Magically, this driver option affects the behaviour
of the io-net subsystem (but for some reason doesn’t help fight priority
inversion when using npm-qnet-compat.so). (A long discussion is available at
qnx.org.ru/forum, in Russian, of course.)

Cheers,
Eduard.

Evan_Hillas1 · June 11, 2005, 4:35am

Rennie Allen wrote:

Queuing a pulse allows the IHT to
re-visit MsgReceive, and handle more IAE events before undertaking
lower priority processing. Granted, I’m not sure what the practical
issues are here,

The typical application is to run through the buffer until it’s empty, and that includes any new data that arrives while the buffer is being emptied. There is no need for further interrupt notifications until after the handler has retired.

That is the period when the device should be masked, and probably is in most cases, but the IRQ is also being left masked by the naughty drivers. This prevents any other driver ISR from getting its due. The current situation is pretty much equivalent to leaving interrupts disabled for long periods.
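Here is a sketch of the pattern Evan describes, assuming a simple receive ring. The handler disables interrupts at the device rather than masking the shared IRQ line. It drains the ring until it is empty, including entries that arrive mid-drain, then re-enables device interrupts and checks once more. The ring layout, the dev_irq_enabled flag and the inject_arrival() test hook are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SZ 8u   /* power of two, so unsigned wrap-around stays consistent */

/* Hypothetical receive ring filled by the "device", drained by the driver. */
struct ring {
    int buf[RING_SZ];
    unsigned head, tail;     /* head: next write, tail: next read */
    bool dev_irq_enabled;    /* device-level interrupt enable, not the PIC line */
};

static bool ring_put(struct ring *r, int v)
{
    if (r->head - r->tail == RING_SZ)
        return false;                       /* full */
    r->buf[r->head++ % RING_SZ] = v;
    return true;
}

/* Hypothetical test hook standing in for packets arriving mid-drain. */
static int pending_arrivals;
static void inject_arrival(struct ring *r)
{
    if (pending_arrivals > 0 && ring_put(r, 100))
        pending_arrivals--;
}

/* Quiet the device, drain until empty (including late arrivals), re-enable,
 * and re-check so a packet landing just before re-enable is not stranded. */
static size_t ring_drain(struct ring *r, void (*arrive)(struct ring *), int *sum)
{
    size_t n = 0;
    for (;;) {
        r->dev_irq_enabled = false;
        while (r->tail != r->head) {
            *sum += r->buf[r->tail++ % RING_SZ];
            n++;
            if (arrive)
                arrive(r);
        }
        r->dev_irq_enabled = true;
        if (r->tail == r->head)
            break;
    }
    assert(r->tail == r->head);
    return n;
}
```

The shared IRQ line itself is never masked here, so other drivers on the same IRQ keep getting their interrupts while the ring is drained.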

Evan

Evan_Hillas1 · June 11, 2005, 5:00am

Evan Hillas wrote:

This prevents any other driver ISR from getting its due.

Correction: This prevents all other drivers on that IRQ from getting their due. There is no possibility of priorities, period.

Evan

Armin_Steinhoff1 · June 11, 2005, 2:22pm

Evan Hillas wrote:

Evan Hillas wrote:

This prevents any other driver ISR from getting its due.

Correction: This prevents all other drivers on that IRQ from getting
their due. There is no possibility of priorities, period.

The sequence of attaching the interrupt can define a priority.

(See the description of the _NTO_INTR_FLAGS_END flag for InterruptAttach().)

–Armin

Evan_Hillas1 · June 11, 2005, 3:06pm

Armin Steinhoff wrote:

Evan Hillas wrote:
Correction: This prevents all other drivers on that IRQ from getting
their due. There is no possibility of priorities, period.

The sequence of attaching the interrupt can define a priority.

Read what is being said! Such facilities as thread priorities do not get triggered until after the interrupt is received by the CPU.

Masking an IRQ until the required thread gets around to unmasking it means that all other threads, and ISRs too, that depend on that IRQ are dead in the water for the duration, simply because the CPU never gets to see further interrupts on that line while the mask is active.

It’s a game of cooperation between all the drivers. InterruptDisable()/InterruptEnable() is the same game, just on a CPU wide level.
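Here is a small model of why the mask hurts every sharer, assuming a level-triggered line. The CPU sees the line only when some device drives it and the mask count is zero. A second device's request therefore stays invisible until the first driver's thread unmasks, whatever the thread priorities. All names are hypothetical; this is not PIC or kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one shared, level-triggered IRQ line. */
struct irq_line {
    unsigned requests;   /* one bit per device currently driving the line */
    int mask_count;      /* > 0 while some driver has the line masked */
};

static void dev_assert(struct irq_line *l, unsigned dev)   { l->requests |= dev; }
static void dev_deassert(struct irq_line *l, unsigned dev) { l->requests &= ~dev; }

static void line_mask(struct irq_line *l) { l->mask_count++; }
static void line_unmask(struct irq_line *l)
{
    assert(l->mask_count > 0);   /* unbalanced unmask */
    l->mask_count--;
}

/* The CPU is interrupted only if a device drives the line AND it is unmasked;
 * while masked, no sharer's ISR can run. */
static bool cpu_sees_irq(const struct irq_line *l)
{
    return l->requests != 0 && l->mask_count == 0;
}
```

In this model the serial device's request is stuck behind the NIC driver's mask. It gets through only once that driver's thread calls line_unmask(), however urgent the serial traffic is.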

Evan

Armin_Steinhoff1 · June 11, 2005, 5:11pm

Evan Hillas wrote:

Armin Steinhoff wrote:

Evan Hillas wrote:

Correction: This prevents all other drivers on that IRQ from getting
their due. There is no possibility of priorities, period.

The sequence of attaching the interrupt can define a priority.

Read what is being said! Such facilities as thread priorities do not
get triggered until after the interrupt is received by the CPU.

Masking an IRQ until the required thread gets around to unmasking it

Wrong… that’s only the case if you use InterruptAttachEvent().
If you use InterruptAttach(), you simply have to service your interrupting
hardware and return an event if applicable. After the ISR returns,
the interrupt is automatically unmasked.

If you minimize the code executed within the ISR… the latency it causes
for the shared interrupt will remain minimal. The serial driver from
QSSL is a good ‘bad example’ in that sense.

Also, the ‘priority’ with which a shared interrupt is handled depends on its
position in the chain of attached ISRs… I would always place the
handler of a time-critical device at the beginning of that chain!
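Armin's two points can be simulated as follows: claim only your own interrupt and return quickly, and treat chain position as a priority. The at_end flag is modelled on _NTO_INTR_FLAGS_END, which the InterruptAttach() documentation describes as adding the new handler at the end of the list instead of the start. The device model and all function names are hypothetical, and the real kernel dispatch is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHAIN_MAX 8

/* Hypothetical device: 'pending' mimics the device's own interrupt status. */
struct dev { const char *name; bool pending; int serviced; };

struct irq_chain { struct dev *h[CHAIN_MAX]; size_t n; };

/* at_end mimics _NTO_INTR_FLAGS_END: append instead of prepend. */
static bool chain_attach(struct irq_chain *c, struct dev *d, bool at_end)
{
    if (c->n == CHAIN_MAX)
        return false;
    if (at_end) {
        c->h[c->n] = d;
    } else {
        for (size_t i = c->n; i > 0; i--)
            c->h[i] = c->h[i - 1];
        c->h[0] = d;
    }
    c->n++;
    return true;
}

/* Per-device ISR: return at once if the interrupt isn't ours; otherwise do
 * only what is needed to de-assert it at the device. */
static bool dev_isr(struct dev *d)
{
    if (!d->pending)
        return false;          /* not ours */
    d->pending = false;        /* acknowledge at the device */
    d->serviced++;
    return true;
}

/* Walk the chain in order, recording which devices were serviced. */
static size_t chain_dispatch(struct irq_chain *c, struct dev **order)
{
    size_t k = 0;
    for (size_t i = 0; i < c->n; i++)
        if (dev_isr(c->h[i]))
            order[k++] = c->h[i];
    return k;
}
```

A handler attached without the END flag lands at the front and is serviced first on every shared interrupt. Every later handler also pays for its predecessors' ISR time, which is why a slow ISR such as the serial driver's hurts the whole chain.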

means that all other threads, and ISRs too, that depend on that IRQ are
dead in the water for the duration simply because the CPU never gets to see
further interrupts while the mask is active.

Disabling or masking of shared interrupts is always bad for event or
interrupt driven processing.

It’s a game of cooperation between all the drivers.

Yes, the ISRs of all devices sharing the same IRQ must cooperate very
well. This is, e.g., very critical on PowerPC systems!

InterruptDisable()/InterruptEnable() is the same game, just on a CPU
wide level.

The use of these calls should not be allowed in a real ‘real-time’
environment.

Evan_Hillas1 · June 11, 2005, 10:24pm

Okay, good. I like it. And you’ve pointed to a simple example of how bad the situation is. The only reason people don’t generally complain about their COM ports misbehaving is that BIOSes usually put the COM ports on their own separate IRQs for DOS compatibility.

What I’ve been trying to say all along is that most, if not all, of QSS’s drivers share the same bad behaviour of using InterruptAttachEvent(). So, since getting a system with an individual IRQ for every interrupt source is very unlikely, QSS-supplied drivers need a big warning attached… DON’T PANIC! Mostly Works.

Evan

Evan_Hillas1 · June 11, 2005, 11:31pm

DON’T PANIC!
Mostly Works

Wojtek_Lerch1 · June 12, 2005, 3:19pm

“ed1k” <ed1k@fake.address> wrote in message
news:MPG.1d143180f3fd3df79896ce@inn.qnx.com…

In article <d8c1ip$rm0$1@inn.qnx.com>, Wojtek_L@yahoo.ca says…
[…]
Don’t Windows and Linux allow threads to mask an interrupt in a way similar
to InterruptMask() in QNX? If they do, then they have the same problem.

If you’re asking about user land, I believe not, though I’m unsure; I
haven’t been there for a long time.

No, I was asking about driverland… My point was that the problem can
happen in any OS that makes it possible for a driver to mask an interrupt
and then get pre-empted before unmasking it, which I imagine is a wider
category than the OSes that have a function similar to
InterruptAttachEvent().

The problem is not that the ISR masks an interrupt; the problem is that
you’re relying on a thread to unmask it.

No, the problem is that the thread doesn’t run at maximal priority until
it unmasks the interrupt.

OK, the problem is that you’re relying on a thread to unmask the interrupt,
and that the priorities of threads in your system have been assigned in a
way that interferes with the unmasking. It’s not very surprising that you
can mess up a realtime system by assigning priorities incorrectly, is it…

…

A thread that calls
InterruptMask() and InterruptUnmask() can get pre-empted between those
calls, too.

Well, if some program in the system calls InterruptMask() for no reason,
I would consider that piece of software a virus that has to be removed
promptly.

Sure; but you’re not saying that no program can possibly have a good reason
to ever call InterruptMask() from a thread, are you?

If a thread in a driver calls InterruptMask() (presumably, for a good
reason), then it has to deal with exactly the same issues as
InterruptAttachEvent() creates. It doesn’t make a big difference whether
the interrupt was masked by the thread or by an ISR, and it doesn’t matter
whether the ISR was the private one that InterruptAttachEvent() uses or a
user-written one that calls InterruptMask() and then returns an event. As
long as it’s a thread’s responsibility to unmask it, the unmasking is
affected by any higher-priority threads and may affect any drivers using the
same interrupt, and that can cause problems on systems that don’t give you
enough control over thread priorities and interrupt assignments. On the
other hand, it may happen to be the most efficient way to handle some
hardware on systems where you can fine tune all the details. My point is
that talking about it as if the problem were specific to InterruptAttachEvent()
may give some people the impression that if their driver doesn’t use
InterruptAttachEvent(), then their driver is guaranteed not to have the
problem.

Wojtek_Lerch1 · June 12, 2005, 3:31pm

“Evan Hillas” <evanh@clear.net.nz> wrote in message
news:d8dd5p$s65$1@inn.qnx.com…

On that note, what are the mechanics of InterruptLock()/InterruptUnlock()?

I can guess:

Instead of guessing, you can look at the header it’s defined in. It’s
inline assembly.

When an ISR calls IL() and something else already has the
lock then execution is passed directly to that something else until the
lock is released with a call to IUL(). Then execution of the ISR
continues immediately.

My understanding is that IL() disables all interrupts and IUL() enables
them. The only way for an ISR to call IL() while something else already has
the lock is when the something else is running on a different processor, in
which case IL() simply spins with interrupts disabled until the other
processor calls IUL().

Are interrupts disabled during a lock?

Yes (but I only looked at the x86 implementation).
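The behaviour described above can be sketched with C11 atomics: disable local interrupts, then spin until the other CPU releases the lock. This is an illustration, not the inline assembly in the QNX header. Ordinary user code cannot execute cli/sti, so a thread-local flag stands in for the CPU's interrupt-enable state. The names il_lock(), il_trylock() and il_unlock() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for this CPU's interrupt-enable flag; real code would use
 * privileged instructions (e.g. cli/sti on x86), which user code cannot. */
static _Thread_local bool local_irqs_enabled = true;

struct il_spin { atomic_flag held; };
#define IL_SPIN_INIT { ATOMIC_FLAG_INIT }

/* 1) disable local interrupts, 2) spin until another CPU releases the lock. */
static void il_lock(struct il_spin *s)
{
    local_irqs_enabled = false;
    while (atomic_flag_test_and_set_explicit(&s->held, memory_order_acquire))
        ;   /* busy-wait with "interrupts" still disabled */
}

/* Non-blocking variant: restores the previous interrupt state on failure. */
static bool il_trylock(struct il_spin *s)
{
    bool prev = local_irqs_enabled;
    local_irqs_enabled = false;
    if (!atomic_flag_test_and_set_explicit(&s->held, memory_order_acquire))
        return true;
    local_irqs_enabled = prev;
    return false;
}

/* Release the lock, then re-enable local interrupts. */
static void il_unlock(struct il_spin *s)
{
    atomic_flag_clear_explicit(&s->held, memory_order_release);
    local_irqs_enabled = true;
}
```

Disabling interrupts before spinning matters. Otherwise an ISR on the same CPU could try to take the lock its own interrupted thread already holds, and spin forever.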

ed1k2 · June 13, 2005, 4:57am

In article <d8hj6g$t49$1@inn.qnx.com>, Wojtek_L@yahoo.ca says…

“ed1k” <ed1k@fake.address> wrote in message
news:MPG.1d143180f3fd3df79896ce@inn.qnx.com…
In article <d8c1ip$rm0$1@inn.qnx.com>, Wojtek_L@yahoo.ca says…
[…]
Don’t Windows and Linux allow threads to mask an interrupt in a way similar
to InterruptMask() in QNX? If they do, then they have the same problem.

If you’re asking about user land, I believe not, though I’m unsure; I
haven’t been there for a long time.

No, I was asking about driverland… My point was that the problem can
happen in any OS that makes it possible for a driver to mask an interrupt
and then get pre-empted before unmasking it, which I imagine is a wider
category than the OSes that have a function similar to
InterruptAttachEvent().

Well, then it’s a rather complex question. As you know, every OS provides
its own model for drivers and interrupt handling. Usually these models
do not map onto the underlying hardware (the PIC) as directly as the QNX
model does. Windows NT doesn’t have any function that implicitly masks
a particular interrupt line, AFAIK. What an NT driver can do is raise its
IRQL (IRQ level), effectively disabling all hardware IRQs at or below that
IRQL. In NT, a running ISR may be preempted by any higher-level ISR,
and it may schedule a DPC for further processing. DPCs run in FIFO
order at an IRQL below any hardware interrupt but above any thread in
the system (it is actually a software interrupt, or exception in MS
terminology). The DPC receives/sends data from/to the memory buffer/device,
but it’s not the final stage of data processing or of completing the IRP;
the rest of the job is done at a lower IRQL, where we can talk about threads.
There is a synchronisation function that changes the IRQL, and so-called
“critical sections” to avoid simultaneous access to the hardware. I don’t
know how close they are to masking interrupts in the PIC… They deal with a
spinlock for the particular IRQ; I guess it’s a property of the kernel.
In other words, if an NT driver for some reason raises its IRQL (masking
all IRQLs at that level and below), it can be pre-empted only by a
higher-level IRQ. There is no possibility of the problem under discussion,
though I am not discussing NT’s real-time capabilities or whether such a
design is acceptable for an RTOS. NT is not an RTOS.
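Here is a toy version of the IRQL rule described above. An interrupt is taken only if its level is above the CPU's current IRQL, so raising the IRQL masks everything at or below it. The values 0 and 2 are borrowed from x86 NT's PASSIVE_LEVEL and DISPATCH_LEVEL. The device level used in the example is made up, and raise_irql()/lower_irql() are hypothetical stand-ins for the real kernel routines.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified IRQL model. 0 and 2 follow x86 NT's PASSIVE_LEVEL and
 * DISPATCH_LEVEL; device levels used with it are made up. */
enum { IRQL_PASSIVE = 0, IRQL_DISPATCH = 2 };

struct cpu { int irql; };

/* An interrupt (or DPC) is taken only if its level is above the current IRQL. */
static bool can_preempt(const struct cpu *c, int level)
{
    return level > c->irql;
}

/* Raising the IRQL masks every source at or below the new level. */
static int raise_irql(struct cpu *c, int level)
{
    assert(level >= c->irql);    /* "raising" to a lower level is a bug */
    int old = c->irql;
    c->irql = level;
    return old;
}

static void lower_irql(struct cpu *c, int old)
{
    assert(old <= c->irql);
    c->irql = old;
}
```

The mask here is a property of the CPU's current level, not of a thread. No thread can be scheduled while the IRQL is raised, so the "masked, then pre-empted" situation cannot arise in this model.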

There are disable_irq() and enable_irq() functions in the Linux kernel, which
seem close to InterruptMask()/InterruptUnmask() in QNX. Though it’s
not common practice in the Linux kernel, you could probably write a
minimal ISR that masks the IRQ line and schedules a tasklet for further
processing. A tasklet can’t be pre-empted by any thread, so a situation
similar to InterruptAttachEvent() is impossible. But I agree: if some
kernel thread working on behalf of a user process calls disable_irq(),
it can be pre-empted by a high-priority thread, so the unmasking
latency is at the mercy of that high-priority thread. I have to
say, though, that pre-emption in the Linux kernel showed up only recently
and doesn’t always work. Additionally, they usually use cli()/sti() around
critical sections: no interrupts, no pre-emption. Linux is not an RTOS,
is it?

OK, the problem is that you’re relying on a thread to unmask the interrupt,
and that the priorities of threads in your system have been assigned in a
way that interferes with the unmasking. It’s not very surprising that you
can mess up a realtime system by assigning priorities incorrectly, is it…

No, it’s not a surprise. But if I open a box containing an RTOS, install
it on a computer, and use only the components from the box with their
“default” priorities, may I hope that I have an RTOS and that it’s not
already messed up?

Sure; but you’re not saying that no program can possibly have a good reason
to ever call InterruptMask() from a thread, are you?

I am not sure. Probably not; I’m not saying that. Though I can’t think of
a good example.

BTW, disable_irq() in Linux waits for any currently executing handler
for that IRQ to complete. The QNX documentation doesn’t mention such
behaviour, so I assume InterruptMask() doesn’t wait. So it’s a really
bad wrapper for a critical section… and I would honestly be grateful for an
example of good usage of InterruptMask() in a thread.
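The two Linux behaviours can be sketched with C11 atomics. First, disabling waits for a handler that is already running on another CPU. Second, disable calls nest, so the handler runs again only after a matching number of enables. Everything here is hypothetical and simplified; it is not the Linux implementation. Whether QNX's InterruptMask() waits is exactly the open question above, so nothing here claims to model it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-IRQ descriptor, loosely modelled on Linux semantics. */
struct irq_desc {
    atomic_int depth;             /* nested disable count; 0 = enabled */
    atomic_bool handler_running;  /* set while a handler executes */
};

static void irq_desc_init(struct irq_desc *d)
{
    atomic_init(&d->depth, 0);
    atomic_init(&d->handler_running, false);
}

/* Nested disable that waits for a handler already running (on another CPU)
 * to finish before returning. Returns the new depth. */
static int irq_disable_sync(struct irq_desc *d)
{
    int depth = atomic_fetch_add(&d->depth, 1) + 1;
    while (atomic_load(&d->handler_running))
        ;   /* spin until the in-flight handler completes */
    return depth;
}

/* Undo one disable; the line is live again only when depth reaches 0. */
static int irq_enable(struct irq_desc *d)
{
    int old = atomic_fetch_sub(&d->depth, 1);
    assert(old > 0);   /* unbalanced enable */
    return old - 1;
}

/* Dispatch: announce the handler first, then check depth (both seq_cst), so a
 * concurrent disabler either sees handler_running or we see its depth. */
static bool irq_dispatch(struct irq_desc *d, void (*handler)(void))
{
    atomic_store(&d->handler_running, true);
    if (atomic_load(&d->depth) > 0) {
        atomic_store(&d->handler_running, false);
        return false;
    }
    handler();
    atomic_store(&d->handler_running, false);
    return true;
}

/* Hypothetical handler used by the example. */
static int handled;
static void count_handler(void) { handled++; }
```

The waiting step is what makes such a disable usable as a critical section against the handler. Once irq_disable_sync() returns, the handler is neither running nor able to start until the matching enable.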

If a thread in a driver calls InterruptMask() (presumably, for a good
reason), then it has to deal with exactly the same issues as
InterruptAttachEvent() creates. It doesn’t make a big difference whether
the interrupt was masked by the thread or by an ISR, and it doesn’t matter
whether the ISR was the private one that InterruptAttachEvent() uses or a
user-written one that calls InterruptMask() and then returns an event. As
long as it’s a thread’s responsibility to unmask it, the unmasking is
affected by any higher-priority threads and may affect any drivers using the
same interrupt, and that can cause problems on systems that don’t give you
enough control over thread priorities and interrupt assignments. On the
other hand, it may happen to be the most efficient way to handle some
hardware on systems where you can fine tune all the details. My point is
that talking about it as if the problem were specific to InterruptAttachEvent()
may give some people the impression that if their driver doesn’t use
InterruptAttachEvent(), then their driver is guaranteed not to have the
problem.

OK, I get you. On the other hand, knowing about the problem, it is
possible to use InterruptAttachEvent() or InterruptMask()/InterruptUnmask()
and avoid the damage it causes.

Eduard.