hardware interrupt IRQ conflict

ed1k <ed1k@fake.address> wrote:

In article <d8pjf9$sl6$1@inn.qnx.com>, dagibbs@qnx.com says…

I don’t know if there is any code to check whether or not this
could happen.

Checked with kernel person – there is no code to prevent this.

So, it could happen in an SMP environment. Good to know, thanks. Do you (or
the kernel person rather) have any plans to make InterruptMask() SMP safe?

There are no plans to do any work to make it avoid that race condition.

-David

David Gibbs
QNX Training Services
dagibbs@qnx.com

I’ve had a quick look through our drivers and there is already a
“priority” option available on many (although admittedly not all) of the
drivers (try, for example, “use /lib/dll/devn-i82544.so”). Some of the
older drivers don’t currently support this option.

For a point of interest, InterruptAttachEvent is very useful for
customers who require high reliability in the face of the “broken”
hardware scenario. IAE allows hardware to continuously assert an
interrupt without bringing the operating system down.

Requirements differ for different people :-).

Maybe I’m being a stickler for wording here, but network drivers that
use IAE don’t really “lack support” for sharing so much as they may
“introduce an unacceptably long delay” in the interrupt handling for
shared interrupts. Again, much depends upon your requirements and how
you configure your system.

Robert Craig



Evan Hillas wrote:

Rennie Allen wrote:

Evan Hillas wrote:

The two methods already available are fine, just the question of the
default driver behaviour is my concern.



The fact that they use IAE, or the fact that they don’t have 2
(configurable) IHT processing priorities? I can agree with the latter.
The former implies more interrupt latency than the OS should add.


By default I mean the lack of support for sharing.


Evan

Robert Craig <rcraig_at_qnx@nowhere.com> wrote:

I’ve had a quick look through our drivers and there is already a
“priority” option available on many (although admittedly not all) of the
drivers (try, for example, “use /lib/dll/devn-i82544.so”). Some of the
older drivers don’t currently support this option.

And io-char drivers (devc-*) don’t generally support it, either.
(Though, I recently suggested to the developer who owns it a way to
add support for a priority option, which we both think will work.
There are some quirks in the driver/io-char interface for parsing
“standard” options that make it a bit tricky.)

-David

David Gibbs
QNX Training Services
dagibbs@qnx.com

Robert Craig wrote:

I’ve had a quick look through our drivers and there is already a
“priority” option available on many (although admittedly not all) of the
drivers (try, for example, “use /lib/dll/devn-i82544.so”). Some of the
older drivers don’t currently support this option.

For a point of interest, InterruptAttachEvent is very useful for
customers who require high reliability in the face of the “broken”
hardware scenario. IAE allows hardware to continuously assert an
interrupt without bringing the operating system down.

That kind of security is also possible without IAE … and it can be
handled in a more customer-specific way.

–Armin

Robert Craig wrote:

I’ve had a quick look through our drivers and there is already a
“priority” option available on many (although admittedly not all) of the

Masking the IRQ prevents any consideration of new priorities pertaining to that IRQ. Priority was never the issue; the lack of further interrupts on that IRQ is the issue.


Maybe I’m being a stickler for wording here, but network drivers that
use IAE don’t really “lack support” for sharing so much as they may
“introduce an unacceptably long delay” in the interrupt handling for
shared interrupts. Again, much depends upon your requirements and how

IAE() does lack the support: IRQ masking and IRQ sharing are incompatible. The “long delay” in network drivers simply brings the problem right into your face, because the “other” driver goes splat so damn quickly.


Evan

“Evan Hillas” <evanh@clear.net.nz> wrote in message
news:d8sqkv$ckm$1@inn.qnx.com

Robert Craig wrote:
I’ve had a quick look through our drivers and there is already a
“priority” option available on many (although admittedly not all) of the

Masking the IRQ prevents any consideration of new priorities pertaining to
that IRQ. Priority was never the issue, lack of further interrupts on
that IRQ is the issue.

But the lack of further interrupts is caused by higher-priority threads,
perhaps belonging to some application or a completely unrelated driver,
interfering with the unmasking, isn’t it?

Or are you complaining about some drivers that use IAE() and spend too much
time doing unnecessary things before unmasking the interrupt? That’s, of
course, a different issue, and it’s not really very specific to IAE() – it
doesn’t differ much from having an ISR that spends too much time doing
unnecessary things before returning, except that the ISR problem may be
easier to notice sooner, because it affects all lower-priority interrupts in
addition to its own.

Wojtek Lerch wrote:

But the lack of further interrupts is caused by higher-priority threads,
perhaps belonging to some application or a completely unrelated driver,
interfering with the unmasking, isn’t it?

Or are you complaining about some drivers that use IAE() and spend too much
time doing unnecessary things before unmasking the interrupt? That’s, of
course, a different issue, and it’s not really very specific to IAE() – it
doesn’t differ much from having an ISR that spends too much time doing
unnecessary things before returning, except that the ISR problem may be
easier to notice sooner, because it affects all lower-priority interrupts in
addition to its own.

Yes, all of the above issues are what’s wrong with IAE().

To fix it: replace the call to IAE() with an equivalent ISR that masks the device and sends an event to the waiting handler thread. That is all that is needed.

A complex driver can spend as much time as it likes processing its data and can be preempted by higher-priority threads and whatever else. That’s the very reason to pass the hard work off to the thread level. That’s all fine and dandy. No problems there.

It’s not the duration of a driver’s activity that is the fundamental flaw. It’s just the IRQ masking, which is completely unneeded in most cases.
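
Such an ISR-based scheme might be sketched like this; the device helpers (device_is_asserting(), mask_device(), unmask_device(), do_the_device_level_work()) and the irq variable are hypothetical placeholders, not code from any real driver:

// Hedged sketch: the ISR masks the device, never the IRQ line
//===========================================

struct sigevent event; // initialized below with SIGEV_INTR_INIT()

const struct sigevent *isr( void *area, int id )
{
    if( !device_is_asserting() )  // hypothetical register check
        return NULL;              // not ours: let other ISRs on the line run
    mask_device();                // silence the device, not the PIC
    return &event;                // wake the handler thread
}

ThreadCtl( _NTO_TCTL_IO, 0 );
SIGEV_INTR_INIT( &event );
id = InterruptAttach( irq, isr, NULL, 0, 0 );

while( 1 )
{
    InterruptWait( 0, NULL );     // the IRQ line stays unmasked throughout
    do_the_device_level_work();
    unmask_device();              // re-enable at the device only
}

InterruptDetach( id );

//===========================================

Because the interrupt is silenced at the device rather than at the PIC, other devices sharing the IRQ keep interrupting normally while this driver’s thread does its work.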


Evan

Evan Hillas wrote:

Yes, all of the above issues are what’s wrong with IAE().

To fix it: replace the call to IAE() with an equivalent ISR
that masks the device and sends an event to the waiting handler thread.
That is all that is needed.

Let me emphasize that this is to allow IRQ sharing. IAE() works just fine if non-sharing is ensured.


Evan

Wojtek Lerch wrote:

“Evan Hillas” <evanh@clear.net.nz> wrote in message
news:d8sqkv$ckm$1@inn.qnx.com…

Robert Craig wrote:

I’ve had a quick look through our drivers and there is already a
“priority” option available on many (although admittedly not all) of the

Masking the IRQ prevents any consideration of new priorities pertaining to
that IRQ. Priority was never the issue, lack of further interrupts on
that IRQ is the issue.


But the lack of further interrupts is caused by higher-priority threads,
perhaps belonging to some application or a completely unrelated driver,
interfering with the unmasking, isn’t it?

Yes … and that is a very specific problem of IAE and its interrupt
thread!!

Or are you complaining about some drivers that use IAE() and spend too much
time doing unnecessary things before unmasking the interrupt?

If that thread gets the CPU at all :-)

That’s, of
course, a different issue, and it’s not really very specific to IAE() – it
doesn’t differ much from having an ISR that spends too much time doing
unnecessary things before returning,

But the ISR can’t be preempted by a simple application thread. So it’s
up to the ISR code to optimize the interrupt latency.

–Armin

Just to let everyone know, I’ve submitted a PR (25810) to analyze
interrupt sharing / use of InterruptAttachEvent in our network drivers.
Replacing the IAE with an actual ISR may not be possible for all
hardware (especially older variants).

R.


Evan Hillas wrote:

Evan Hillas wrote:

Yes, all of the above issues are what’s wrong with IAE().

To fix it: replace the call to IAE() with an equivalent ISR
that masks the device and sends an event to the waiting handler thread.
That is all that is needed.


Let me emphasize that this is to allow IRQ sharing. IAE() works just
fine if non-sharing is ensured.


Evan

“Evan Hillas” <evanh@clear.net.nz> wrote in message
news:d8ts3s$4tp$1@inn.qnx.com

Wojtek Lerch wrote:
But the lack of further interrupts is caused by higher-priority threads,
perhaps belonging to some application or a completely unrelated driver,
interfering with the unmasking, isn’t it?

Or are you complaining about some drivers that use IAE() and spend too
much time doing unnecessary things before unmasking the interrupt?
That’s, of course, a different issue, and it’s not really very specific
to IAE() – it doesn’t differ much from having an ISR that spends too
much time doing unnecessary things before returning, except that the ISR
problem may be easier to notice sooner, because it affects all
lower-priority interrupts in addition to its own.

Yes, all of the above issues are what’s wrong with IAE().

How come? The first one is just one of many possible symptoms of setting up
thread priorities incorrectly. The second and third are about badly
designed drivers that are doing unnecessary things before allowing
interrupts to happen again. The third one actually is about an ISR – how
can it be a problem of IAE()?..

To fix it: replace the call to IAE() with an equivalent ISR
that masks the device and sends an event to the waiting handler thread.
That is all that is needed.

Or leave the IAE() in and make sure that your thread masks the device and
calls InterruptUnmask() as quickly as possible, and that its priority is
high enough. This will mask the interrupt for a bit longer than your fix
does, but yours masks all the lower-priority interrupts for longer, because
the kernel needs to switch the address space and run your ISR before issuing
the EOI. Which way is better depends on what devices, if any, you’re
sharing the interrupt with, and what devices are attached to lower priority
interrupts. And how good you are at picking thread priorities correctly.

A complex driver can spend as much time as it likes processing its data
and can be preempted by higher priority and whatever else. That’s the
very reason to pass the hard work off to the thread level. That’s all
fine and dandy. No problems there.

Not exactly: if it’s pre-empted for too long, you’ll still lose incoming
data, or fail to stop the vehicle before it hits the wall, or whatever.
Just because you’re not using IAE() doesn’t mean that you don’t have to
worry about setting up thread priorities correctly; and if you do set up
thread priorities correctly, IAE() is not a problem.

Yes, setting priorities correctly can be more difficult when IAE() or IM()
are involved, and may make it easier for you to shoot yourself in the foot.
But that’s a simple consequence of having more variables to account for, not
a fundamental problem of IAE(). Having more variables to play with can
often be a good thing.

It’s not the duration of a driver’s activity that is the fundamental flaw.
It’s just the IRQ masking, which is completely unneeded in most cases.

No, it’s the combination of both. There’s no way to completely get rid of
masking – you actually want it to happen in the hardware when the
interrupt is received. The problem is the duration of the masking and how
much control you have over it.

Robert Craig wrote:

Just to let everyone know, I’ve submitted a PR (25810) to analyze
interrupt sharing / use of InterruptAttachEvent in our network drivers.
Replacing the IAE with an actual ISR may not be possible for all
hardware (especially older variants).

Please implement this as IAE with a selectable priority event for stage
one processing, and a selectable priority event for stage two
processing. Stage 1 processing being that processing necessary to get
the card to de-assert IRQ (at which point the driver unmasks the
interrupt) and stage 2 being actual servicing of the event. This way
the application developer maintains complete control of everything
(they are free to shoot themselves in the foot and emulate a long
interrupt latency if they so desire, but there is no requirement that
they do this - as there is with IA).

There is no downside to this implementation. If you are not sharing
interrupts with another device, then feel free to set both stage 1 and
stage 2 processing to as low a priority as you wish; if you are sharing
interrupts with another device, then set stage 1 processing to priority
255 and it will behave exactly as if it were IA. (The fact that the
shared interrupt is masked for a few hundred nanoseconds extra for a
device that is sharing the IRQ is a small price to pay in exchange
for the ability to reduce overall interrupt latency; after all,
network cards, and the cards that share their IRQs, are seldom the
most time-critical piece of hardware in a real-time system.)

Increasing overall OS interrupt latency to optimize the degenerate
case of a very time critical interrupt, that just happens to share the
IRQ with a brain dead network card is a clear case of the tail wagging
the dog.

If you choose not to implement it this way, then you might as well get
prepared to file a PR for undue interrupt latency in network drivers :-(

“Armin Steinhoff” <a-steinhoff@web.de> wrote in message
news:d8u5ka$cqf$1@inn.qnx.com

Wojtek Lerch wrote:

But the lack of further interrupts is caused by higher-priority threads,
perhaps belonging to some application or a completely unrelated driver,
interfering with the unmasking, isn’t it?

Yes … and that is a very specific problem of IAE and its interrupt
thread!!

No, it’s a problem of having high-priority threads pre-empting a thread
responsible for unmasking an interrupt. It doesn’t make much of a
difference whether the interrupt was masked by IAE(), or by an
InterruptMask() call in an ISR or in the thread. And the problem does not
happen if you select the priorities of all threads correctly. And switching
from IAE to IA doesn’t necessarily solve the problem, because if the
high-priority threads pre-empt the driver for too long, there’s a good
chance that you’ll end up losing some data anyway.

Or are you complaining about some drivers that use IAE() and spend too
much time doing unnecessary things before unmasking the interrupt?

If that thread gets the CPU at all :-)

So you’re basically saying that it’s wrong for a realtime OS to allow you
to decide that there’s something in your system that needs to be done so
urgently, and can be done so quickly, that you want to let it pre-empt the
handling of a carefully chosen set of interrupts? And the only reason you
think it should be forbidden is because it allows you to mess up your system
by assigning the priorities incorrectly?

That’s, of course, a different issue, and it’s not really very specific
to IAE() – it doesn’t differ much from having an ISR that spends too
much time doing unnecessary things before returning,

But the ISR can’t be preempted by a simple application thread. So it’s up
to the ISR code to optimize the interrupt latency.

Not just any application thread – only an application thread whose priority
was set at least as high as the driver’s interrupt thread.

Whether allowing that is a good thing or a bad thing depends on the point of
view, I guess. Sometimes it may be a good thing when the OS polices the
priorities of things, and forces all the application threads to run below
all the driver threads, and all the interrupt handling above all the
threads, to minimize the damage when the priorities are set up wrong. Other
times, it may be useful to let the developer decide what should be able to
pre-empt what in his system, even if that also means giving him more
opportunities to make a mistake. It appears that QNX is more pro-choice on
this than you are. ;-)

This is why the PR was worded “analyze” as opposed to “replace IAE with
IA” :-). Using the same interrupt servicing mechanism for all cards
probably isn’t the best way to go. Things certainly aren’t as simple as
“replace IAE with IA”, given that different hardware may have vastly
different servicing requirements. I’ve included this suggestion in the PR.

Thanks!
Robert.




Rennie Allen wrote:

Robert Craig wrote:

Just to let everyone know, I’ve submitted a PR (25810) to analyze
interrupt sharing / use of InterruptAttachEvent in our network
drivers. Replacing the IAE with an actual ISR may not be possible
for all hardware (especially older variants).


Please implement this as IAE with a selectable priority event for stage
one processing, and a selectable priority event for stage two
processing. Stage 1 processing being that processing necessary to get
the card to de-assert IRQ (at which point the driver unmasks the
interrupt) and stage 2 being actual servicing of the event. This way
the application developer maintains complete control of everything
(they are free to shoot themselves in the foot and emulate a long
interrupt latency if they so desire, but there is no requirement that
they do this - as there is with IA).

There is no downside to this implementation. If you are not sharing
interrupts with another device, then feel free to set both stage 1 and
stage 2 processing to as low a priority as you wish; if you are sharing
interrupts with another device, then set stage 1 processing to priority
255 and it will behave exactly as if it were IA. (The fact that the
shared interrupt is masked for a few hundred nanoseconds extra for a
device that is sharing the IRQ is a small price to pay in exchange
for the ability to reduce overall interrupt latency; after all,
network cards, and the cards that share their IRQs, are seldom the
most time-critical piece of hardware in a real-time system.)

Increasing overall OS interrupt latency to optimize the degenerate
case of a very time critical interrupt, that just happens to share the
IRQ with a brain dead network card is a clear case of the tail wagging
the dog.

If you choose not to implement it this way, then you might as well get
prepared to file a PR for undue interrupt latency in network drivers :-(

Wojtek Lerch wrote:

“Evan Hillas” <evanh@clear.net.nz> wrote in message
Yes, all of the above issues are what’s wrong with IAE().


How come? The first one is just one of many possible symptoms of setting up
thread priorities incorrectly. The second and third are about badly

The point is that the IRQ is masked during all the examples. And the higher-priority thread issue is not solvable unless your driver is the highest priority and the only one using IAE().

To fix it; the replacement of the call to IAE() with an equivalent ISR
that masks the device and sends an event to the waiting handler thread is
all that is needed.


Or leave the IAE() in and make sure that your thread masks the device and
calls InterruptUnmask() as quickly as possible, and that its priority is

That just adds an extra unnecessary mask.


high enough. This will mask the interrupt for a bit longer than your fix
does, but yours masks all the lower-priority interrupts for longer, because
the kernel needs to switch the address space and run your ISR before issuing

That’s a bit comical given QSS’s view of sharing. But yep, I’m all for having IAE() upgraded so it can do the device masking instead of IRQ masking.

A complex driver can spend as much time as it likes processing its data
and can be preempted by higher priority and whatever else. That’s the
very reason to pass the hard work off to the thread level. That’s all
fine and dandy. No problems there.


Not exactly: if it’s pre-empted for too long, you’ll still lose incoming
data, or fail to stop the vehicle before it hits the wall, or whatever.
Just because you’re not using IAE() doesn’t mean that you don’t have to
worry about setting up thread priorities correctly; and if you do set up
thread priorities correctly, IAE() is not a problem.

I was trying to say that the thread level behaviour can do whatever it likes, I don’t care. Once IRQ masking is eliminated from the servicing mechanism then sharing functions cleanly.

It’s not the duration of a driver’s activity that is the fundamental flaw.
It’s just the IRQ masking, which is completely unneeded in most cases.


No, it’s the combination of both. There’s no way to completely get rid of
masking – you actually want it to happen in the hardware when the

I believe you are referring to the “In Service” bit. One would hope that that’s automatically set in the hardware and is naturally cleared as part of the required ioport accesses to the PIC. That has nothing to do with masking at the thread level.


Evan

Rennie Allen wrote:

processing. Stage 1 processing being that processing necessary to get
the card to de-assert IRQ (at which point the driver unmasks the
interrupt) and stage 2 being actual servicing of the event. This way

There is only one sensible priority for stage one, and that’s MAX. It costs more than using IA() but I guess it’ll work … and I’ll go away. ;-)


Evan

Robert Craig wrote:

Just to let everyone know, I’ve submitted a PR (25810) to analyze
interrupt sharing / use of InterruptAttachEvent in our network drivers.

Thanks.

God damn, I actually got some action from a corporation. ;-)


Replacing the IAE with an actual ISR may not be possible for all
hardware (especially older variants).

Then such a device can’t be shared, and there is no need to make a change to that driver. And maybe some warning to that effect should be included in its help page.


Evan

Wojtek Lerch wrote:

“Armin Steinhoff” <a-steinhoff@web.de> wrote in message
news:d8u5ka$cqf$1@inn.qnx.com…

[ clip …]


So you’re basically saying that it’s wrong for a realtime OS to allow you
to decide that there’s something in your system that needs to be done so
urgently, and can be done so quickly, that you want to let it pre-empt the
handling of a carefully chosen set of interrupts? And the only reason you
think it should be forbidden is becasue it allows you to mess up your system
by assigning the priorities incorrectly?

The problem is that there are absolutely no guidelines for assigning the
priorities!

“Priorities of application threads should not be higher than the lowest
priority assigned to an interrupt thread” … could be one of these guidelines.

That’s, of course, a different issue, and it’s not really very specific
to IAE() – it doesn’t differ much from having an ISR that spends too
much time doing unnecessary things before returning,

But the ISR can’t be preempted by a simple application thread. So it’s up
to the ISR code to optimize the interrupt latency.


Not just any application thread – only an application thread whose priority
was set at least as high as the driver’s interrupt thread.

Whether allowing that is a good thing or a bad thing depends on the point of
view, I guess. Sometimes it may be a good thing when the OS polices the
priorities of things, and forces all the application threads to run below
all the driver threads, and all the interrupt handling above all the
threads, to minimize the damage when the priorities are set up wrong. Other
times, it may be useful to let the developer decide what should be able to
pre-empt what in his system, even if that also means giving him more
opportunities to make a mistake. It appears that QNX is more pro-choice on
this than you are. ;-)

Choices are OK … but I’m missing guidelines for deciding how to choose :-)

–Armin



Evan Hillas wrote:

Rennie Allen wrote:

processing. Stage 1 processing being that processing necessary to get
the card to de-assert IRQ (at which point the driver unmasks the
interrupt) and stage 2 being actual servicing of the event. This way


There is only one sensible priority for stage one, and that’s MAX. It
costs more than using IA() but I guess it’ll work … and I’ll go away. ;-)

There are two main points that together make this usable: a maximum-priority boost, and a short execution path before the priority is returned.

I note there are two recommended methods of implementing the handler thread: SIGEV_INTR + InterruptWait(), and SIGEV_PULSE + MsgReceivev().


// Method One would go something like this:
//===========================================

ThreadCtl( _NTO_TCTL_IO, 0 );
prioritystageone = sched_get_priority_max( SCHED_FIFO ); // After ThreadCtl
                                                         // to ensure top
                                                         // priority.
SIGEV_INTR_INIT( &event );
id = InterruptAttachEvent( irq, &event, _NTO_INTR_FLAGS_TRK_MSK );

while( 1 )
{
    set_thread_priority( prioritystageone );
    unmask_device();  // Must be inside the top-priority stage to prevent
                      // an extended IRQ masking.
    InterruptWait( 0, NULL );
    mask_device();
    InterruptUnmask( irq, id );
    set_thread_priority( prioritystagetwo ); // Preferred priority of the
                                             // handler.
    do_the_device_level_work();
}

InterruptDetach( id );

//===========================================


Method Two is beyond my experience. It looks quite doable, with an immediate test and action on the received pulse, but I have no idea what happens when a high-priority pulse is sent and the thread isn’t waiting to process it in short order.
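
A hedged sketch of Method Two, for comparison; the channel setup, pulse code, and device helpers below are illustrative assumptions, not tested code. One useful property of pulses: if the thread isn’t receive-blocked when a pulse arrives, the kernel queues it on the channel and the next MsgReceive() collects it, and the receiving thread runs at the pulse’s priority while handling it.

// Method Two might go something like this (untested sketch):
//===========================================

#define PULSE_CODE_INTR ( _PULSE_CODE_MINAVAIL + 0 )

struct sigevent event;
struct _pulse pulse;

chid = ChannelCreate( 0 );
coid = ConnectAttach( ND_LOCAL_NODE, 0, chid, _NTO_SIDE_CHANNEL, 0 );
ThreadCtl( _NTO_TCTL_IO, 0 );
prioritystageone = sched_get_priority_max( SCHED_FIFO );

// The pulse priority acts as the stage-one priority: the receiving
// thread is boosted to it while it handles the pulse.
SIGEV_PULSE_INIT( &event, coid, prioritystageone, PULSE_CODE_INTR, 0 );
id = InterruptAttachEvent( irq, &event, _NTO_INTR_FLAGS_TRK_MSK );

while( 1 )
{
    if( MsgReceive( chid, &pulse, sizeof( pulse ), NULL ) == 0 // a pulse
        && pulse.code == PULSE_CODE_INTR )
    {
        mask_device();              // silence the device itself
        InterruptUnmask( irq, id ); // stage one done: IRQ line free again
        do_the_device_level_work(); // stage two, at the handler's priority
        unmask_device();
    }
}

InterruptDetach( id );

//===========================================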


Cheers,
Evan

I forgot the pending check …

// Method One would go something like this:
//===========================================

ThreadCtl( _NTO_TCTL_IO, 0 );
prioritystageone = sched_get_priority_max( SCHED_FIFO ); // After ThreadCtl
                                                         // to ensure top
                                                         // priority.
SIGEV_INTR_INIT( &event );
id = InterruptAttachEvent( irq, &event, _NTO_INTR_FLAGS_TRK_MSK );

while( 1 )
{
    set_thread_priority( prioritystageone );
    unmask_device();  // Must be inside the top-priority stage to prevent
                      // an extended IRQ masking.
    while( !is_device_pending() )  // Wait until the device really has work;
    {                              // catches events that arrived while it
        InterruptWait( 0, NULL );  // was masked.
    }
    mask_device();
    InterruptUnmask( irq, id );
    set_thread_priority( prioritystagetwo ); // Preferred priority of the
                                             // handler.
    do_the_device_level_work();
}

InterruptDetach( id );

//===========================================