hardware interrupt IRQ conflict

I have a hardware interrupt (IRQ) conflict on a QNX 6.3 target. When I run
the command pci -v | grep Interrupt,
I see interrupt 11 three times.

Because I’m running a real-time target, I don’t want interrupt management
overhead, which could increase my calculation time.
Sometimes I get an interrupt conflict on my Ethernet card, and this is very
bad.

The only solution so far is to disable the other hardware in the BIOS,
but that is not very good because
I lose some hardware (USB, sound card).

How can I manage the IRQ assignment in the QNX boot system, so that I can set
the PCI IRQs I want and resolve the conflict?


Marc-Andre Bouchard
Opal-Rt technologie
Field Application Engineer

There is nothing that you can do on X86 systems, as the BIOS assigns the
IRQs.


Sometimes putting the card in a different slot will change the
IRQ it uses.

-seanb


The QSSL-supplied network drivers are the worst. If you can avoid being on the same IRQ as one of them, you’ll be a lot better off. That said, I am of the opinion that all QSSL drivers are explicitly written to break when sharing an IRQ. This provides a small performance bonus of one less hardware access per driver per interrupt.

The USB driver may be the exception, from what I’ve heard, probably because the large number of USB interrupt sources would always cause dropped interrupts if sharing wasn’t coded for.

For some adventurous reading … http://www.openqnx.com/PNphpBB2+viewtopic-t-1798.html


Evan


That’s a nice discussion. I have not read all 6 pages yet (too hot here
these days), but I certainly will.

There are two interesting points in this discussion; sorry,
I’m going to mix them together:

In article <d878ki$a9q$1@inn.qnx.com>, hsbrown@qnx.com says…

There is nothing that you can do on X86 systems, as the BIOS assigns the
IRQs.

There is nothing that you can do on an x86 system running QNX, eh? Windows
2K doesn’t run on anything but x86, AFAIK. I can go to the Device
Manager, choose a device and its Resources tab, and change the IRQ. What am I
doing wrong?
Actually, any PnP OS reassigns IRQs and other resources even if it
doesn’t allow the user to intervene in the process.

In article <d879dt$ano$1@inn.qnx.com>, seanb@qnx.com says…

Sometimes putting the card in a different slot will change the
irq it uses.

This is a very practical point. Actually, this is the best one can do when
building a hard real-time system on generic hardware with a PCI bus.

If for some reason you can’t change the slot, you can try this utility:

http://ed1k.qnx.org.ru/setirq.html

Read the help messages carefully, slay all the drivers for the affected
hardware, and try it. If it works, you can put the commands somewhere in sysinit
before enum-devices. If it doesn’t work, sorry: there is no warranty that it
will :-)

Eduard.

ed1k wrote:

There is nothing that you can do on an x86 system running QNX, eh? Windows
2K doesn’t run on anything but x86, AFAIK. I can go to the Device
Manager, choose a device and its Resources tab, and change the IRQ. What am I
doing wrong?
Actually, any PnP OS reassigns IRQs and other resources even if it
doesn’t allow the user to intervene in the process.

But you have to be wary of the paired nature of most chipsets’ interrupt lines. An on-board device will be paired with a slot and cannot be separated from it. This, along with the BIOS consistently assigning IRQs to interrupt lines, is why the “shift slots” recommendation works so well.


Evan


This is not a ‘conflict’ … interrupt sharing is absolutely normal with
PCI.

You simply have to check in your ISR whether the interface of your device is
really the source of the interrupt. If not, return NULL from the ISR.
This is what all other drivers do … that’s all.
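
A minimal sketch of that check, assuming a device with a single status register. It is portable C that only models the logic: `fake_device`, `fake_event`, `IRQ_PENDING` and `shared_isr()` are hypothetical names, and in a real Neutrino driver the handler would be the one registered with InterruptAttach() and would hand back a struct sigevent pointer instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device model: one status register with a "pending" bit. */
#define IRQ_PENDING 0x1u

struct fake_device {
    volatile uint32_t status;   /* bit 0 is set while the device requests service */
};

/* Stand-in for the event an ISR hands back to wake its driver thread. */
struct fake_event {
    int device_id;
};

struct handler_ctx {
    struct fake_device *dev;
    struct fake_event   event;
};

/* Shared-line handler: claim the interrupt only if our device raised it. */
const struct fake_event *shared_isr(struct handler_ctx *ctx)
{
    if ((ctx->dev->status & IRQ_PENDING) == 0) {
        return NULL;            /* not ours: another handler on the line may claim it */
    }
    return &ctx->event;         /* ours: wake the driver thread */
}
```

Two such handlers can sit on the same line; for any given interrupt, only the one whose device has its pending bit set returns an event.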

–Armin



Armin Steinhoff wrote:

Marc-Andre Bouchard wrote:
The only solution so far is to disable the other hardware in the
BIOS, but that is not very good because
I lose some hardware (USB, sound card).

How can I manage the IRQ assignment in the QNX boot system, so that I can set
the PCI IRQs I want and resolve the conflict?


This is not a ‘conflict’ … interrupt sharing is absolutely normal with
PCI.

It is, however, a conflict at the driver level, purely because of an internal QSSL decision.


You simply have to check in your ISR whether the interface of your device is
really the source of the interrupt. If not, return NULL from the ISR.
This is what all other drivers do … that’s all.

Good point. So there is no performance advantage in the way QSSL have coded their drivers, just a sharing conflict.


Evan

The step required for correct sharing is, inside the driver’s ISR, to unmask the IRQ as soon as your device is identified as the source and internally masked.

Then the driver’s ISR can issue the standard notification to its thread-level handler to perform the required processing.


Evan

Evan Hillas wrote:

The step required for correct sharing is, inside the driver’s ISR, to
unmask the IRQ as soon as your device is identified as the source and
internally masked.

Correction:
Since Neutrino automatically unmasks the IRQ after all ISRs have completed, that step is not needed. So the bits needed are:
An IRQ handler (ISR) in the first place, i.e. not just a notified thread, as that method leaves the IRQ masked indefinitely, hence the sharing failures.
1: Identify your device as needing service.
2: Mask the interrupt source (in the device itself).
3: Notify the thread.
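
The three steps can be sketched as a small model. This is portable C that simulates the device rather than calling any QNX API: `model_device`, `model_isr()` and `model_thread_service()` are hypothetical names, and the device-level interrupt-enable bit is an assumption about the hardware.

```c
#include <stdbool.h>

/* Hypothetical device: a pending flag plus its own interrupt-enable bit. */
struct model_device {
    bool     pending;       /* device has work and wants service */
    bool     irq_enabled;   /* device-level interrupt enable (not the IRQ line) */
    unsigned serviced;      /* how many requests the thread has handled */
};

/* The device drives the shared line only when both flags are set. */
bool device_asserts_line(const struct model_device *d)
{
    return d->pending && d->irq_enabled;
}

/* ISR side: returns true when the driver thread should be notified. */
bool model_isr(struct model_device *d)
{
    if (!device_asserts_line(d)) {
        return false;           /* step 1: not our device, ignore */
    }
    d->irq_enabled = false;     /* step 2: silence the source in the device itself */
    return true;                /* step 3: notify the thread-level handler */
}

/* Thread side: do the work, then let the device interrupt again. */
void model_thread_service(struct model_device *d)
{
    d->pending = false;
    d->serviced++;
    d->irq_enabled = true;
}
```

Because the ISR silences only its own device, the shared line itself never has to stay masked while the thread waits to run.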

Then the driver’s ISR can issue the standard notification to its thread-level
handler to perform the required processing.


Evan

Evan Hillas wrote:

Armin Steinhoff wrote:

Marc-Andre Bouchard wrote:

The only solution so far is to disable the other hardware in the
BIOS, but that is not very good because
I lose some hardware (USB, sound card).

How can I manage the IRQ assignment in the QNX boot system, so that I can set
the PCI IRQs I want and resolve the conflict?



This is not a ‘conflict’ … interrupt sharing is absolutely normal
with PCI.


It is, however, a conflict at the driver level purely because of
internal QSSL decision.

Not at ‘driver level’. It is done at the ISR level, and that is more or
less independent of the driver.

–Armin

You simply have to check in your ISR whether the interface of your device is
really the source of the interrupt. If not, return NULL from the ISR.
This is what all other drivers do … that’s all.


Good point. So there is no performance advantage in the way QSSL have
coded their drivers, just a sharing conflict.


Evan

Armin Steinhoff wrote:

Evan Hillas wrote:
Armin Steinhoff wrote:
This is not a ‘conflict’ … interrupt sharing is absolutely normal
with PCI.


It is, however, a conflict at the driver level purely because of
internal QSSL decision.


Not at ‘driver level’. It is done at the ISR level, and that is more or
less independent of the driver.

What?! I just finished explaining, clearly I thought, how it is the software design of the “other” shared driver that is causing the sharing problem.

The drivers that seem to be the biggest abusers of the flawed method are the network drivers.


Evan

In article <d8aaaj$jlm$1@inn.qnx.com>, evanh@clear.net.nz says…

Armin Steinhoff wrote:
Evan Hillas wrote:
Armin Steinhoff wrote:
This is not a ‘conflict’ … interrupt sharing is absolutely normal
with PCI.


It is, however, a conflict at the driver level purely because of
internal QSSL decision.


Not at ‘driver level’. It is done at the ISR level, and that is more or
less independent of the driver.

What is ‘driver level’ and what is ISR level? I always thought the ISR
is part of the driver (if, of course, the device serviced by the driver
requires some interrupt handling). How can it be independent?

What?! I just finished explaining, clearly I thought, how it is the software design of the “other” shared driver that is causing the sharing problem.

Here are my few cents on this discussion, or why shared interrupts in QNX
are worse than shared interrupts in Linux or Windows (at least while a QNX
driver uses the InterruptAttachEvent() call to install the “default” ISR).

Windows domain.
When an ISR running at DIRQL decides it needs some more
processing that can be done at a lower priority
level, it puts a deferred procedure call (DPC) into a queue.
The DPC will be executed later at DISPATCH_LEVEL priority,
which is above any user thread in the system but below
hardware-generated interrupts.
Priority levels (an excursus into Windows terms):
HIGHEST_LEVEL -bus errors and machine checks

DIRQLs -hardware interrupt requests
DISPATCH_LEVEL -scheduler and DPC execution

PASSIVE_LEVEL -normal thread execution level

Linux domain.
If an ISR decides it needs some more processing that
can be done later, once interrupts are acknowledged and
enabled, it queues a tasklet to run. (Old kernels used
bottom halves (BH) for deferred interrupt processing; apparently the
BH mechanism in modern Linux kernels is a wrapper around tasklets, kept just for
backward compatibility.) It’s pretty close to the Windows DPC (in terms
of priorities and execution context).

QNX domain.
If an ISR decides it needs some more processing that can be
done later, once interrupts are acknowledged and enabled, it returns
an event to the thread that is blocked waiting for it. Nice. It’s
easy to understand and easy to program. Now take a look at
InterruptAttachEvent(). It installs a “default” ISR handler that masks
the interrupt line and returns an event to the waiting thread. Fine. The
problem lies in the priority of that waiting thread. It may be way
too low, so the thread can be preempted by a higher-priority
task, effectively leaving the hardware interrupt line deaf to
requests. And this is the difference from the two models described above.
Actually, it’s easy to reduce the gap by boosting the priority of
the interrupt-waiting thread to the maximum and returning it to normal
as soon as the thread calls InterruptUnmask(). I don’t know why this
is not the default behaviour for InterruptAttachEvent(), but it can
be done programmatically for now. However, none of the standard QNX
drivers do that, AFAIK. Therefore, a standard QNX driver might
be a dangerous neighbour to share an interrupt line with.
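
The "deaf line" effect can be modelled in a few lines. This is a simplified simulation, not QNX code: `shared_line`, `line_deliver()` and `line_thread_unmask()` are hypothetical names, and the model assumes only what is described above, namely that the default handler masks the whole line and the line stays masked until the waiting thread gets to run and unmask it.

```c
#include <stdbool.h>

/* Simplified model of one shared IRQ line with two devices on it. */
struct shared_line {
    bool     masked;         /* line-level mask */
    bool     pending[2];     /* per-device request flags */
    bool     masks_line[2];  /* true if that driver uses a mask-and-notify handler */
    unsigned delivered[2];   /* requests that actually reached each driver */
};

/* Model of interrupt delivery: returns how many requests got through. */
unsigned line_deliver(struct shared_line *l)
{
    unsigned n = 0;

    if (l->masked) {
        return 0;            /* the line is deaf: every device on it waits */
    }
    for (int dev = 0; dev < 2; dev++) {
        if (!l->pending[dev]) {
            continue;
        }
        l->pending[dev] = false;
        l->delivered[dev]++;
        n++;
        if (l->masks_line[dev]) {
            l->masked = true;   /* default handler masks the whole line */
        }
    }
    return n;
}

/* The waiting thread finally runs and unmasks the line. */
void line_thread_unmask(struct shared_line *l)
{
    l->masked = false;
}
```

With device 0 using the mask-and-notify handler, a request from device 1 that arrives while the line is masked is not delivered until `line_thread_unmask()` runs, however long the thread of device 0 is kept off the CPU.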

Boosting the interrupt-handling thread’s priority by passing an argument
to the driver may not be a good solution, because it seems more
logical to unmask the interrupt ASAP and do the rest of the job, processing
the data, at a lower priority (although if the rumours are true and
network drivers unmask the interrupt only after all data have been processed,
this really makes no difference; it’s just bad design).

Also, I don’t know if there is a queue for events returned by the ISR. (I
did some experiments with QNX 6.0 and I believe there was no queue, but
I’m not sure). It may also make a difference.

Cheers,
Eduard.




In article <d891fv$k9l$1@inn.qnx.com>, a-steinhoff@web.de says…

Sorry,
I have to admit there is no ‘conflict’, speaking of terminology. I just
thought (as sometimes happens in international discussions) that the author
was referring to inappropriately large latency for interrupts when they happen to
be shared.
Eduard.

This is not a ‘conflict’ … interrupt sharing is absolutely normal with
PCI.

You simply have to check in your ISR whether the interface of your device is
really the source of the interrupt. If not, return NULL from the ISR.
This is what all other drivers do … that’s all.

–Armin

ed1k wrote:

In article <d8aaaj$jlm$1@inn.qnx.com>, evanh@clear.net.nz says…

Armin Steinhoff wrote:
Not at ‘driver level’. It is done at the ISR level, and that is more or
less independent of the driver.


What is ‘driver level’ and what is ISR level? I always thought the ISR
is part of the driver (if, of course, the device serviced by the driver
requires some interrupt handling). How can it be independent?

Yeah, very sorry about shooting off so much. I didn’t read beyond the first four words. That’s what you get for reading emails before going to work …

Agreed. The code placed in the ISR by InterruptAttach() is still part of the driver, since it is part of managing the hardware device that the driver is written for.


Boosting the interrupt-handling thread’s priority by passing an argument
to the driver may not be a good solution, because it seems more
logical to unmask the interrupt ASAP and do the rest of the job, processing
the data, at a lower priority (although if the rumours are true and
network drivers unmask the interrupt only after all data have been processed,
this really makes no difference; it’s just bad design).

Right. The difference between the thread unmasking ASAP and unmasking after all its data is processed is purely in how often the bug is likely to interfere. It doesn’t fix the basic design flaw.

InterruptAttachEvent() fails simply because it has no way to manage the individual sources of the interrupt. So InterruptAttach() is the only way.


Also, I don’t know if there is a queue for events returned by the ISR. (I
did some experiments with QNX 6.0 and I believe there was no queue, but
I’m not sure). It may also make a difference.

Shouldn’t be important. The handler only has to know its device needs service. After that it’s all down to how soon it gets serviced.


Evan

ed1k wrote:

QNX domain.
[cut]
However, none of QNX standard drivers do that,
AFAIK. Therefore, some QNX standard driver might be a dangerous
neighbour to share interrupt line with.

I agree with all you have written.

Masking the IRQ has the advantage of avoiding “useless” interruptions (=
events = pulses = process-level context switches) once a first IRQ has
already set the thread-level driver running and it is able to
recognize subsequent events generated by its device just before
returning to MsgReceive() or InterruptWait(). This lets the
thread-level driver handle many high-frequency events in a single run,
minimizing event queueing and the related context switches.

The big problems of excessive latency in IRQ event delivery
for drivers of devices sharing interrupts are not acceptable, in my
view: the right solution for achieving similar advantages in a shared
scenario is to enable/disable interrupt generation at
device level (if this can be done in a reasonable amount of time), so
every driver ISR decides to stop ONLY its own device from generating
useless IRQs until it has consumed all the queued events.

Obviously, this cannot be done by the kernel in a “default way”, as
InterruptAttachEvent() is intended to work: hardware-related issues must
be taken into account by a dedicated ISR routine for the “disabling”
side, and by thread code for the “enabling” one.
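
A small model of that split, with the ISR doing the “disabling” and the thread doing the “enabling” after it has drained everything. It is a simulation in portable C under stated assumptions: `fifo_device`, `fifo_hw_event()` and `fifo_thread_run()` are hypothetical names, and the device is assumed to queue events internally while its interrupt generation is disabled.

```c
#include <stdbool.h>

/* Hypothetical device that queues events while its interrupt is disabled. */
struct fifo_device {
    unsigned queued;        /* events waiting in the device */
    bool     irq_enabled;   /* device-level interrupt enable */
    unsigned wakeups;       /* times the driver thread was woken */
    unsigned handled;       /* events the thread has consumed */
};

/* A hardware event arrives; the ISR runs only if the device may interrupt. */
void fifo_hw_event(struct fifo_device *d)
{
    d->queued++;
    if (d->irq_enabled) {
        d->irq_enabled = false; /* ISR: stop only this device from interrupting */
        d->wakeups++;           /* ISR: notify the driver thread once */
    }
}

/* Driver thread: consume everything queued, then re-enable as the last step. */
void fifo_thread_run(struct fifo_device *d)
{
    while (d->queued > 0) {
        d->queued--;
        d->handled++;
    }
    d->irq_enabled = true;
}
```

In this model a burst of five events costs a single thread wake-up, and the shared line is never masked, so other devices on it are unaffected.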


(although if rumours are true and network drivers unmask interrupt
after all data have been processed this really doesn’t make
difference, it’s just bad design).

I have no clear info about that, but sometimes I get a “sensation” this
is true ;-|

Masking interrupts at device level leaves the developer free to unmask
them as the very last thing before going to “sleep” in
MsgReceive()/InterruptWait(), with no problems for other threads.


Also, I don’t know if there is a queue for events returned by the ISR. (I
did some experiments with QNX 6.0 and I believe there was no queue,
but I’m not sure). It may also make a difference.

If InterruptAttachEvent() is used, there’s no need for a queue: at least
1 event will be pending toward the thread-level driver.

If InterruptAttach() is used, I suppose the standard pulse/event
queueing mechanism is used.

Davide


/* Ancri Davide - */

Davide Ancri wrote:

Another good summary. :-)


Thanks,
Evan

ed1k wrote:

In article <d8aaaj$jlm$1@inn.qnx.com>, evanh@clear.net.nz says…

Armin Steinhoff wrote:

Evan Hillas wrote:

Armin Steinhoff wrote:

This is not a ‘conflict’ … interrupt sharing is absolutely normal
with PCI.


It is, however, a conflict at the driver level purely because of
internal QSSL decision.


Not at ‘driver level’. It is done at the ISR level, and that is more or
less independent of the driver.



What is ‘driver level’

this stands for the ‘driver model’ or resource manager design

and what is ISR level?

An ISR is an entity of its own … its design is more or less
independent of the ‘driver model’. In other words, its design is driven by
the hardware/interrupt interfaces.

–Armin

Armin Steinhoff wrote:

ed1k wrote:
What is ‘driver level’


this stands for the ‘driver model’ or resource manager design

QNX in particular has many drivers that fall outside of this description.


Evan

“ed1k” <ed1k@fake.address> wrote in message
news:MPG.1d12bf2fe9d179119896c9@inn.qnx.com

Windows domain.

Linux domain.

QNX domain.

Now take a look at
InterruptAttachEvent(). It installs a “default” ISR handler that masks
the interrupt line and returns an event to the waiting thread. Fine. The
problem lies in the priority of that waiting thread. It may be way
too low, so the thread can be preempted by a higher-priority
task, effectively leaving the hardware interrupt line deaf to
requests. And this is the difference from the two models described above.

Don’t Windows and Linux allow threads to mask an interrupt in a way similar
to InterruptMask() in QNX? If they do, then they have the same problem.
The problem is not that the ISR masks an interrupt; the problem is that
you’re relying on a thread to unmask it. A thread that calls
InterruptMask() and InterruptUnmask() can get pre-empted between those
calls, too. The main difference is that InterruptAttachEvent() may let it
happen more often (sometimes a lot more often), because it makes it possible
for your interrupt to get masked while a high-priority thread is already
running. Whether that’s a big difference depends on whether you can live
with it happening occasionally; but if it can cause people to die when it
happens, it’s probably better if it happens every five seconds rather than
once in five months, because this way your product won’t get shipped until
it’s fixed.