Problem with shared interrupts

It’s the rule of any system. If you have two drivers listening on the same interrupt and one of those drivers you do not control, you cannot assume anything in terms of deterministic behavior. It is a general comment, not a specific one. You have to mask that interrupt until the driver(s) in question get to service their devices, or you could spin forever. If you want deterministic behavior, get your devices separate interrupts. This is easiest to do when you don’t use an ancient system like a PC, but it is doable with PC designs that are aimed at high performance rather than low cost.

Not so easy to do on a general computing platform. With PCI, one would need a couple of dedicated IRQs available for every card slot, and some more for the built-in devices.

I still say it’s the equivalent of disabling interrupts for a long period of time. The rule that says don’t do it is not enforceable, so you’ve lost determinism right there.

And mario is right too: InterruptAttachEvent() introduces a potential priority inversion.

Like any system, though, you can design it poorly. If you choose to use InterruptAttachEvent(), you have made a design decision to handle some level of interrupt handling at that thread’s priority level. If you then add another thread which is running at a higher priority, especially if that thread uses any amount of CPU, you have made the choice to lose both determinism and potentially seriously increase interrupt latency.

The biggest issue here is that this should be well documented, and it really isn’t. For that matter, the QNX “Intro to Neutrino” course (which does cover different ways of handling interrupts) doesn’t emphasize the consequences of taking the easy way of handling interrupts. They emphasize how easy it is, without really pointing out these kinds of issues.

It is deterministic in my opinion because the behavior is well known. What you lose is real time if shared interrupts are used with InterruptAttachEvent(). Given that most QSS drivers are using InterruptAttachEvent(), it sounds to me like the OS as a whole (on x86/PC) has lost some of its realtimeness.

I have control over my software, but I have none over QSS, and since I MUST assume all drivers are using InterruptAttachEvent(), I must also assume that unless no devices are sharing interrupts, I have lost a great deal of realtimeness.

Having a shared interrupt only means (to me) that one handler gets called after the other, not that it gets preempted by a thread…

Having to explain to someone that QNX6 isn’t real-time when interrupts are shared sucks.

Ah well. why should I care ?!?!

Poor design choice is the way, it seems. The way it stands, the moment InterruptAttachEvent() is utilised, sharing interrupts is a write-off.

There are lots of things which can screw up your design, and it is not very clear to me, never mind someone less experienced, what all those caveats are. For example, hardware with SMIs, bus-mastering devices, and shared interrupts (especially high-rate/low-rate sharing) can all screw you up if you are not aware of the issues. Then there are all sorts of software caveats to worry about: for example, using a thread to service an interrupt and not making it high enough priority.

Personally I like InterruptAttachEvent() as it makes for easy code to debug (well, easier than an interrupt handler). I try to make sure that if I have to share interrupts, I don’t share a time-sensitive one with a high-volume one.

Maybe we need to create a FAQ for hardware and software caveats where determinism is needed?

One thing that would work rather well with the current software is all the buffered streaming devices sitting on the same IRQ; this would work because those devices don’t care much about response time. IRQ#15 would be a good spot.

Alas, on your average PC, you aren’t given that option so we are back to sharing different types of devices again.

And back to the general rule of don’t disable interrupts for large periods of time.

What troubles me the most is that QSS’s own drivers are most probably using InterruptAttachEvent(), which implies their drivers break real-time whenever interrupts are shared.

I’m looking at my PC right now (typical motherboard) through Windows. I don’t have a single PCI card, yet there are 19 devices connected to an IRQ.
Obviously some are shared, which means bye-bye real-time ;-(((((

If I were to choose an OS for its real-time capability, QNX6 just lost a bunch of points in my book. I always thought people using QNX4/QNX6 did so because one could assume OS drivers didn’t disable interrupts for long periods of time, or had very short ISRs, all in the name of real-time friendliness.

I believe Linux and Windows get described as non-real-time for exactly these types of behavior.

Maybe I am overreacting, but I feel something is very, very wrong here, given where QSS comes from (where they want to go now is a different matter).

Well, Rick, one of the first things to go into the FAQ, alongside this chosen behaviour, is a list of drivers that use it. From what I’ve been told, there are quite a few of 'em. They might be quite hard to partition on a loaded system.

Anyone got some hard data?

Whatever data is made available, it’s not going to be very reliable. For every release and patch (fortunately/unfortunately) the whole thing will need to be revisited.

I think one should be able to detect programmatically whether a program uses InterruptAttachEvent() or not; this type of info should be spit out by pidin ir (maybe it already is, I’ll check).

Here is the deal: using InterruptAttach() in this case wouldn’t help AT ALL. The only time it would help with the behavior is if there is a device driver with almost no interrupt activity, which will not be the case for an ethernet device on a network.

Say the ethernet driver did use InterruptAttach(): when it was one of its interrupts, it would still have to mask it and send the sigevent to the thread. The thread is then going to keep the interrupt masked until all the buffers in the ring have been pulled out and new ones set up. So, you might ask, why not unmask the interrupt right away? Fine, we do that, and while we are taking data out of the ring buffers another interrupt comes in, which causes the ISR to run, masking the interrupt and delivering the sigevent. No further ahead.

The only time an ISR wins (in this case of latency on a shared interrupt) is when there is a packet/interrupt situation and the ISR can intelligently not deliver sigevents. Which is why things like character devices do use ISRs.

There is no way to provide complete determinism with a shared interrupt source. And mario, what Linux and Windows get trashed on is the amount of time they mask all interrupts, not a single one. Although they do mask single ones as well, that isn’t the real issue for realtime response. This is a hardware issue, not a software one.

The only problem with InterruptAttachEvent() is the potential priority inversion.

The drivers holding onto the masks is my main complaint. I’m sorry if I have confused the two.

But we are further ahead because that interrupt is extraneous so is simply discarded and unmasked again. There is also the obvious extra overhead generated this way when there is a large number of interrupts. But a decent controller will have on-chip masking to remove this overhead.

The best solution for each driver will fall into a few general answers:

  • Document the way it is now.
  • Change the masking to short-term and consume any extra interrupts.
  • Utilise on-device masking where available.

I am having trouble wrapping my head around this issue and want to try to explain it a different way to see if that helps.

Normally, priority inversion exists when a situation allows a low-priority thread to prevent a higher-priority thread from running. Since this is counterintuitive to the designer’s wishes, software which prevents/alleviates this is a “good” thing.

Interrupts are always said to be at a higher level than the normal priority system, therefore implying that interrupt servicing is not impacted by thread priorities.

A different way to describe this is we have two priority domains, one for threads, one for interrupts.

InterruptAttachEvent() allows us to cross these domains and allows a thread in the “lower” priority domain (the thread one) to impact the higher-level domain (the hardware interrupt one). Although this has the appearance of priority inversion, I am not sure it fits the definition, since the problem crosses domain boundaries.

What could the OS do about this? In order to close a potential race condition, it has to mask the interrupt before scheduling the thread which called InterruptAttachEvent(), which opens us up to the inversion by allowing a thread in the lower domain to impact the higher domain.

To use the normal solution, we would have to promote that thread to the priority of the highest impacted thread - and since we are potentially impacting other ISRs (since the interrupt is masked), we would have to run at the highest possible priority.

This would cause (I think) other problems - in particular it would basically make the thread an ISR in which case why not use an ISR?

So if the OS knew all the threads blocked on the same interrupt, it could promote the thread to the priority of the highest waiting thread further down the handler chain.

This would not alleviate the problem of a high-priority, CPU-intensive thread preventing the servicing (and unmasking) of the interrupt.

So if you have managed to stick with my wanderings so far, I end up with the same conclusion. The problem is not one of the OS in particular. If you want the convenience of handling interrupts outside of ISRs, there are caveats. The only solution is documentation.

Of course if you have a high priority task running READY by design, you probably don’t need a RTOS - just an executive. :slight_smile:

Ok, the priority inversion is not related to the masking. It is purely to do with the thread-level functioning. And as per my previous conversation, it is as much to do with not having control/knowledge over the other drivers or high-priority misfits that other people are using on a general computing platform. The only thing I’m designing here is one driver.

The inversion simply occurs if my driver’s attached thread is configured at a lower priority than anything else, except for other attached threads of course. Mine can only execute if the higher-priority threads are not busy.

That is the inversion. It impacts on all InterruptAttachEvent() threads, not just the shared ones.

A side effect of this, just happens to be, the mask being held even longer.

Fixing this most definitely does not make it the same as an ISR. You have already said drivers using InterruptAttachEvent() are easier to debug than InterruptAttach(); that won’t change. What does change is the driver gets the event ASAP.

You may get the ISR to run ASAP, but there are strict limits to what you can do in an ISR: no system calls, nothing. Also, ISRs don’t obey any priorities, so if you have a long-running ISR you can mess up high-priority tasks.

Thankfully, with 6.3.0 there are 256 priority levels, so there will be plenty of room for making sure your priorities are straight. I also believe that in 6.2.1 most drivers allow you to set the priority level of the interrupt thread, so you can control it on your system.

I wasn’t talking about ISRs, but I got your point anyway: the drivers need to have a definable priority for some processing-intensive handlers, and promotion because of an interrupt event would ruin that. So calling this an inversion is not appropriate.

Now we are back to documentation again. Preferred priorities and whatnot. I see that devc-ser8250 is level 24o, and a lot of the QNX-supplied drivers are level 21r. I don’t know of any guidelines relating to where to position a new driver, not to mention just being an ass with a user program.

Hmm, this reflects right back to your original statement of “Once you share an interrupt you can never be assured deterministic behavior”, to which I will add: under QNX.

It has nothing to do with QNX actually.

Yikes.

I’ve just read through this thread.

Strange to see people getting flustered about something that became an issue for PCs back in 1993 (over 10 years ago). I seem to remember a discussion just like this on quics way back then (it’s been a long time though; might have been comp.realtime).

I have been twiddling with BIOS settings and physically moving PCI cards around in PCs ever since '93, in order to cajole the PCI BIOS into giving me the IRQ assignments I wanted (last week I spent half a day trying to get a multiport serial card to stop sharing an interrupt with a NIC).

Personally, I like PCs, even though I have to do this crap; it is a price I am willing to pay for a 1 GHz machine with 256 MB of RAM that costs 800.00 USD (revision-controlled industrial motherboard).

I second that. If you have a problem with interrupt sharing on the PCI bus, you swap slots, twiddle the BIOS settings, and ensure that the critical interrupts are NOT shared. If there are too many “critical” interrupts, you have to deal with it as custom hardware, because in the end it is a “hardware” issue, not a problem with the RTOS per se. I have been through this with other RTOS products. It is impossible to guarantee sane interrupt behaviour when there are multiple high-rate devices sharing a line, and you can usually control the lines that are in use. Are you really using all that stuff that is assigned an IRQ on the PCI bus? Heck, I’ve seen this problem on real-time systems that didn’t have ANY OS.

It is annoying to have to juggle the hardware to make the system behave. It is fruitless to try to make it behave without doing the juggling. I’d rather be annoyed than frustrated.

respectfully
BJ