Net.ether82557 doesn't get any more interrupts ...

Hello QSSL,

I need some help with a really weird problem:

We use a lot of PC/104 boards (made by Lippert, a German brand),
which have worked very well for years with QNX 4.25E (details below).

For a few weeks now, in some installations, Net.ether82557 has stopped
getting any HW interrupts (we checked this with 'monitor').
This happens after the whole box has run for several hours or even days
without any problems (nothing in tracelog or syslog).

If we 'ping' a remote address, we can even see the data going out from
that NIC on the board, and the reply is sent as well, but the board
doesn't receive it.

Common to all failing machines: there is always one of our own
tasks running that attaches an IRQ (Dev.profi in this case, but also
other tasks with IRQ handlers).

But these tasks still work correctly; only Net.ether82557 stops getting
interrupts.

Does anyone have an idea how we can proceed to fix this?

Can I check how the interrupt controller is actually set up?

Can something mess up the interrupt controller (accidentally, of course)?

Any hints are appreciated!!

TIA,

Karsten.





PROGRAM                   NAME          VERSION  DATE
sys/Proc32                Proc          4.25N    Sep 24 2001
sys/Proc32                Slib16        4.23G    Oct 04 1996
sys/Slib32                Slib32        4.24B    Aug 12 1997
/bin/Fsys                 Fsys32        4.24V    Feb 18 2000
/bin/Fsys.eide            eide          4.25A    Feb 09 2000
//1/bin/Dev32             Dev32         4.23G    Oct 04 1996
//1/usr/dp/bin/Dev.led    Dev.led       1.08G    Jul 14 2003
//1/bin/Dev32.ansi        Dev32.ansi    4.23H    Nov 21 1996
//1/bin/Dev32.pty         Dev32.pty     4.23G    Oct 04 1996
//1/bin/Pipe              Pipe          4.23A    Feb 26 1996
//1/usr/ucb/Socklet       Socklet       4.25H    Jul 30 1999
//1/*/dp/bin/DPmngr.ugw   DPmngr.ugw    1.08G    Jul 14 2003
//1/bin/Mqueue            mqueue        4.24A    Aug 30 1999
//1/*/dp/bin/Dev.profi    Dev.profi     1.08G    Jul 14 2003
//1/bin/Net               Net           4.25C    Aug 30 1999
//1/bin/Net.ether82557    Net.ether825  4.25G    Jan 11 2001
//1/bin/Dev32.ser         Dev32.ser     4.25X    Jun 07 1999


SID PID    PROGRAM                  PRI  STATE  BLK    CODE   DATA
--  --     Microkernel              ---  -----  ---    10524  0
0   1      sys/Proc32               30f  READY  ---    118k   339k
0   2      sys/Slib32               10r  RECV   0      53k    4096
0   4      /bin/Fsys                16r  RECV   0      77k    372k
0   5      /bin/Fsys.eide           22r  RECV   0      61k    114k
0   8      idle                     0r   READY  ---    0      65k
0   20     //1/bin/Dev32            24f  RECV   0      32k    81k
0   24     //1/usr/dp/bin/Dev.led   22r  RECV   0      20k    20k
0   28     //1/bin/Dev32.ansi       20r  RECV   0      40k    139k
0   32     //1/bin/Dev32.pty        20r  RECV   0      12k    49k
0   37     //1/bin/Pipe             16r  RECV   0      16k    24k
0   43     //1/bin/nameloc          20o  RECV   0      6144   20k
0   44     //1/bin/nameloc          20o  REPLY  0      6144   24k
0   101    //1/usr/ucb/Socklet      22r  RECV   0      114k   200k
0   124    //1/usr/ucb/inetd        16o  RECV   125    18k    40k
0   165    //1/usr/bin/syslogd      10o  RECV   0      36k    32k
0   192    //1/bin/Mqueue           16o  RECV   0      20k    9854k
0   314    //1/bin/Dev32.ser        20r  RECV   0      12k    24k
0   338    //1/bin/tinit            10o  WAIT   -1     8192   28k
0   339    //1/bin/tinit            16o  WAIT   -1     8192   28k
1   357    //1/bin/ksh              10o  REPLY  20     23k    45k
1   392    //1/usr/dp/bin/DPquery   9o   RECV   0      81k    28k
4   1079   //1/bin/ksh              16o  REPLY  20     23k    49k
3   1392   //1/usr/bin/ditto        16o  REPLY  20     57k    114k
2   14500  //1/*/dp/bin/DPmngr.ugw  12o  RECV   0      184k   73k
2   14530  //1/*/dp/bin/Dev.profi   10o  RECV   0      32k    32k
2   15529  //1/*/bin/Drv.bacsrvr    10o  READY  ---    294k   409k
2   15530  //1/*/dp/bin/Drv.pbfms   10o  RECV   0      151k   139k
5   15908  //1/bin/ksh              16o  WAIT   -1     23k    45k
5   16668  //1/bin/ksh              16o  WAIT   -1     23k    45k
5   16758  //1/usr/ucb/inetd        16o  RECV   18057  18k    36k
5   17685  //1/bin/Net              23r  RECV   0      32k    65k
5   17699  //1/bin/Net.ether82557   20r  RECV   0      45k    176k
5   22604  //1/bin/sin              16o  REPLY  1      45k    40k

IRQ  PID    PROGRAM                  CS:IP        DS
-1   20     //1/bin/Dev32            0005:005760  000D
-1   28     //1/bin/Dev32.ansi       0005:005DC0  000D
-1   17685  //1/bin/Net              0015:00468C  001D
-1   314    //1/bin/Dev32.ser        0005:001865  000D
0    1      sys/Proc32               00F0:004D03  00F8
0    17685  //1/bin/Net              0015:0047AB  001D
1    28     //1/bin/Dev32.ansi       0005:00690C  000D
4    314    //1/bin/Dev32.ser        0005:001112  000D
5    14530  //1/*/dp/bin/Dev.profi   0005:1CBC2A  000D
13   1      sys/Proc32               00F0:004CC7  00F8
14   5      /bin/Fsys.eide           0005:004860  000D
15   17685  //1/bin/Net              0015:0055EC  001D


Karsten Hoffmann                    MBS-GmbH
Karsten.Hoffmann@mbs-software.de    Roemerstrasse 15
Phone : +49-2151-7294-38            D-47809 Krefeld
Fax   : +49-2151-7294-50
Mobile: +49-172-3812373

Is any QNX4-Wizard here?

Hi,

I work on a system that uses QNX 4.24 (plus patches) and the
Net.ether82557 driver, and it has performed well in our systems, running
for hours and days.

Our system includes a process that handles the interrupts detected by
QNX. This process inspects each received interrupt and the associated
devices to see whether they require processing. If the process finds
that the interrupt is not for a device of interest, it must return CPU
control to QNX, so that QNX can check whether the interrupt belongs to a
device under its control, such as the network interface chip.
When the process returns CPU control to QNX, it provides a return value:
a proxy if this process handled the interrupt (i.e. the interrupt came
from a device requesting attention), and 0 if it did not and QNX must
check it. I'm not certain of the above order of processing. Maybe you
have a process that handles interrupts and is stealing an interrupt from
the Net driver?

Modify your ISR software to monitor the IRQs associated with Net and,
when one is detected, simply return 0 so that QNX handles it. With this
working, you could create a trace mechanism that records the arrival of
Net IRQs in a circular list. You might also be able to use the
Watcom Debugger to catch a Net IRQ going through your process and verify
that your process is handling it correctly.
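
A circular trace list of that kind could look roughly like the sketch
below. It is plain C and not taken from any QNX library: the names
irq_trace, trace_init, trace_record, trace_count and trace_get are made
up for illustration, and the "stamp" field is whatever counter or
timestamp the caller has at hand. The idea is only that recording an
event is a couple of stores, so it is cheap enough to call from a
handler, and that the newest events overwrite the oldest.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TRACE_SLOTS 64u   /* must be a power of two so the index wraps with a mask */

/* One recorded event (hypothetical layout). */
struct irq_event {
    unsigned      irq;    /* IRQ number the handler was entered for */
    unsigned long stamp;  /* caller-supplied counter or timestamp   */
};

struct irq_trace {
    struct irq_event slot[TRACE_SLOTS];
    unsigned long    total;   /* number of events ever recorded */
};

/* Clear the trace; call once from normal (non-interrupt) context. */
void trace_init(struct irq_trace *t)
{
    assert((TRACE_SLOTS & (TRACE_SLOTS - 1u)) == 0);
    memset(t, 0, sizeof *t);
}

/* Record one event, overwriting the oldest one once the buffer is full.
   No locking and no library calls, so it stays cheap in a handler. */
void trace_record(struct irq_trace *t, unsigned irq, unsigned long stamp)
{
    struct irq_event *e = &t->slot[t->total & (TRACE_SLOTS - 1u)];
    e->irq   = irq;
    e->stamp = stamp;
    t->total++;
}

/* Number of events currently held (at most TRACE_SLOTS). */
unsigned long trace_count(const struct irq_trace *t)
{
    return t->total < TRACE_SLOTS ? t->total : TRACE_SLOTS;
}

/* Fetch the i-th retained event, 0 being the oldest; NULL if out of range. */
const struct irq_event *trace_get(const struct irq_trace *t, unsigned long i)
{
    unsigned long n = trace_count(t);
    if (i >= n)
        return NULL;
    return &t->slot[(t->total - n + i) & (TRACE_SLOTS - 1u)];
}
```

The handler would call trace_record() with the IRQ number it was
entered for, and a normal-priority task would walk the list with
trace_get() after the failure to see when the last Net IRQ arrived.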

Regards
Charlie


The return value from an ISR can be a proxy ID or 0. But in either case
QNX will continue to process the interrupt.

Also, there is no way to know (or assume) that you were the first
to receive the interrupt, i.e. multiple processes may be trapping
INTx. They probably won't know about the existence of the others,
so they won't know whether they were first or second.
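
To make that concrete, here is a small model of the behaviour in plain
C. It is a simulation only, not kernel or driver code: proxy_t,
irq_chain, chain_attach and dispatch are invented names, and the two
handlers just stand in for something like Dev.profi and the Net driver
sharing one interrupt. The point it illustrates is that every attached
handler gets called whatever the others returned, so a handler that
returns a proxy does not "consume" the interrupt.

```c
#include <assert.h>
#include <stddef.h>

typedef int proxy_t;                    /* stand-in for a proxy ID; 0 = nothing to trigger */
typedef proxy_t (*handler_fn)(void);

#define MAX_HANDLERS 4

struct irq_chain {
    handler_fn handler[MAX_HANDLERS];   /* handlers attached to one interrupt */
    size_t     count;
};

/* Attach a handler to the (simulated) interrupt. */
void chain_attach(struct irq_chain *c, handler_fn fn)
{
    assert(c->count < MAX_HANDLERS);
    c->handler[c->count++] = fn;
}

/* Model of one interrupt: call every handler regardless of what the
   previous ones returned, and collect the nonzero proxies to trigger.
   Returns the number of proxies stored in triggered[]. */
size_t dispatch(const struct irq_chain *c, proxy_t *triggered, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < c->count; i++) {
        proxy_t p = c->handler[i]();
        if (p != 0 && n < max)
            triggered[n++] = p;
    }
    return n;
}

/* Two example handlers with call counters (hypothetical). */
int profi_calls, net_calls;

proxy_t profi_handler(void) { profi_calls++; return 7; }  /* "mine": ask for proxy 7 */
proxy_t net_handler(void)   { net_calls++;   return 0; }  /* nothing to trigger      */
```

With both handlers attached, one call to dispatch() runs
profi_handler() and net_handler() once each, in attach order, and
reports proxy 7 as the only one to trigger.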


Charlie Powell <NO4powellzSPAM@cox.net> wrote:

> Modify your ISR software to monitor the IRQs associated with Net and
> when detected simply return with value 0 so QNX handles them. With this
> working you could create a trace mechanism to record in a circular list
> the events of receiving Net IRQs. You might also be able to use the
> Watcom Debugger to catch a Net IRQ going through your process and verify
> your process is handling it correctly.

I guess, that’s what ‘monitor’ does.

Thanks for your replies, anyway.

In fact, our problem turned out to be a 'poor' power supply that
didn't suppress bursts on the lines well enough ...

