System Crash

Hi,

We are working with an Intel-based industrial PC system running QNX 4.25.
Several processes run on the system, including processes that acquire and
compress very large images and several audio sources. This generates a
large number of PCI DMA requests and interrupts.
At the moment we are using a Celeron 433 MHz with the older processor
core (Katmai) and a 66 MHz system bus clock. That system runs
continuously and very reliably for weeks and months unattended.
With the current processor generation based on the Coppermine core
(PIII and Celeron, same or higher processor clock rate and 100 MHz
system bus clock) we have a problem:
A few minutes after system start, the processes die one after the other
with “Terminated (SIGFPE) 0x…”. This affects QNX system processes as
well as our own processes.
Sometimes the system reset immediately.
But we have never seen the Proc manager's kernel dump.
We have also had the same problems with other processes, e.g. the ALSA
audio interface.

Does anyone have experience with this problem or an idea how to solve it?

With best regards


Joerg Hering
Chief Developer VDR
e-mail: jhering@avecs-bergen.de
private: hering.ruegen@t-online.de
mobile: jhering.ruegen@gmx.de

AVECS Bergen GmbH
Billrothstraße 11a
D-18528 Bergen auf Rügen

Tel.: +49 3838 2119101
Fax: +49 3838 2119105
URL: http://www.avecs-bergen.de


“Jörg Hering” <jhering@avecs-bergen.de> wrote in message
news:3C458802.9030300@avecs-bergen.de

> Hi,
>
> We are working with an Intel-based industrial PC system running QNX 4.25.
> Several processes run on the system, including processes that acquire and
> compress very large images and several audio sources. This generates a
> large number of PCI DMA requests and interrupts.
> At the moment we are using a Celeron 433 MHz with the older processor
> core (Katmai) and a 66 MHz system bus clock. That system runs
> continuously and very reliably for weeks and months unattended.
> With the current processor generation based on the Coppermine core
> (PIII and Celeron, same or higher processor clock rate and 100 MHz
> system bus clock) we have a problem:
> A few minutes after system start, the processes die one after the other
> with “Terminated (SIGFPE) 0x…”. This affects QNX system processes as
> well as our own processes.

That’s VERY odd; to my knowledge none of the QNX processes use
floating-point instructions. However, an integer division by 0 will
also generate that signal.
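
To illustrate the point about integer division, here is a minimal sketch for a generic POSIX system on an x86 CPU (it is not QNX 4 specific, and the function name is made up for this example). A child process divides an integer by zero and the parent reports which signal terminated it. Integer division by zero is undefined behaviour in C, so the result depends on CPU and compiler; on x86 the divide instruction traps and the process is normally killed with SIGFPE.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs an integer division by zero in a child
 * process. Returns the number of the signal that terminated the child,
 * 0 if the child exited normally, or -1 if fork/waitpid failed. */
int signal_from_integer_div_by_zero(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* volatile keeps the compiler from folding the division away */
        volatile int numerator = 1;
        volatile int denominator = 0;
        volatile int result = numerator / denominator;
        (void)result;
        _exit(0);   /* reached only if the CPU did not trap */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Comparing the return value against SIGFPE from a small main() shows that no floating-point instruction is needed to get this signal.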

Why all processes would die in such a way is beyond me. The only thing
that comes to mind is corruption in Slib (which is shared among all
processes).

Do you use any special/custom hardware? Have you tried a different
motherboard (with a different chipset)?

Can you pinpoint which software triggers this?







SID   PID PROGRAM                PRI STATE BLK   CODE   DATA
 --    -- Microkernel            --- ----- ---  10524      0
  0     1 sys/Proc32             30f READY ---   118k   516k
  0     2 sys/Slib32             10r RECV    0    53k   4096
  0     5 …/bin/Fsys             10r RECV    0    77k  9539k
  0     6 …/bin/Fsys.diskonchip  10r RECV    0    94k    98k
  0     8 //1/bin/ksh            10o WAIT   -1    23k    36k
  0     9 idle                    0r READY ---      0    40k
  0    17 //1/bin/Dev32          24f RECV    0    32k    90k
  0    20 //1/bin/Dev32.ser      20r RECV    0    16k    73k
  0    21 //1/bin/Dev32.ansi     20r RECV    0    40k    90k
  0    23 //1/bin/Dev32.pty      20r RECV    0    12k    32k
  0    25 //1/bin/Pipe           10r RECV    0    16k    40k
  0    27 //1/bin/Net            23r RECV    0    32k    73k
  0    29 //1/bin/Net.ether509   20r RECV    0    24k    20k
  0    33 //1/*/usr/ucb/Socklet  23r RECV    0   114k   200k
  0    43 //1/*/usr/bin/syslogd  10o RECV    0    36k    32k
  0    47 //1/*/usr/ucb/inetd    10o RECV   53    36k    36k
  0    49 //1/bin/tinit          10o WAIT   -1    16k    28k
  0    50 /bin/ksh               10o WAIT   -1    94k    28k
  0    51 //1/bin/Fsys.eide      22r RECV    0    61k   114k
  1    54 //1/bin/ksh            10o WAIT   -1    23k    45k
  0    61 //1/mer/Mer            19r RECV    0    81k   135k
  0    84 //1/bin/Mqueue         10r RECV    0    20k  2727k
  0    85 //1/mer/Merdispatch    10r RECV    0    12k    20k
  0    87 //1/mer/Merudp         19r RECV    0    16k    28k
  0    89 //1/mer/Mervpid        19r RECV    0   8192    20k
  0    90 //1/mer/Merhvr         10r RECV    0    49k    36k
  0    95 //1/mer/Merhvr         10r RECV    0    49k    40k
  0   101 //1/mer/Merhvr         10r RECV    0    49k    36k
  0   106 //1/mer/Mernmeafrm     10r RECV    0    12k    28k
  0   108 //1/mer/Merapc         10r RECV    0    20k    20k
  0   110 //1/mer/Mernmea        10r RECV    0    12k    24k
  0   114 //1/mer/Mernmea        10r RECV    0    12k    24k
  0   117 //1/mer/Mernmea        10r RECV    0    12k    28k
  0   120 //1/mer/Mernmea        10r RECV    0    12k    24k
  0   123 //1/mer/Mernmea        10r RECV    0    12k    24k
  0   126 //1/mer/Mernmeasim     10r REPLY   0    16k    24k
  0   127 //1/mer/Merframe2      13r REPLY   0   192k 16535k
  2   138 //1/bin/ksh            10o REPLY  17    23k    45k
  3   168 //1/bin/ksh            10o REPLY  17    23k    45k
  3   970 //1/merc/merc          10o RECV    0    36k    32k
  3   971 //1/merc/mercdispatch  10o RECV    0   8192    16k
  3   972 //1/merc/merc          10o RECV    0    36k    40k
  3   973 //1/merc/mercdispatch  10o SEM     2   8192    16k
  3   975 //1/merc/mercpci3135   10o RECV    0    40k    45k
  3   976 //1/merc/mercscl3135   10o RECV    0    36k    36k
  3   977 //1/merc/mercwagotcp   10o RECV    0    57k    45k
  3   978 //1/merc/mercwagotcp   10o RECV    0    57k    36k
  3   979 //1/merc/mercwagotcp   10o RECV    0    57k    36k
  3   980 //1/merc/mercpcl818l   10o RECV    0    36k    32k
  3   981 //1/merc/mercfgr       10o RECV    0    32k    57k
  1  1256 //1/bin/sin            10o REPLY   1    45k    57k





PROGRAM                NAME           VERSION  DATE
sys/Proc32             Proc           4.25L    Feb 15 2001
sys/Proc32             Slib16         4.23G    Oct 04 1996
sys/Slib32             Slib32         4.24B    Aug 12 1997
…/bin/Fsys             Fsys32         4.24V    Feb 18 2000
…/bin/Fsys             DiskOnChip     5.00     Aug 24 2001
//1/bin/Dev32          Dev32          4.23G    Oct 04 1996
//1/bin/Dev32.ser      Dev32.ser      4.23I    Jun 27 1997
//1/bin/Dev32.ansi     Dev32.ansi     4.23H    Nov 21 1996
//1/bin/Dev32.pty      Dev32.pty      4.23G    Oct 04 1996
//1/bin/Pipe           Pipe           4.23A    Feb 26 1996
//1/bin/Net            Net            4.25C    Aug 30 1999
//1/bin/Net.ether509   Net.ether509   4.24A    Jun 26 1998
//1/*/usr/ucb/Socklet  Socklet        4.25H    Jul 30 1999
//1/bin/Fsys.eide      eide           4.25A    Feb 09 2000
//1/bin/Mqueue         mqueue         4.24A    Aug 30 1999





Node  CPU      Machine  Speed  Memory         Ticksize  Display    Flags
   1  686/687  PCI      50264  33415k/66686k  10.0ms    VGA Color  -3P+----------8P

Heapp Heapf Heapl Heapn Hands Names Sessions Procs Timers Nodes Virtual
    0     0 22312     0    64   100       64   500    125     3 37M/121M

Boot from Hard at Jan 16 11:42 Locators:





PCI version = 2.10

Vendor ID = 8086h, INTEL CORPORATION
Device ID = 7190h,
PCI index = 0h
Class Code = 060000h Bridge (Host/PCI) ProgIF=0
Revision ID = 3h
Bus number = 0
Device number = 0
Function num = 0
Status Reg = 2210h
Command Reg = 6h
Header type = 0h Single-function
BIST = 0h Build-in-self-test not supported
Latency Timer = 20h
Cache Line Size= 0h
Base Address = MEM@d0000000h,Prefetchable,32bit length 67108864
Max Lat = 0ns
Min Gnt = 0ns
PCI Int Pin = 0
PCI Int Pin = NC
Interrupt line = 0

Vendor ID = 8086h, INTEL CORPORATION
Device ID = 7191h,
PCI index = 0h
Class Code = 060400h Bridge (PCI/PCI) ProgIF=0
Revision ID = 3h
Bus number = 0
Device number = 1
Function num = 0
Status Reg = 220h
Command Reg = 107h
Header type = 1h Single-function
BIST = 0h Build-in-self-test not supported
Latency Timer = 40h
Cache Line Size= 0h
Primary Bus Number = 0h
Secondary Bus Number = 1h
Subordinate Bus Number = 1h
Secondary Latency Timer = 20h
I/O Base = a0h
I/O Limit = a0h
Secondary Status = 22a0h
Memory Base = d400h
Memory Limit = d6f0h
Prefetchable Memory Base = fff0h
Prefetchable Memory Limit= 0h
Prefetchable Base Upper 32 Bits = 0h
Prefetchable Limit Upper 32 Bits = 0h
I/O Base Upper 16 Bits = 0h
I/O Limit Upper 16 Bits = 0h
Bridge Control = 0ns
PCI Int Pin = 0
PCI Int Pin = NC
Interrupt line = 0


Vendor ID = 8086h, INTEL CORPORATION
Device ID = 7110h,
PCI index = 0h
Class Code = 060100h Bridge (PCI/ISA) ProgIF=0
Revision ID = 2h
Bus number = 0
Device number = 7
Function num = 0
Status Reg = 280h
Command Reg = fh
Header type = 0h Multi-function
BIST = 0h Build-in-self-test not supported
Latency Timer = 0h
Cache Line Size= 0h

Max Lat = 0ns
Min Gnt = 0ns
PCI Int Pin = 0
PCI Int Pin = NC
Interrupt line = 0

Vendor ID = 8086h, INTEL CORPORATION
Device ID = 7111h,
PCI index = 0h
Class Code = 010180h Mass Storage (IDE) ProgIF=128
Revision ID = 1h
Bus number = 0
Device number = 7
Function num = 1
Status Reg = 280h
Command Reg = 5h
Header type = 0h Single-function
BIST = 0h Build-in-self-test not supported
Latency Timer = 20h
Cache Line Size= 0h
IO@f000h length 16 bytes
Max Lat = 0ns
Min Gnt = 0ns
PCI Int Pin = 0
PCI Int Pin = NC
Interrupt line = 0

Vendor ID = 8086h, INTEL CORPORATION
Device ID = 7112h,
PCI index = 0h
Class Code = 0c0300h Serial Bus (Universal Serial Bus) ProgIF=0
Revision ID = 1h
Bus number = 0
Device number = 7
Function num = 2
Status Reg = 280h
Command Reg = 5h
Header type = 0h Single-function
BIST = 0h Build-in-self-test not supported
Latency Timer = 20h
Cache Line Size= 0h
IO@b000h length 32 bytes
Max Lat = 0ns
Min Gnt = 0ns
PCI Int Pin = 4
PCI Int Pin = INT D
Interrupt line = no connection

Vendor ID = 8086h, INTEL CORPORATION
Device ID = 7113h,
PCI index = 0h
Class Code = 068000h Bridge (Other 128) ProgIF=0
Revision ID = 2h
Bus number = 0
Device number = 7
Function num = 3
Status Reg = 280h
Command Reg = 3h
Header type = 0h Single-function
BIST = 0h Build-in-self-test not supported
Latency Timer = 0h
Cache Line Size= 0h

Max Lat = 0ns
Min Gnt = 0ns
PCI Int Pin = 0
PCI Int Pin = NC
Interrupt line = 0

Vendor ID = 1165h,
Device ID = 40h,
PCI index = 0h
Class Code = 040000h Multimedia (Video) ProgIF=0
Revision ID = 0h
Bus number = 0
Device number = 17
Function num = 0
Status Reg = 0h
Command Reg = 7h
Header type = 0h Single-function
BIST = 0h Build-in-self-test not supported
Latency Timer = 20h
Cache Line Size= 0h
Base Address = MEM@d7801000h,32bit length 64 MEM@d7800000h,32bit length 256 MEM@d7000000h,32bit length 8388608
Max Lat = 0ns
Min Gnt = 0ns
PCI Int Pin = 1
PCI Int Pin = INT A
Interrupt line = 10

Vendor ID = 125dh,
Device ID = 1969h,
PCI index = 0h
Class Code = 040100h Multimedia (Audio) ProgIF=0
Revision ID = 2h
Bus number = 0
Device number = 18
Function num = 0
Status Reg = 290h
Command Reg = 5h
Header type = 0h Single-function
BIST = 0h Build-in-self-test not supported
Latency Timer = 20h
Cache Line Size= 0h
Base Address = IO@b400h length 64 bytes IO@b800h length 16 bytes IO@bc00h length 16 bytes IO@c000h length 4 bytes IO@c400h length 4 bytes
Subsystem Vendor ID = 125dh
Subsystem ID = 8898h
Max Lat = 24ns
Min Gnt = 2ns
PCI Int Pin = 1
PCI Int Pin = INT A
Interrupt line = 11

Vendor ID = 1234h,
Device ID = 1616h,
PCI index = 0h
Class Code = 088000h System Peripherals (Other 128) ProgIF=0
Revision ID = 2h
Bus number = 0
Device number = 19
Function num = 0
Status Reg = 280h
Command Reg = 3h
Header type = 0h Single-function
BIST = 0h Build-in-self-test not supported
Latency Timer = 0h
Cache Line Size= 8h un-cacheable
Base Address = MEM@d7802000h,32bit length 128 IO@c800h length 128 bytes IO@cc00h length 4 bytes
Subsystem Vendor ID = c1a2h
Subsystem ID = 823h
Max Lat = 0ns
Min Gnt = 0ns
PCI Int Pin = 0
PCI Int Pin = NC
Interrupt line = no connection

Vendor ID = 125dh,
Device ID = 1969h,
PCI index = 1h
Class Code = 040100h Multimedia (Audio) ProgIF=0
Revision ID = 2h
Bus number = 0
Device number = 20
Function num = 0
Status Reg = 290h
Command Reg = 5h
Header type = 0h Single-function
BIST = 0h Build-in-self-test not supported
Latency Timer = 20h
Cache Line Size= 0h
Base Address = IO@d000h length 64 bytes IO@d400h length 16 bytes IO@d800h length 16 bytes IO@dc00h length 4 bytes IO@e000h length 4 bytes
Subsystem Vendor ID = 125dh
Subsystem ID = 8898h
Max Lat = 24ns
Min Gnt = 2ns
PCI Int Pin = 1
PCI Int Pin = INT A
Interrupt line = 15

Vendor ID = 1002h, ATI TECHNOLOGIES INC
Device ID = 474dh,
PCI index = 0h
Class Code = 030000h Display (VGA) ProgIF=0
Revision ID = 27h
Bus number = 1
Device number = 0
Function num = 0
Status Reg = 290h
Command Reg = 87h
Header type = 0h Single-function
BIST = 0h Build-in-self-test not supported
Latency Timer = 20h
Cache Line Size= 8h un-cacheable
Base Address = MEM@d4000000h,32bit length 16777216 IO@a000h length 256 bytes MEM@d6000000h,32bit length 4096
Subsystem Vendor ID = 1002h
Subsystem ID = 8004h
Max Lat = 0ns
Min Gnt = 8ns
PCI Int Pin = 1
PCI Int Pin = INT A
Interrupt line = no connection

IRQ Routing = bus=0 slot=1 device=20
Vendor ID = 125dh,
Device ID = 1969h,
INTA=3,4,5,7,9,10,11,12,14,15 Slot2:INTD Slot3:INTC Slot4:INTB Motherboard:INTB Motherboard:INTB
INTB=3,4,5,7,9,10,11,12,14,15 Slot3:INTD Slot4:INTC Motherboard:INTC Motherboard:INTC
INTC=3,4,5,7,9,10,11,12,14,15 Slot2:INTB Slot3:INTA Slot4:INTD Motherboard:INTD Motherboard:INTD
INTD=3,4,5,7,9,10,11,12,14,15 Slot2:INTC Slot3:INTB Slot4:INTA Motherboard:INTA Motherboard:INTA

IRQ Routing = bus=0 slot=2 device=19
Vendor ID = 1234h,
Device ID = 1616h,
INTA=3,4,5,7,9,10,11,12,14,15
INTB=3,4,5,7,9,10,11,12,14,15 Slot1:INTC Slot3:INTA Slot4:INTD Motherboard:INTD Motherboard:INTD
INTC=3,4,5,7,9,10,11,12,14,15 Slot1:INTD Slot3:INTB Slot4:INTA Motherboard:INTA Motherboard:INTA
INTD=3,4,5,7,9,10,11,12,14,15 Slot1:INTA Slot3:INTC Slot4:INTB Motherboard:INTB Motherboard:INTB

IRQ Routing = bus=0 slot=3 device=18
Vendor ID = 125dh,
Device ID = 1969h,
INTA=3,4,5,7,9,10,11,12,14,15 Slot1:INTC Slot2:INTB Slot4:INTD Motherboard:INTD Motherboard:INTD
INTB=3,4,5,7,9,10,11,12,14,15 Slot1:INTD Slot2:INTC Slot4:INTA Motherboard:INTA Motherboard:INTA
INTC=3,4,5,7,9,10,11,12,14,15 Slot1:INTA Slot2:INTD Slot4:INTB Motherboard:INTB Motherboard:INTB
INTD=3,4,5,7,9,10,11,12,14,15 Slot1:INTB Slot4:INTC Motherboard:INTC Motherboard:INTC

IRQ Routing = bus=0 slot=4 device=17
Vendor ID = 1165h,
Device ID = 40h,
INTA=3,4,5,7,9,10,11,12,14,15 Slot1:INTD Slot2:INTC Slot3:INTB Motherboard:INTA Motherboard:INTA
INTB=3,4,5,7,9,10,11,12,14,15 Slot1:INTA Slot2:INTD Slot3:INTC Motherboard:INTB Motherboard:INTB
INTC=3,4,5,7,9,10,11,12,14,15 Slot1:INTB Slot3:INTD Motherboard:INTC Motherboard:INTC
INTD=3,4,5,7,9,10,11,12,14,15 Slot1:INTC Slot2:INTB Slot3:INTA Motherboard:INTD Motherboard:INTD

IRQ Routing = bus=0 motherboard device=7 func=1
Vendor ID = 8086h, INTEL CORPORATION
Device ID = 7110h,
Vendor ID = 8086h, INTEL CORPORATION
Device ID = 7111h,
Vendor ID = 8086h, INTEL CORPORATION
Device ID = 7112h,
Vendor ID = 8086h, INTEL CORPORATION
Device ID = 7113h,
INTA=3,4,5,7,9,10,11,12,14,15 Slot1:INTD Slot2:INTC Slot3:INTB Slot4:INTA Motherboard:INTA
INTB=3,4,5,7,9,10,11,12,14,15 Slot1:INTA Slot2:INTD Slot3:INTC Slot4:INTB Motherboard:INTB
INTC=3,4,5,7,9,10,11,12,14,15 Slot1:INTB Slot3:INTD Slot4:INTC Motherboard:INTC
INTD=3,4,5,7,9,10,11,12,14,15 Slot1:INTC Slot2:INTB Slot3:INTA Slot4:INTD Motherboard:INTD

IRQ Routing = bus=0 motherboard device=1
Vendor ID = 8086h, INTEL CORPORATION
Device ID = 7191h,
INTA=3,4,5,7,9,10,11,12,14,15 Slot1:INTD Slot2:INTC Slot3:INTB Slot4:INTA Motherboard:INTA
INTB=3,4,5,7,9,10,11,12,14,15 Slot1:INTA Slot2:INTD Slot3:INTC Slot4:INTB Motherboard:INTB
INTC=3,4,5,7,9,10,11,12,14,15 Slot1:INTB Slot3:INTD Slot4:INTC Motherboard:INTC
INTD=3,4,5,7,9,10,11,12,14,15 Slot1:INTC Slot2:INTB Slot3:INTA Slot4:INTD Motherboard:INTD