OS crash on 4.25E.

Hello. My company makes driving simulator apps using QNX. In the
process of testing, we keep getting OS crashes during network scenarios
(two or more driving simulators networked together for cooperative
training). The app is very stable for single simulator scenarios, but
will consistently crash as soon as a network element is added.

The only difference between network and non-network scenarios is the
amount of data sent over the network (100 Mbit ethernet, Corman cards
and drivers, 3Com hubs). There is always a certain amount of data
(largely status information going back to the instructor station), but
each additional network participant will add a set amount of data. This
data is, roughly speaking, 4 KB per participant per frame (a frame is
1/30 of a second). When the networking works, there is no appreciable
slowdown; the ethernet seems more than capable of handling the traffic
of up to 4 participants (the most we’ve tested with). We make extensive
use of UDP multicasting to broadcast data to multiple recipients.
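
As a rough sanity check on the bandwidth figures above, here is a minimal sketch of the arithmetic. It assumes that 4 KB means 4096 bytes, that each participant's data goes onto the wire once per frame (multicast), and it ignores protocol headers and the baseline status traffic. The function names are illustrative only.

```python
# Back-of-the-envelope payload estimate for the networked scenarios.
# Assumptions (not measured): 4 KB = 4096 bytes, one multicast copy per
# participant per frame, no header overhead, no baseline status traffic.

FRAME_RATE_HZ = 30                          # a frame is 1/30 of a second
BYTES_PER_PARTICIPANT_PER_FRAME = 4 * 1024  # roughly 4 KB per participant
LINK_MBIT_PER_S = 100                       # 100 Mbit ethernet


def payload_mbit_per_s(participants):
    """Payload rate in Mbit/s for the given number of participants."""
    bytes_per_s = participants * BYTES_PER_PARTICIPANT_PER_FRAME * FRAME_RATE_HZ
    return bytes_per_s * 8 / 1_000_000


def link_utilization(participants):
    """Fraction of the nominal link capacity used by the payload alone."""
    return payload_mbit_per_s(participants) / LINK_MBIT_PER_S


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"{n} participants: {payload_mbit_per_s(n):.2f} Mbit/s "
              f"({link_utilization(n):.1%} of the link)")
```

Under those assumptions, 4 participants come to about 3.9 Mbit/s of payload, a few percent of the link, which is consistent with seeing no appreciable slowdown.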

Interestingly, the more network participants, the sooner the crashes occur
on average. For example, a 2-player network will generally last upwards
of 10 minutes before a crash, while a 4-player network often causes a
crash within a few minutes. This leads me to believe that the crash is
somehow related to the amount of network traffic.

Without further ado, here’s the crash screen:


Version: 425.L Feb 15 2001 Technical Support: +1 (613) 591-0941
Proc fault 1, ldt 100 sys/Proc32; fault e+0
cs:eip=5:89c1 ss:esp=d:f7c0f50 efl=12097 ds=d es=d fs=0 gs=0
eax/44b2 ebx/3c73c7a2 ecx/44fb7 edx/1 esi/0 edi/1 ebp/f7c0f5c
Stack (d:f7c0f50)
1aa18903 00044f67 00006001 0f7c0fa0 00003820 0000e488 00008a48 00003820
0000e488 00008000 1aa18903 00008c04 0002f0b0 00000000 00000000 00000000
3c73c7a2 00000001 00000001 00000001 0f7c0fb8 00000000 0000597d 0002f0b0
00008fd5 0000001c 0f7c0fe8 0000597d 0000000b 00003820 0f7c0fd0 000057aa
Process Entry (addr 5aa4)
00000000 00000001 00000000 00000001 00000000 00000000 30020207 00001a1a
00005858 0100000d 00005b5c ffffffff 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00006401 000d0005 00008218 00000000 00000001
00000022 00000000 00000161 000102c0 00000000 00000000 00032c80 00000000
00000000 00000000 00000000 ffff0001 00000000 00000000 00000000


Here’s the output of ‘sin ver’:


PROGRAM                  NAME         VERSION  DATE
sys/Proc32               Proc         4.25L    Feb 15 2001
sys/Proc32               Slib16       4.23G    Oct 04 1996
sys/Slib32               Slib32       4.24B    Aug 12 1997
/bin/Fsys                Fsys32       4.24V    Feb 18 2000
/bin/Fsys                Floppy       4.24B    Aug 19 1997
/bin/Fsys.eide           eide         4.25A    Feb 09 2000
//1/bin/Dev32            Dev32        4.23G    Oct 04 1996
//1/bin/Dev32.ansi       Dev32.ansi   4.23H    Nov 21 1996
//1/bin/Dev32.ser        Dev32.ser    4.23I    Jun 27 1997
//1/bin/Dev32.pty        Dev32.pty    4.23G    Oct 04 1996
//1/bin/Mouse            Mouse        4.24A    Aug 22 1997
//1/bin/Iso9660fsys      Iso9660fsys  4.23D    Mar 20 2000
//1/bin/Pipe             Pipe         4.23A    Feb 26 1996
//1/bin/Net              Net          4.25C    Aug 30 1999
//1/bin/Net.ct100tx      Net.ct100tx  4.25F    Aug 20 2001
//1/bin/Net.ct100tx      Net.ct100tx  4.25F    Aug 20 2001
//1/bin/Net.ct100tx      Net.ct100tx  4.25F    Aug 20 2001
//1/*/5.0/usr/ucb/Tcpip  Tcpip        5.00A    Jan 26 2001
//1/bin/Audio.423        Audio.423    4.23A    Apr 17 1997
//1/bin/Audio.423        Audio.423    4.23A    Apr 17 1997
//1/bin/cron             cron         4.23B    Oct 30 1997
//1/bin/SMBfsys          SMBfsys      1.30I    Dec 07 1999


If anyone at QSSL would comment on likely culprits, it would be greatly
appreciated. I can post more information as necessary. Thanks.

Josh Hamacher
FAAC Incorporated

Josh Hamacher <hamacher@faac.com> wrote in news:3C73D113.5030100@faac.com:

Without further ado, here’s the crash screen:



This looks like it's crashing in the timer code for Proc due to a page
fault. There were a few timer-related bugs found and fixed in a newer
version of Proc. You should contact your sales rep to see if you can
arrange to get a copy of the newest Proc (version N, unreleased).
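
For anyone reading a similar crash screen, here is a minimal sketch of pulling the fault vector and register values out of the dump text. It assumes the value after "fault" is a hexadecimal x86 exception vector (0xE is the x86 page fault vector, which matches the reading above); what the trailing "+0" means is not assumed here. The helper names and the small vector table are illustrative only.

```python
import re

# A few well-known x86 exception vectors (partial table, for illustration).
X86_EXCEPTIONS = {
    0x0: "divide error",
    0x6: "invalid opcode",
    0xD: "general protection fault",
    0xE: "page fault",
}


def parse_fault_vector(line):
    """Return the hex value found in 'fault <hex>+...' or None if absent."""
    match = re.search(r"fault ([0-9a-f]+)\+", line)
    return int(match.group(1), 16) if match else None


def parse_registers(line):
    """Parse tokens such as 'cs:eip=5:89c1' or 'eax/44b2' into a dict.

    Values written as 'a:b' become a tuple of ints; others a single int.
    """
    regs = {}
    for token in line.split():
        name, _, value = token.replace("/", "=", 1).partition("=")
        parts = [int(p, 16) for p in value.split(":")]
        regs[name] = tuple(parts) if len(parts) > 1 else parts[0]
    return regs


if __name__ == "__main__":
    header = "Proc fault 1, ldt 100 sys/Proc32; fault e+0"
    vector = parse_fault_vector(header)
    print(hex(vector), X86_EXCEPTIONS.get(vector, "unknown"))

    regs = parse_registers(
        "cs:eip=5:89c1 ss:esp=d:f7c0f50 efl=12097 ds=d es=d fs=0 gs=0"
    )
    print(regs["cs:eip"])
```

The cs:eip pair is the value to quote when comparing dumps from different crashes, since it shows whether the fault is happening at the same place each time.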



Cheers,
Adam

QNX Software Systems Ltd.
[ amallory@qnx.com ]

With a PC, I always felt limited by the software available.
On Unix, I am limited only by my knowledge.
–Peter J. Schoenster <pschon@baste.magibox.net>

Okay, thanks. I really appreciate the response - we’ve been fighting
with networking for several months, and this is (hopefully) the last
problem.

Josh Hamacher
FAAC Incorporated

