io-net core dump

I have a reproducible io-net problem. I have attached the most recent core dump, which occurred on two occasions during our testing.

Hardware:

I have attached a "pci -v" output and a "pidin" listing of all QNX processes at the time of the core. I have also included checksums for io-net, devn-speedo.so, and npm-tcpip.so.

Is it possible that there are mismatched versions of io-net and devn-speedo.so/npm-tcpip.so?

I did upgrade from 6.2.1A to 6.2.1B. The problem was easily reproducible with 6.2.1A (disconnecting a cable during a flood ping would trigger it); under 6.2.1B it has become harder to reproduce, though nowhere near as close to impossible as we need it to be.

Any and all help is appreciated, as this has just been found in QA, and is holding up our first beta.

Rennie

It looks like a buffer management issue, as it's dying in a free
routine. This doesn't ring a bell, and I've been unable to reproduce it
using the same versions of io-net, npm-tcpip-v6.so, and devn-speedo.so.

Any other steps to reproduce it? Does it happen without qnet loaded?

-seanb


We previously reported repeatable io-net crashes. In our case they
occurred when QNX native networking was bridged over a wireless
LAN. The appreciable delays on that LAN forced the native networking
code down paths it doesn't normally exercise, which in turn triggered
a crash related to buffer management in io-net.

This was reported to QSSL in 2003.

Are you encountering this with native networking?

John Nagle
Team Overbot


Sean Boudreau wrote:


Any other steps to reproduce it?

Yes. If you create an application with a socket leak, you should be able
to speed things along. We had a socket leak in our app, and this "assisted"
in producing the error. Also, simply try to create as much load as you
can across four NICs. On 6.2.1A we could reproduce the crash with flood
pings alone; on 6.2.1B we have not been able to do this.

Our setup looks like this:

+--------+
|        |------+
|        |----+ |
+--------+    | |
  a|  |b     c| |d
+--------+    | |
|        |----+ |
|        |------+
+--------+    | |
              | |

Where:

a = xover cable network 192.168.1
b = xover cable network 192.168.2
c = shared media network A
d = shared media network B

Does it happen without qnet loaded?

Since Qnet is integral to the application, we can't run without Qnet
loaded. Qnet is used on LANs a and b and is bound to IP.

John Nagle wrote:

Are you encountering this with native networking?

A combination of native networking over IP and socket networking.

Rennie