I have a reproducible io-net problem. I have attached the most recent coredump; the crash has happened on two occasions during our testing.
Hardware:
I have attached a “pci -v” output and a “pidin” of all QNX processes at the time of the core. I have also included
checksums for io-net, devn-speedo.so, and npm-tcpip.so.
Is it possible that there are mismatched versions of io-net and devn-speedo.so/npm-tcpip.so?
I did upgrade from 6.2.1A to 6.2.1B. The problem was easily reproducible with 6.2.1A (disconnecting a cable during a flood ping
would cause it), but it has become more difficult to reproduce under 6.2.1B (though still nowhere near as close to impossible as we need).
Any and all help is appreciated, as this has just been found in QA, and is holding up our first beta.
Rennie
It looks like a buffer management issue as it’s dying in a free
routine. This doesn’t ring a bell and I’ve been unable to reproduce it
using the same versions of io-net, npm-tcpip-v6.so, and devn-speedo.so.
Any other steps to reproduce it? Does it happen without qnet loaded?
-seanb
We previously reported repeatable io-net crashes. For us,
they occurred when QNX native networking was bridged over a wireless
LAN. This caused the native networking code to exercise paths
it doesn’t normally use, because the LAN had appreciable
delays. This in turn caused a crash related to buffer
management in io-net.
This was reported to QSSL in 2003.
Are you encountering this with native networking?
John Nagle
Team Overbot
Sean Boudreau wrote:
It looks like a buffer management issue as it’s dying in a free
routine. This doesn’t ring a bell and I’ve been unable to reproduce it
using the same versions of io-net, npm-tcpip-v6.so, and devn-speedo.so.
Any other steps to reproduce it?
Yes. If you create an application with a socket leak, you should be able
to speed it along. We had a socket leak in our app and this “assisted”
in producing the error. Also, simply try to create as much load as you
can with four NICs. On 6.2.1A we could reproduce the crash simply with flood
pings. On 6.2.1B we have not been able to do this.
Our setup looks like this:
+--------+
|        |------+
|        |----+ |
+--------+    | |
  a| |b      c| |d
+--------+    | |
|        |----+ |
|        |------+
+--------+    | |
              | |
Where:
a = xover cable network 192.168.1
b = xover cable network 192.168.2
c = shared media network A
d = shared media network B
Does it happen without qnet loaded?
Since Qnet is integral to the application, we can’t run it without Qnet
loaded. Qnet is used on LANs a and b and is bound to ip.
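For reference, our io-net invocation is along these lines (module names from our setup; the exact option syntax may differ between 6.2.1 releases, so treat this as illustrative rather than exact):

```shell
# Illustrative only: start io-net with the Speedo driver, the TCP/IP
# stack, and Qnet bound to the IP encapsulation (bind=ip).
io-net -d speedo -p tcpip -p qnet bind=ip
```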
John Nagle wrote:
Are you encountering this with native networking?
A combination of native networking over ip and socket networking.
Rennie