How to Recover From a TCP/IP Stack Crash?

Using QNX 6.1, Large Stack (npm-tcpip.so)

We have had repeated cases of the TCP/IP stack crashing: things
are running fine, and then all network communication ends;
attempting to access a previously-usable socket results in
a “socket not found”-type error, and devices /dev/io-net/en0
and /dev/io-net/en1 are no longer present.

We’d like to detect this condition within the application, and
restart the stack without resetting the computer, if possible.

Any ideas what would be the proper sequence of slays and
process invocations to restart the stack should it crash?


Jeff Maass jmaass@columbus.rr.com Located near Columbus Ohio

When something goes wrong in any networking module, it takes
all networking down with it. In fact, it can be difficult
to tell whether it’s the stack or the network driver that is crashing.

Check how io-net is started (and look for a possible mount command)
and just do the same. You shouldn’t have to slay anything, but just to be
safe, slay io-net (I’m sure it will already be gone).

Jeff Maass <jmaass@columbus.rr.com> wrote:
| We have had repeated cases of the TCP/IP stack crashing: things
| are running fine, and then all network communication ends;
| attempting to access a previously-usable socket results in
| a “socket not found”-type error, and devices /dev/io-net/en0
| and /dev/io-net/en1 are no longer present.

I experienced this when I accidentally mixed things up (I installed the QNX
migration kit from 6.0). Due to the change in libs, io-net used to crash
like crazy. Verify by using the command

objdump -x nnn | grep NEEDED

where nnn is io-net, devn-your-driver.so, or npm-tcpip.so.
It should not list any files ending in .so.1.
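The objdump check can also be automated. A minimal sketch, assuming GNU objdump's dynamic-section format, where each dependency appears as a line like "  NEEDED               libc.so.2"; the function names needs_60_lib and report_60_libs are hypothetical. It flags NEEDED entries ending in .so.1, the 6.0-era suffix described in this thread.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if one line of `objdump -x` output is a NEEDED entry that
   names a library ending in ".so.1" (a QNX 6.0-era dependency), else 0. */
int needs_60_lib(const char *line)
{
    char tag[16], lib[256];
    size_t n;

    /* Expect: optional whitespace, the word NEEDED, then the library name. */
    if (sscanf(line, " %15s %255s", tag, lib) != 2)
        return 0;
    if (strcmp(tag, "NEEDED") != 0)
        return 0;
    n = strlen(lib);
    return n >= 5 && strcmp(lib + n - 5, ".so.1") == 0;
}

/* Scans objdump output from `in`, echoes each 6.0-era dependency to
   stdout, and returns how many were found. */
int report_60_libs(FILE *in)
{
    char buf[512];
    int found = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        if (needs_60_lib(buf)) {
            fputs(buf, stdout);
            found++;
        }
    }
    return found;
}
```

Wrapped in a small main() that calls report_60_libs(stdin), it can be fed with "objdump -x io-net | ./check60"; a nonzero count suggests that binary was linked against 6.0 libraries.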

Also, look in /var/dumps… if you have the core, then nothing beats it!
Send it over to QSSL. Hope someone fixes things.

| Any ideas what would be the proper sequence of slays and
| process invocations to restart the stack should it crash?

Three-step process:

slay io-net (probably redundant… it’d be dead already)
io-net -d tulip -p tcpip (put your driver name instead of tulip)
netmanager

You can get the name of your driver by using:

enum-devices -n | grep devn
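For an application that wants to run these three steps itself, a sketch in C follows. It uses POSIX fork()/execvp() rather than QNX's spawn() family, and the helper names, the 10-second limit, and the use of /dev/io-net/en0 as the "stack is back" check are assumptions; tulip is only the example driver from the post. io-net is started but not waited on, since it keeps running.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Starts argv[0] (searched in PATH) and returns its pid, or -1. */
static pid_t start_cmd(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* exec failed */
    }
    return pid;
}

/* Runs a command to completion; returns its exit status, or -1. */
int run_cmd(char *const argv[])
{
    int status;
    pid_t pid = start_cmd(argv);

    if (pid == -1)
        return -1;
    while (waitpid(pid, &status, 0) == -1)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Polls for a path up to `tries` times, one second apart.
   Returns 0 once it exists, -1 on timeout. */
int wait_for_path(const char *path, int tries)
{
    struct stat st;

    while (tries-- > 0) {
        if (stat(path, &st) == 0)
            return 0;
        sleep(1);
    }
    return -1;
}

/* slay io-net; io-net -d <driver> -p tcpip; netmanager */
int restart_network(const char *driver)
{
    char *slay[] = { "slay", "io-net", NULL };
    char *ionet[] = { "io-net", "-d", (char *)driver, "-p", "tcpip", NULL };
    char *netmgr[] = { "netmanager", NULL };

    run_cmd(slay);                  /* may fail: io-net is usually gone */
    if (start_cmd(ionet) == -1)     /* not waited on: io-net stays up */
        return -1;
    if (wait_for_path("/dev/io-net/en0", 10) == -1)
        return -1;
    return run_cmd(netmgr);
}
```

Call restart_network("tulip") with your own driver name. A real supervisor would also reap the io-net child (or ignore SIGCHLD) so a later crash does not leave a zombie.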

HTH

Keep Smiling
Regards

- mritunjai

http://www.me.iitb.ac.in/~mritun
http://mritun.qnx.org.ru
http://www.qnxzone.com/~mritun

Mritunjai,
You said it should not give any files ending in .so.1. Is that
true even if those files are mine? I mean, I have some files which end with
.so.1. Could you elaborate on that, please?

Sreekanth

Sreekanth <sreekanth@cambira.com> wrote:
| You said it should not give any files ending in .so.1. Is that
| true even if those files are mine? I mean, I have some files which end with
| .so.1. Could you elaborate on that, please?

If it’s your file, then it shouldn’t really matter. Actually, a dependency on
a library ending in .so.1 implies the application was compiled for QNX 6.0.
QNX 6.1 introduced many new features, including alignment changes in system
message structures… so as far as system resource managers are
concerned (including io-net and drivers), 6.0 ones won’t run on 6.1/6.2
(at least they won’t interoperate with applications using the .so.2 version
of the libs).

It is advisable to recompile your apps for whatever version you’re
using… just in case!


Keep Smiling
Regards

- mritunjai

“Jeff Maass” <jmaass@columbus.rr.com> wrote in message
news:aeos3s$3t9$1@inn.qnx.com

Using QNX 6.1, Large Stack (npm-tcpip.so)

We have had repeated cases of the TCP/IP stack crashing: things
are running fine, and then all network communications ends,
attempting to access a previously-usable socket results in

I think it was in q.p.q.porting where it was suggested to increase
io-net’s stack size.

I read the article yesterday; it’s fairly recent, but I don’t have the
means to find it now.

Regards,

Leon.

I’ve been talking with QNX Tech Support as well, and found that
increasing the stack size is not as easy in QNX 6.1 as it is
in QNX 6.2: the “stacksize=” parameter was only added in
the 6.2 version of the stack!

No, I cannot upgrade to QNX 6.2 at this point, as we are
pressed for time before a product release, and can’t take
the risk (yet).

Were there problems in QNX 6.1 with the 2912-byte stack?
All the reporters of the problem so far have been running
QNX 6.2.


Jeff Maass jmaass@columbus.rr.com Located near Columbus Ohio
USPSA # L-1192 NROI/CRO Amateur Radio K8ND
Maass’ IPSC Resources Page: http://home.columbus.rr.com/jmaass


OK, I am not sure what solution you are looking for.

Do you want to find the cause of the crash and fix it,
without upgrading to 6.2, on a deadline?

Or does your original post mean you don’t care that it
crashed, as long as you can detect it and restart it?

To “detect” and “restart” an io-net crash is easy. Your program’s
socket() call failing with ENOENT is actually a good sign
that io-net has crashed.
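That check fits in a few lines. A sketch, assuming (as stated above for QNX 6) that socket() reports ENOENT once the stack is gone; the function name stack_is_gone is hypothetical, and on QNX socket() lives in libsocket, so link with -l socket.

```c
#include <errno.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Returns 1 if socket() fails with ENOENT (the "io-net is gone" symptom),
   0 if a socket can be created, and -1 for any other failure. */
int stack_is_gone(void)
{
    int sd = socket(AF_INET, SOCK_DGRAM, 0);

    if (sd != -1) {
        close(sd);                  /* probe only: release it immediately */
        return 0;
    }
    return errno == ENOENT ? 1 : -1;
}
```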

You then only need to spawn() “/sbin/io-net -d DRIVER -p tcpip” (with
DRIVER being your driver’s name) and spawn() “/sbin/netmanager”, and the
network comes back.

If you have only one application using tcpip, this looks easy.
However, stopping and restarting the network means ALL your network-related
applications need to be aware of it, and need to retry after a
failure. Any system utilities (inetd/syslogd?) will need to be
restarted.
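One way for an application to "retry after a failure" is to keep retrying socket creation while the stack is missing. A sketch only: socket_retry and its attempts/delay parameters are hypothetical, and sockets opened before the crash still have to be closed and reopened by the application.

```c
#include <errno.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Creates a socket, retrying up to `attempts` times, `delay_s` seconds
   apart, while socket() fails with ENOENT (stack not running yet).
   Returns the descriptor, or -1 with errno from the last attempt. */
int socket_retry(int domain, int type, int attempts, unsigned delay_s)
{
    int sd;

    for (;;) {
        sd = socket(domain, type, 0);
        if (sd != -1 || errno != ENOENT || --attempts <= 0)
            return sd;              /* success, other error, or gave up */
        sleep(delay_s);
    }
}
```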

So if you have a lot of applications that need to restart, you can
do this:

Write a watchdog application which does the following:

  1. ChannelCreate() a channel with _NTO_CHF_COID_DISCONNECT
  2. open a socket (sd = socket())
  3. MsgReceive() on the channel created in step 1

If io-net crashes, a pulse with code _PULSE_CODE_COIDDEATH
and the value of sd will come in.
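The watchdog steps above might look like this in C. A sketch, assuming QNX Neutrino, where a socket descriptor is also a connection ID so its death is reported on the disconnect channel; the pulse code constant in <sys/neutrino.h> is _PULSE_CODE_COIDDEATH. The function name is hypothetical, releasing the dead connection with close() is an assumption, and on non-QNX systems the function simply fails with ENOSYS. Link with -l socket.

```c
#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#ifdef __QNXNTO__
#include <sys/neutrino.h>
#endif

/* Blocks until io-net dies. Returns 0 when the death pulse for our
   socket arrives, -1 on error (or on systems other than QNX). */
int wait_for_stack_death(void)
{
#ifdef __QNXNTO__
    struct _pulse pulse;
    int chid, sd, rcvid;

    /* 1. Channel that gets a pulse when one of our connections dies
          (only one channel per process may carry this flag). */
    chid = ChannelCreate(_NTO_CHF_COID_DISCONNECT);
    if (chid == -1)
        return -1;

    /* 2. The socket is our connection to io-net. */
    sd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sd == -1) {
        ChannelDestroy(chid);
        return -1;
    }

    /* 3. Wait for the pulse whose value is our socket descriptor. */
    for (;;) {
        rcvid = MsgReceive(chid, &pulse, sizeof pulse, NULL);
        if (rcvid == -1)
            break;
        if (rcvid == 0 && pulse.code == _PULSE_CODE_COIDDEATH
            && pulse.value.sival_int == sd) {
            close(sd);              /* release our end of the dead link */
            ChannelDestroy(chid);
            return 0;
        }
        if (rcvid > 0)              /* unexpected message: unblock sender */
            MsgError(rcvid, ENOSYS);
    }
    close(sd);
    ChannelDestroy(chid);
    return -1;
#else
    errno = ENOSYS;                 /* the pulse mechanism is QNX-specific */
    return -1;
#endif
}
```

When it returns 0, the watchdog proceeds to the slay/restart sequence and then calls it again to watch the new stack.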

Your watchdog program then spawn()s “slay” to kill a list
of applications, restarts io-net, and restarts all the applications
on the list.
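The recovery order (slay the listed applications, restart io-net and netmanager, restart the applications) can be kept separate from how each command is launched. In this sketch recover() takes a runner callback; dry_run only records the command lines so the plan can be checked first. All names, including the example application inetd, are illustrative assumptions, and a real runner must start io-net and long-running applications without waiting for them to exit.

```c
#include <stddef.h>
#include <string.h>

/* Launches one command (NULL-terminated argv); returns 0 on success. */
typedef int (*runner_fn)(char *const argv[]);

char dry_run_log[1024];

/* Runner that executes nothing and appends "cmd args; " to dry_run_log. */
int dry_run(char *const argv[])
{
    size_t i;

    for (i = 0; argv[i] != NULL; i++) {
        strncat(dry_run_log, argv[i],
                sizeof dry_run_log - strlen(dry_run_log) - 1);
        strncat(dry_run_log, argv[i + 1] ? " " : "; ",
                sizeof dry_run_log - strlen(dry_run_log) - 1);
    }
    return 0;
}

/* Slays each app, restarts the stack, then restarts the apps in order.
   Returns 0, or the first nonzero status from a restart step. */
int recover(char *const apps[], size_t napps, char *driver, runner_fn run)
{
    char *slay_ionet[] = { "slay", "io-net", NULL };
    char *ionet[] = { "io-net", "-d", driver, "-p", "tcpip", NULL };
    char *netmgr[] = { "netmanager", NULL };
    size_t i;
    int rc;

    for (i = 0; i < napps; i++) {
        char *slay_app[] = { "slay", apps[i], NULL };
        run(slay_app);              /* the app may already be gone */
    }
    run(slay_ionet);                /* usually redundant after a crash */
    if ((rc = run(ionet)) != 0 || (rc = run(netmgr)) != 0)
        return rc;
    for (i = 0; i < napps; i++) {
        char *start_app[] = { apps[i], NULL };
        if ((rc = run(start_app)) != 0)
            return rc;
    }
    return 0;
}
```

With apps = { "inetd" } and driver "tulip", dry_run records "slay inetd; slay io-net; io-net -d tulip -p tcpip; netmanager; inetd; ".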

-xtang
