QNET and "NET_REPLY" state

Hi,

I am using QNET quite a lot, and it is working quite well BTW. I like the
features it has over QNX 4 Fleet, especially being able to use names
instead of numbers. This is working out very well.

Just a few tidbits that are a bit annoying and troublesome. Today I was
doing a lot of starting various resource managers across the net using both
on -f and on -n, and it works great…but every once in a while, “on” would
not go away (it gets stuck in a “NET_REPLY” state), and thus I wouldn’t get
the shell prompt back after a remotely spawned program tried to exit.

I looked later at my list of processes and saw a large number of ‘on’
processes just hanging around, unable to exit as they were all in this
“NET_REPLY” state.

I tried killing io-net and then restarting it, but alas, that operation froze
the system. I didn’t have a high-priority shell running, so I couldn’t do
any repairs and had to reboot.

I’m worried about mainly one thing…if I have a process doing IPC, it looks
like it is possible for the other end to reply and for the process being
replied to never to get the message and keep waiting. Additionally…there
doesn’t seem to be a way to kill it (it seems to be hung on io-net?). What can
I do to guarantee that my processes won’t get stuck in this state?

Thanks,

Kevin

Kevin Stallard <kevin@ffflyingrobots.com> wrote:

I’m worried about mainly one thing…if I have a process doing IPC, it looks
like it is possible for the other end to reply and for the process being
replied to never to get the message and keep waiting. Additionally…there
doesn’t seem to be a way to kill it (it seems to be hung on io-net?). What can
I do to guarantee that my processes won’t get stuck in this state?

By helping us find the bug in QNET and fix it :)

Whatever the situation (a node goes away, the server application you
are talking to crashes, …), you should NEVER have an application
blocked forever.

The “only way” a “block forever” could happen is if you are reply blocked on
a remote server and it neither calls MsgReply() for you nor handles
the “unblock()” request sent to it (when someone tries to kill your
client).
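
To make that condition concrete, here is a toy model in plain Python. It is only a sketch and does not use the QNX API: the names `server` and `send_then_kill`, the message strings, and the `patience` timeout are all made up for illustration. The timeout exists only so the model can report the outcome; a real reply-blocked client would simply keep waiting.

```python
import queue
import threading


def server(requests, replies, handles_unblock):
    """Toy server that never replies to a normal message.

    It releases the client only if it honours unblock requests.
    """
    while True:
        kind = requests.get()
        if kind == "stop":
            return
        if kind == "unblock" and handles_unblock:
            # The server answers the pending client with an error status,
            # which is what lets the client leave its blocked state.
            replies.put("EINTR")
        # A plain "msg", or an ignored "unblock", leaves the client waiting.


def send_then_kill(handles_unblock, patience=0.2):
    """Send a message, then request an unblock, and report what the client sees."""
    requests, replies = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=server, args=(requests, replies, handles_unblock), daemon=True
    )
    worker.start()
    requests.put("msg")      # the client is now waiting for a reply
    requests.put("unblock")  # someone tries to kill the client
    try:
        return replies.get(timeout=patience)
    except queue.Empty:
        return "still blocked"
    finally:
        requests.put("stop")  # let the toy server thread exit


if __name__ == "__main__":
    print(send_then_kill(handles_unblock=True))
    print(send_then_kill(handles_unblock=False))
```

In this model the client gets out only when the server cooperates, either by replying or by acting on the unblock request. When the server does neither, nothing on the client side can end the wait.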

The “NET_REPLY” state indicates that the application is sending
a pulse or signal across the network and hasn’t got a reply
yet.

If you can find a reproducible case, that would be great.
Posting the pidin output as well as the output of “cat /proc/qnetstats”
after it blocks might help too.

I will try to see if I can reproduce it tomorrow.
I assume all of this is on 6.2, right?

-xtang

Hey xtang,

Yeah…6.2…when I see these again, I’ll post this stuff. It takes a fair
amount of starting/slaying before it occurs and I haven’t done a lot of
testing in the past couple of days…but I’ll make sure I hammer on this
thing early next week.

Thanks,

Kevin
