alive takes too long

Santosh_Patil · October 27, 2000, 3:04pm

We have a QNX 4.24 network running with 3 nodes, plus one WinNT machine on
the same network. All the QNX boxes have an NE2000-compatible network
card in them, so they all load Net.ether1000. The network is otherwise fine,
but I have an intermittent problem when I try the ‘alive’ utility on node 1:
occasionally it takes over 10 seconds to respond!

netinfo on node 1 revealed a lot of these:

…
02:05:23 0 Status 4 ( 35) NET logical node not in node map
02:05:23 0 Status 4 ( 7) NET ( tx) failed (vc_attach ctrl pkt)
02:05:23 0 Status 5 ( 35) NET logical node not in node map
02:05:23 0 Status 5 ( 7) NET ( tx) failed (vc_attach ctrl pkt)
02:05:23 0 Status 6 ( 35) NET logical node not in node map
02:05:23 0 Status 6 ( 7) NET ( tx) failed (vc_attach ctrl pkt)
02:05:23 0 Status 7 ( 35) NET logical node not in node map
02:05:23 0 Status 7 ( 7) NET ( tx) failed (vc_attach ctrl pkt)
…

netmap reveals this:

Logical  Lan  Physical         TX Count  Last TX Fail Time

      1    1  00C026 DD1BC5 ;         0
      2    1  004005 70A5DE ;     14392
      3    1  004005 6FFDA3 ;     21873

Note that the WinNT machine's NIC is not listed here. We thought it wasn't
necessary since it's talked to only over TCP/IP and not through FLEET.

Thanks in advance,
-Santosh Patil,
GE Harris

Mario_Charest1 · October 27, 2000, 5:27pm

Those netinfo messages just mean you have more than three licenses
in the machine. The nameloc utility polls every possible
node, according to the number of licenses
in the machine. So nameloc is just telling you that it wants
to poll a node number, but that node is not defined in the netmap.

Removing the extra licenses will improve alive response time,
but it could still happen, mostly if one of the nodes is down.
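
One way to cross-check this is to tally which node numbers show up in the netinfo errors and compare them with the nodes listed by netmap. The following is only a minimal Python sketch, meant to be run off-line on a saved copy of the output. It assumes that the number after "Status" is the logical node id and that the number in parentheses is an event code; the thread does not confirm either. The name unmapped_nodes is made up for illustration.

```python
import re
from collections import Counter

# Sample lines copied from the netinfo output posted in this thread.
NETINFO = """\
02:05:23 0 Status 4 ( 35) NET logical node not in node map
02:05:23 0 Status 4 ( 7) NET ( tx) failed (vc_attach ctrl pkt)
02:05:23 0 Status 5 ( 35) NET logical node not in node map
02:05:23 0 Status 5 ( 7) NET ( tx) failed (vc_attach ctrl pkt)
02:05:23 0 Status 6 ( 35) NET logical node not in node map
02:05:23 0 Status 6 ( 7) NET ( tx) failed (vc_attach ctrl pkt)
02:05:23 0 Status 7 ( 35) NET logical node not in node map
02:05:23 0 Status 7 ( 7) NET ( tx) failed (vc_attach ctrl pkt)
"""

# Logical node ids that netmap listed in this thread.
NETMAP_NODES = {1, 2, 3}

# ASSUMPTION: field after "Status" = logical node id, "(n)" = event code.
LINE_RE = re.compile(r"^\S+\s+\d+\s+Status\s+(\d+)\s+\(\s*(\d+)\)\s+(.*)$")


def unmapped_nodes(text, mapped):
    """Count 'not in node map' events per node id that is absent from `mapped`."""
    counts = Counter()
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not look like status entries
        node, message = int(m.group(1)), m.group(3)
        if "not in node map" in message and node not in mapped:
            counts[node] += 1
    return counts


result = unmapped_nodes(NETINFO, NETMAP_NODES)
for node in sorted(result):
    print(f"node {node}: {result[node]} 'not in node map' event(s)")
```

If the assumption about the fields holds, the nodes reported here are the ones being looked up without a netmap entry, which is consistent with more node numbers being tried than the three that are defined.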

As for the WinNT NIC not being listed by netmap: netmap is FLEET related, so it will only display QNX4 machines.


Mike_Taillon · October 27, 2000, 6:05pm

I don’t think removing licenses will help.
(BTW, an alternative would be to use the -e option to nameloc,
e.g. nameloc -e3.)

By not having a netmap entry for a node, that node cannot be polled,
as there is basically no known route to it. I.e. it’s not the driver
timing out trying to reach a down node, as Net doesn’t have a clue as
to which driver to route through, let alone the MAC address to send to…

To get a better understanding of the problem, we would need the complete
output of netinfo and netinfo -l from each machine, just after the problem
occurred. In addition, the output from traceinfo on each node may
also provide a clue.

Also, which nodes run nameloc?

Paul_Russell · October 30, 2000, 5:19pm

I’ve seen similar things when I have licenses for machines that
aren’t there (either they’re disconnected or I typed the MAC wrong in the
netmap file).
I’ve also seen FLEET networking periodically become slow when this
happens. I guess FLEET re-checks for the other nodes every now and then,
thus introducing other slowdowns or timeouts, or some other software I’m
using is causing the re-check…
-Paul
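
Since a mistyped MAC in the netmap is one of the causes mentioned above, a quick format check on the entries may catch the obvious cases. This is only a rough Python sketch that parses the rows as they were displayed by netmap earlier in this thread (logical node, LAN, physical address split in two groups, then ";"); the layout of an actual netmap file may differ, and check_netmap is a hypothetical helper name. It only checks that each physical address is 12 hex digits and is not repeated. It cannot tell whether an address matches the card that is really in the machine.

```python
import re

# Rows copied from the netmap output posted in this thread.
NETMAP = """\
1 1 00C026 DD1BC5 ; 0
2 1 004005 70A5DE ; 14392
3 1 004005 6FFDA3 ; 21873
"""

HEX12 = re.compile(r"^[0-9A-Fa-f]{12}$")


def check_netmap(text):
    """Return (entries, problems) for netmap-style rows.

    entries maps logical node id -> 12-digit physical address (upper case).
    problems is a list of human-readable complaints.
    """
    entries, problems, seen = {}, [], {}
    for raw in text.splitlines():
        # Ignore everything after ';' (counters in the displayed output).
        fields = raw.split(";")[0].split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # blank line, header, or something unrecognised
        logical = int(fields[0])
        physical = "".join(fields[2:]).upper()
        if not HEX12.match(physical):
            problems.append(f"node {logical}: {physical!r} is not 12 hex digits")
            continue
        if physical in seen:
            problems.append(
                f"node {logical}: address {physical} already used by node {seen[physical]}"
            )
        seen.setdefault(physical, logical)
        entries[logical] = physical
    return entries, problems


entries, problems = check_netmap(NETMAP)
for node, mac in sorted(entries.items()):
    print(f"node {node}: {mac}")
for problem in problems:
    print("PROBLEM:", problem)
```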