Doug Rixmann <rixmannd@rdsdata.com> wrote:
> That’s what happens… some additional information:
> - nameloc is running on all devices (6 devices)
> - if I don’t start nameloc on a device, it complains about licensing. Can I
> start it to get the device up and then kill the nameloc process on all
> devices but the 2 servers?
Use nameloc -k on the non-server nodes to make them aware of the
license information from the server nodes.
> - netpoll is running on all devices (within our application) as
> netpoll -i1 -p1 -r1
> - not sure of the value of this
This is probably the source of your problem with the hub going down
and everything else going down with it.
With this command to netpoll, you’ve said (basically) “I’ve got a
perfect network, treat the slightest failure as a real failure.”
There is, basically, a tradeoff between ability to handle/ignore
transient failures, and the ability to quickly detect real failures.
netpoll controls how long before you look for a failure, how often
you retry, and how long between retries. You’ve taken ALL of these
options to their minimum values, which says to give up (tear down
the connection, i.e. the vc) after the slightest failure.
The default values are: netpoll -i10 -p10 -r6 – which will take about
10 minutes to finally give up on another node.
Your numbers will give up on another node after about 2 seconds.
You may want to choose some values in between.
There are other parameters that can affect network resiliency, some
in the command line to Net, some in the driver command line.
On the how long before fail:
Net.driver:
-n tx_num_retries max number of tx retries after timeout (default 3)
-t tx_retry_ticks number of 50ms ticks between tx retries (default 20)
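
To put rough numbers on the two driver options above, here is a minimal
sketch. It assumes the retries are evenly spaced and ignores the initial
timeout before the first retry; neither is confirmed by the usage text,
and the function name is made up for illustration.

```python
# Rough estimate only. Assumes retries are evenly spaced and ignores the
# initial timeout before the first retry (assumptions, not documented here).
TICK_MS = 50  # tick length stated in the Net.driver usage text


def retry_window_ms(tx_num_retries=3, tx_retry_ticks=20):
    """Approximate time spent retrying one transmit, in milliseconds."""
    return tx_num_retries * tx_retry_ticks * TICK_MS


if __name__ == "__main__":
    # Defaults from the usage text: -n 3, -t 20
    print("default:", retry_window_ms(), "ms")
    # Hypothetical alternative values, for comparison only
    print("-n 5 -t 10:", retry_window_ms(5, 10), "ms")
```

With the defaults this works out to 3 retries x 20 ticks x 50 ms = 3000 ms,
so roughly 3 seconds of retrying per failed transmit under those assumptions.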
And on the recover side:
Net:
-t tx_fail_time time in seconds before retry failed network for node (40)
Net.driver:
-f tx_forget_time seconds until rxd nack is forgotten about for txing
-David
QNX Training Services
http://www.qnx.com/support/training/
Please followup in this newsgroup if you have further questions.