Hi Richard,
I assume that you are talking about a fully redundant network, i.e. several
CPUs each connected to two different LANs. If my assumption is not correct
then nothing that I am about to say applies.
I worked for 6 years in broadcasting. We sent audio over ethernet. We were
able to start an audio file playing on a diskless workstation, i.e. the DSP
was on a diskless CPU and the software was reading the audio file across the
network. When we used dual LANs (customer's choice, based on cost), we were
able to unplug one ethernet and the audio would just keep right on playing
without a single hiccup. We could then plug that ethernet back in and unplug
the other ethernet and the audio would still keep right on playing. You could
toggle back and forth with no problem. Now here’s the good part. You could
unplug BOTH ethernets and the audio would still keep playing. The trick was
twofold.
We used a three-second buffer, so one of the ethernets had to be plugged
back in within 3 seconds. But the more important trick was that the
Net.ether* driver options had to be changed.
The defaults for -t (number of 50 ms periods before a timeout, default = 20)
and -n (number of retries, default = 3) were way too large. We used -t2 -n1.
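To put rough numbers on those settings, here is a minimal sketch in Python.
It assumes a driver gives up after about (-t periods * 50 ms) * (-n tries);
whether the first attempt is counted separately from the retries is not
confirmed here, so treat the totals as approximate. The function and
constant names are made up for illustration.

```python
PERIOD_MS = 50    # one -t unit, as described above
BUFFER_MS = 3000  # the three-second audio buffer


def give_up_ms(t_periods, n_retries):
    """Approximate time before a driver reports failure back to Net.

    Assumption: every try waits the full -t timeout and there are -n tries.
    """
    return t_periods * PERIOD_MS * n_retries


for label, t, n in [("defaults (-t20 -n3)", 20, 3), ("tuned (-t2 -n1)", 2, 1)]:
    total = give_up_ms(t, n)
    print(f"{label}: ~{total} ms on a dead LAN, "
          f"{BUFFER_MS - total} ms of buffer left")
```

Under this simplified model the default settings can use up the whole
buffer on a single dead LAN, while -t2 -n1 leaves almost all of it.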
Here’s what happens under the hood.
Some program sends a message to Net to send to some other CPU. Net decides
what paths are available to get that message from the local node to the
other node. It then passes the message to one of the Net.ether* drivers.
That driver begins trying to send the packet. After the -t timeout
(N * 50 ms) expires it retries, up to the -n number of retries. Not until
all of this fails will that driver go back to Net and say “I can’t get this
packet out”. If Net has an alternate path, it will try that one (repeating
the above procedure). Not until all paths have failed will Net claim that it
can’t transmit the packet.
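The sequence above can be sketched as a small simulation. This is only a
model of the behaviour described in this post, not the actual Net source:
driver_send and net_send are hypothetical names, a working LAN is assumed
to deliver instantly, and a dead LAN is assumed to cost the full
-t * 50 ms * -n before the driver gives up.

```python
def driver_send(lan_up, t_periods, n_retries):
    """Hypothetical driver: return (delivered, elapsed_ms)."""
    if lan_up:
        return True, 0  # normal transmit time is ignored in this model
    # Dead LAN: every try runs into the timeout before the driver gives up.
    return False, t_periods * 50 * n_retries


def net_send(lans_up, t_periods, n_retries):
    """Try each path in order; return (LAN number used or None, elapsed_ms)."""
    elapsed = 0
    for lan, up in enumerate(lans_up, start=1):
        delivered, spent = driver_send(up, t_periods, n_retries)
        elapsed += spent
        if delivered:
            return lan, elapsed
    return None, elapsed  # all paths failed; Net reports the error


# LAN 1 unplugged, LAN 2 working
print(net_send([False, True], 20, 3))  # default -t20 -n3
print(net_send([False, True], 2, 1))   # tuned -t2 -n1
```

The point of the model is only that the alternate path is not tried until
the first driver has exhausted its timeouts and retries.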
The default values are good and necessary for single LAN models (which is
the majority of the cases). But with redundant LANs it just takes too damn
long to time out and try the alternate path.
So, what are the consequences? If the -t value is too small the driver may
think that it can’t get the packet out when in fact it was successful and
just hasn’t received an ACK back yet. Or it may give up on a given LAN
prematurely. Obviously, for this to work, you need very clean LANs.
Otherwise you will get network failures all over the place.
BTW, if the receiving QNX node receives the same packet more than once, it
does handle it gracefully. It will log an error and do the right thing.
One more point. The biggest villain on ethernets is collisions. We found
that if you have a file server and a bunch of workstations you can greatly
reduce collisions by lying to the ethernet drivers. The trick was that we
wanted all traffic going from the file server on one LAN and all of the
traffic going to the server on the other LAN. The drivers have a -r
MediaRate option. Net will choose the faster Net.* driver based on the
media rate BUT ONLY if one is 10 times faster than the other. So, on the
file server the Net drivers are loaded as follows:
Net.etherwhatever -l 1 -r 1000000 &
Net.etherwhatever -l 2 -r 100000 &
And on each of the workstations it is reversed as:
Net.etherwhatever -l 1 -r 100000 &
Net.etherwhatever -l 2 -r 1000000 &
Otherwise, almost all of your traffic will go over the first LAN in both
directions as indicated by the ‘netinfo -l’ statistics.
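As a rough illustration of that selection rule, here is a short Python
sketch. It is an assumption-laden model, not Net's real logic:
preferred_lan is a hypothetical name, "10 times faster" is read as
"at least 10x", and the fallback to the lowest-numbered LAN is inferred
from the observation that traffic otherwise piles onto the first LAN.

```python
def preferred_lan(rates):
    """Pick the LAN used for outgoing traffic on one node.

    rates: {lan_number: media_rate} for two LANs, as given with -l and -r.
    Assumed rule: the faster LAN wins only if it is at least 10x faster;
    otherwise the lowest-numbered LAN is used.
    """
    (lan_a, rate_a), (lan_b, rate_b) = sorted(rates.items())
    if rate_a >= 10 * rate_b:
        return lan_a
    if rate_b >= 10 * rate_a:
        return lan_b
    return lan_a


server = {1: 1000000, 2: 100000}       # rates loaded on the file server
workstation = {1: 100000, 2: 1000000}  # reversed on each workstation

print("server sends on LAN", preferred_lan(server))
print("workstation sends on LAN", preferred_lan(workstation))
print("equal rates send on LAN", preferred_lan({1: 1000000, 2: 1000000}))
```

With the rates reversed on the two sides, each direction of traffic ends
up on its own LAN, which is what cuts down the collisions.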
Hope some of this helps.
Brown, Richard <brownr@aecl.ca> wrote in message
news:9084sk$s84$1@inn.qnx.com…
From the use message this is my understanding of how the -t option works:
if a node determines that it cannot reach another node on a specific LAN it
associates a timestamp with that entry in its table. From that point on the
failed node/LAN combination will not be retried for the number of seconds
given by the -t option unless what?
I assume that it will re-enable the node/LAN combination if it gets a packet
from it. Is this correct?
What happens if there are 2 LANs: for simplicity lets assume 2 nodes, 2
hubs, default -t (40 seconds) option and both nodes initiate some
communication with each other. What happens if I power down LAN1 hub and
allow each node to see LAN1 has failed, then power it back on @ 10 seconds
and power down LAN2 hub @ 20 seconds. Will Net wait the 20 remaining seconds
before attempting communication with LAN1?