Redundant networking failure

Hi,

I’m using redundant networking under QNX 4.24, and I have two problems:

If one network fails, the system continues working, but periodically it
freezes for a few seconds. It seems as if the system is trying to
reconnect the failed network at the highest priority. Is there a way to
avoid these delays?

If one network fails, the system continues working. This is perfect.
But without a warning, the system runs without network redundancy and
nobody is aware of it, so only half of the problem is solved. How can I
determine the current status of each network programmatically?

Thanks in advance for your answers.

It sounds as though you have a problem that was fixed some time ago. You
don’t say which network driver(s) you are running. The problem is that the
network that is down is trying to renegotiate the link intermittently, and
that is when you see the delays. Unfortunately, there is no way to check for
this programmatically.

Please post the output from ‘sin ver’.

“Joan Baucells” <“Joan Baucells”@NoSpam.es> wrote in message
news:40A468A3.207A960C@NoSpam.es

Hugh Brown <hsbrown@qnx.com> wrote:
HB > It sounds as though you have a problem that was fixed some time ago. You
HB > don’t say which network driver(s) you are running. The problem is that the
HB > network that is down is trying to renegotiate the link intermittently, and
HB > that is when you see the delays. Unfortunately, there is no way to check for
HB > this programmatically.

HB > Please post the output from ‘sin ver’.


There’s another thing to check, whether or not the above statement applies:
when I was using redundant networks under QNX4, I found that I needed to
significantly reduce the timeout and the number of retries on the network
drivers involved. I believe one of the options was -N; I’m not sure what the
other option letter was.

When both networks are healthy and one fails, this forces Net to give up
much sooner on the first network and try the second. Don’t set the values
too low, or Net will give up on both networks in instances where either one
might have worked, which results in Net reporting false errors. It takes
some fine tuning.

Using this technique, I was able to play streaming audio over the network
and, while it was playing, disconnect one network cable, plug it back in,
and then unplug the other. COOL TRICK, QNX!

Also, if you have a lot of network traffic, two networks, and one system
acting as a file server, here’s a trick that can help a lot. We all know
that the enemy of Ethernet is collisions. To reduce them, force all of the
traffic on one network to flow in one direction (i.e. toward the file
server) and all of the traffic on the other network to flow in the other
direction (i.e. away from the file server). That way there are very few
collisions. You should have no collisions on the network carrying traffic
away from the file server, since only the server transmits on it. The
collisions on the network carrying traffic toward the file server will come
only from the workstations, and workstations usually transmit a lot less
data than the file server.

To do this, play with the media rates on all of the drivers, i.e. lie to Net.
If Net sees a 10:1 difference between the media rates of two drivers, it
will always choose the faster driver unless that driver is in a failed
state, in which case it will try the other driver. If the ratio is less
than 10:1, Net considers them “close enough” to be treated as the same,
one of the goofiest things aboyd ever did.

Bill Caroselli <qtps@earthlink.net> wrote:

To do this, play with the media rates on all of the drivers, i.e. lie to Net.
If Net sees a 10:1 difference between the media rates of two drivers, it
will always choose the faster driver unless that driver is in a failed
state, in which case it will try the other driver. If the ratio is less
than 10:1, Net considers them “close enough” to be treated as the same,
one of the goofiest things aboyd ever did.

It is actually “greater than 3 binary orders of magnitude”. That is, if
the difference is greater than 8x, it will always choose the faster one.
(Probably because a left/right shift of 3 is a lot cheaper than a
multiply by 10.)

-David

Please follow up in the newsgroup, rather than by personal email.
David Gibbs
QNX Training Services
dagibbs@qnx.com

Well, so far no real network failure has occurred.

We have some QNX4 installations in the workshop of one of our best customers.
These installations have been working fine for many years. As the
installations are critical, we have duplicated the network to guard against
failures. Both problems appeared while we were demonstrating the system’s
reliability to the maintenance staff. So far they have not become “real
problems”, but the possibility exists. The demonstration consisted simply of
alternately disconnecting the two network cables.

During the demonstration, all of us observed periodic delays while the system
was working with only one network cable. These delays are annoying, but not
fatal.

The maintenance staff have asked for a way to raise an alarm programmatically.
Your answer indicates that IT’S NOT POSSIBLE to obtain this information. This
is really a shame, because without this detail a redundant network loses much
of its point in an industrial environment.

FYI, we are working with two different Ethernet drivers, depending on the
hardware:

Net.ether1000
Net.rtl


Thank you for your answer.

Joan Baucells <“Joan Baucells”@nospam.es> wrote:
JB > Well, so far no real network failure has occurred.

JB > We have some QNX4 installations in the workshop of one of our best customers.
JB > These installations have been working fine for many years. As the
JB > installations are critical, we have duplicated the network to guard against
JB > failures. Both problems appeared while we were demonstrating the system’s
JB > reliability to the maintenance staff. So far they have not become real
JB > problems, but the possibility exists. The demonstration consisted simply of
JB > alternately disconnecting the two network cables.

JB > During the demonstration, all of us observed periodic delays while the system
JB > was working with only one network cable. These delays are annoying, but not
JB > fatal.

JB > The maintenance staff have asked for a way to raise an alarm programmatically.
JB > Your answer indicates that IT’S NOT POSSIBLE to obtain this information. This
JB > is really a shame, because without this detail a redundant network loses much
JB > of its point in an industrial environment.

JB > FYI, we are working with two different Ethernet drivers, depending on the
JB > hardware:

JB > Net.ether1000
JB > Net.rtl


OK. As I said, it’s been a while since I’ve worked with QNX4, but try this:

The next time one of the networks is down, type the command ‘netmap’. I seem
to recall that it displays some kind of error indicator. If it does, then you
just need to popen() that command and parse the output.

I’m sure that there are software calls to get that information directly,
but I don’t know if they are documented.