Unreliable Qnet

We have a lot of problems with Qnet stability/performance. E.g.

  • machines that are restarted are not discovered by other machines
    that are still running. (Until I manually ‘ls’ into /net/machine_x)
  • depending on the topology (switches/hubs) the performance is extremely
    poor. Copying 500 KB takes 10 sec on a 100 Mbit/s LAN!
  • Connections are lost while accessing a machine, e.g. copying a file
    (Errors: “Host is down” and “Bad file descriptor”)

Does anyone have any similar problems? Or can Qnet be replaced by
something else, or at least Qnet’s ndp? Would HA help us with this in
any way?

We use QNX 6.2.1B.

Regards, Oliver

Oliver wrote:

We have a lot of problems with Qnet stability/performance. E.g.

  • machines that are restarted are not discovered by other machines
    that are still running. (Until I manually ‘ls’ into /net/machine_x)

I have also noticed this; however, it doesn’t seem to be a serious
problem since opening the specific remote resource does, in fact,
work.
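
As a stopgap, the manual ‘ls’ can be scripted. The following is only a minimal sketch, assuming Python is available on the node and that listing /net/<name> is enough to trigger the lookup, as it is when done by hand. The helper name `touch_nodes` and the node names are made up for illustration.

```python
import os

def touch_nodes(names, root="/net"):
    """List each node's directory under root, as a manual 'ls' would.

    Returns a dict mapping node name -> True if the listing succeeded.
    """
    reachable = {}
    for name in names:
        try:
            os.listdir(os.path.join(root, name))
            reachable[name] = True
        except OSError:
            # Covers errors such as "Host is down" or a missing node entry.
            reachable[name] = False
    return reachable
```

Calling `touch_nodes(["machine_x", "machine_y"])` after a remote restart (placeholder names) shows which nodes resolved. It works around the symptom only and does not change how Qnet discovers nodes.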

  • depending on the topology (switches/hubs) the performance is extremely
    poor. Copying 500 KB takes 10 sec on a 100 Mbit/s LAN!

Perhaps if you could be more specific about which topologies cause the
problem, something might become apparent. As an additional data
point, I get around 500-600 KB/s on a 10 Mbit LAN.
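
To put the two figures side by side, here is a rough calculation of how much of the raw link rate each one uses. It is a sketch only: it assumes 1 KB = 1000 bytes, ignores protocol overhead, and the function name `utilisation` is made up for illustration.

```python
def utilisation(bytes_per_sec, link_mbit):
    """Fraction of the raw link rate used by a given payload throughput."""
    link_bytes_per_sec = link_mbit * 1_000_000 / 8
    return bytes_per_sec / link_bytes_per_sec

# 500 KB in 10 s on a 100 Mbit/s LAN: about 0.4 % of the raw rate
slow = utilisation(500_000 / 10, 100)
# 500-600 KB/s on a 10 Mbit LAN, using the midpoint: about 44 %
ok = utilisation(550_000, 10)
print(f"{slow:.1%} vs {ok:.1%}")
```

So the slow case is roughly a hundred times worse in relative terms, which points at something failing badly rather than at ordinary overhead.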

  • Connections are lost while accessing a machine, e.g. copying a file
    (Errors: “Host is down” and “Bad file descriptor”)

Yes, I have seen this also, although it is very infrequent. It is
definitely a potentially serious problem though…

Does anyone have any similar problems? Or can Qnet be replaced by
something else, or at least Qnet’s ndp?

Are you running Qnet over ether or IP?

Would HA help us with this in any way?

Might help mask the lost connections issue.

We use QNX 6.2.1B.

Same here.

Hi,
we use Qnet bound to Ethernet on a network with several PPC405s in an embedded
application.

“Oliver” <redir@dreer.ch> wrote in message
news:redir-EFE731.15300330032004@inn.qnx.com

We have a lot of problems with Qnet stability/performance. E.g.

  • machines that are restarted are not discovered by other machines
    that are still running. (Until I manually ‘ls’ into /net/machine_x)

Same thing if the node is not touched for a long time.

  • depending on the topology (switches/hubs) the performance is extremely
    poor. Copying 500 KB takes 10 sec on a 100 Mbit/s LAN!

We got about 2 MByte/sec for file transfers to another node via a switch on a
100 Mbit embedded LAN. That is not enough, given a theoretical data rate of
~8 MByte/sec.

  • Connections are lost while accessing a machine, e.g. copying a file
    (Errors: “Host is down” and “Bad file descriptor”)

Similar behaviour, but infrequent, when large amounts of data are
transferred. We also got the following error in the slogger output, which we
could not interpret:
“Jan 02 00:50:32 7 15 0 npm-qnet(l3_uip): Bad packet length at
l3_uip.c:276 (Invalid argument)”

Does anyone have any similar problems? Or can Qnet be replaced by
something else, or at least Qnet’s ndp? Would HA help us with this in
any way?
Maybe with NFS or CIFS for access to filesystems, but not for access to other
resources of remote nodes.

We use QNX 6.2.1B.
So do we.

Regards, Oliver
Regards, Werner

Werner Benz <werner.benz@ndt-ag.de> wrote in message
news:c4caul$mub$1@inn.qnx.com

Hi,
we use Qnet bound to Ethernet on a network with several PPC405s in an embedded
application.

“Oliver” <redir@dreer.ch> wrote in message
news:redir-EFE731.15300330032004@inn.qnx.com …
We have a lot of problems with Qnet stability/performance. E.g.

  • machines that are restarted are not discovered by other machines
    that are still running. (Until I manually ‘ls’ into /net/machine_x)

Same thing if the node is not touched for a long time.

It is designed to work that way. You could try running Qnet like
“io-net … -p qnet broadcast=0x0003001e” to force periodic
broadcasting.

  • depending on the topology (switches/hubs) the performance is extremely
    poor. Copying 500 KB takes 10 sec on a 100 Mbit/s LAN!

We got about 2 MByte/sec for file transfers to another node via a switch on a
100 Mbit embedded LAN. That is not enough, given a theoretical data rate of
~8 MByte/sec.

  • Connections are lost while accessing a machine, e.g. copying a file
    (Errors: “Host is down” and “Bad file descriptor”)

Similar behaviour, but infrequent, when large amounts of data are
transferred. We also got the following error in the slogger output, which we
could not interpret:
“Jan 02 00:50:32 7 15 0 npm-qnet(l3_uip): Bad packet length at
l3_uip.c:276 (Invalid argument)”

Which Ethernet driver is this, devn-pcnet.so? (Run “pidin -p io-net mem” to confirm.)
The message above says that the packet length reported by the NIC driver
does not agree with the length the sending Qnet embedded in the packet.
It’s more of a warning message, but the packet is dropped. Most likely
the NIC driver or something on the path is damaging the packet.
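
To see how often packets are being dropped this way, the messages can be counted from saved log output. This is a minimal sketch in Python: the pattern is modelled only on the single line quoted above, it assumes one message per line, and the function name `count_bad_length` is made up for illustration.

```python
import re

# Pattern modelled only on the one log line quoted in this thread.
BAD_LEN = re.compile(
    r"npm-qnet\((?P<layer>\w+)\): Bad packet length at\s+"
    r"(?P<file>[\w.]+):(?P<line>\d+)"
)

def count_bad_length(log_lines):
    """Count 'Bad packet length' messages per (source file, line number)."""
    counts = {}
    for text in log_lines:
        m = BAD_LEN.search(text)
        if m:
            key = (m.group("file"), int(m.group("line")))
            counts[key] = counts.get(key, 0) + 1
    return counts
```

A count that keeps growing during large transfers would support the idea that the driver or something on the path is damaging packets.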

-xtang

In article <4069A148.6010406@csical.com>,
Rennie Allen <rallen@csical.com> wrote:

  • depending on the topology (switches/hubs) the performance is extremely
    poor. Copying 500 KB takes 10 sec on a 100 Mbit/s LAN!

Perhaps if you could be more specific about which topologies cause the
problem, something might become apparent. As an additional data
point, I get around 500-600 KB/s on a 10 Mbit LAN.

Some setups I’ve tried:
Qnx1 <-> 3com 100MBit Switch <-> Qnx2 : Very fast
Qnx1 <-> Netgear 100MBit Switch <-> Farallon 100MBit Switch <-> Qnx2 :
Medium fast
Qnx1 <-> Netgear 10/100MBit Dualspeed Hub <-> Qnx2 : Very slow

I guess a switched network is a must for Qnet!

Are you running Qnet over ether or IP?

We use it over ether, but the IP stack is also running on the same
interface.

“Xiaodan Tang” <xtang@qnx.com> wrote in message
news:c4d7hh$g7k$1@inn.qnx.com

Werner Benz <werner.benz@ndt-ag.de> wrote in message
news:c4caul$mub$1@inn.qnx.com …
Hi,
we use Qnet bound to Ethernet on a network with several PPC405s in an embedded
application.

“Oliver” <redir@dreer.ch> wrote in message
news:redir-EFE731.15300330032004@inn.qnx.com …
We have a lot of problems with Qnet stability/performance. E.g.

  • machines that are restarted are not discovered by other machines
    that are still running. (Until I manually ‘ls’ into /net/machine_x)

Same thing if the node is not touched for a long time.

It is designed to work that way. You could try running Qnet like
“io-net … -p qnet broadcast=0x0003001e” to force periodic
broadcasting.

I will try this, thank you.

  • depending on the topology (switches/hubs) the performance is extremely
    poor. Copying 500 KB takes 10 sec on a 100 Mbit/s LAN!

We got about 2 MByte/sec for file transfers to another node via a switch on a
100 Mbit embedded LAN. That is not enough, given a theoretical data rate of
~8 MByte/sec.

  • Connections are lost while accessing a machine, e.g. copying a file
    (Errors: “Host is down” and “Bad file descriptor”)

Similar behaviour, but infrequent, when large amounts of data are
transferred. We also got the following error in the slogger output, which we
could not interpret:
“Jan 02 00:50:32 7 15 0 npm-qnet(l3_uip): Bad packet length at
l3_uip.c:276 (Invalid argument)”

Which Ethernet driver is this, devn-pcnet.so? (Run “pidin -p io-net mem” to confirm.)
The message above says that the packet length reported by the NIC driver
does not agree with the length the sending Qnet embedded in the packet.
It’s more of a warning message, but the packet is dropped. Most likely
the NIC driver or something on the path is damaging the packet.

It is ‘devn-ppc405.so’.
Werner

-xtang

Rennie Allen <rallen@csical.com> wrote:
RA > Oliver wrote:

  • Connections are lost while accessing a machine, e.g. copying a file
    (Errors: “Host is down” and “Bad file descriptor”)

RA > Yes, I have seen this also, although it is very infrequent. It is
RA > definitely a potentially serious problem though…

I used to get this badly. I was sent a new devn-el900.so driver and those
problems went away.

Which driver are you using?
Maybe they have a fixed version of that driver.

Oliver wrote:

In article <4069A148.6010406@csical.com>,
Rennie Allen <rallen@csical.com> wrote:


Some setups I’ve tried:
Qnx1 <-> 3com 100MBit Switch <-> Qnx2 : Very fast
Qnx1 <-> Netgear 100MBit Switch <-> Farallon 100MBit Switch <-> Qnx2 :
Medium fast
Qnx1 <-> Netgear 10/100MBit Dualspeed Hub <-> Qnx2 : Very slow

OK, so it doesn’t seem to be a Qnet problem per se: configuration #1
is fast, so Qnet is obviously capable of being fast.

In config #3, is one of the QNX nodes at 10 Mbit and the other at 100?

Try taking all Netgear hardware out of the equation (interestingly, my
machine is connected via a 3com hub). It could simply be that the
Netgear hardware drops a higher percentage of packets, and this hits
Qnet hard, whereas TCP is more tolerant of it (something that I would
expect, actually).

Rennie

In article <406AFCA2.2050906@csical.com>,
Rennie Allen <rallen@csical.com> wrote:

Some setups I’ve tried:
Qnx1 <-> 3com 100MBit Switch <-> Qnx2 : Very fast
Qnx1 <-> Netgear 100MBit Switch <-> Farallon 100MBit Switch <-> Qnx2 :
Medium fast
Qnx1 <-> Netgear 10/100MBit Dualspeed Hub <-> Qnx2 : Very slow


In config #3, is one of the QNX nodes at 10 Mbit and the other at 100?

Both are 100MBit/s. I just replaced the Farallon switch in config #2
with another Netgear switch, and the performance improved significantly.

Also, I have different NICs resulting in different ethernet drivers.
(I have devn-speed, -tulip and -rtl drivers)

I guess good Qnet performance does depend on the hardware involved.
I’ll try to use a network sniffer to find out more about traffic,
packet loss etc…

Oliver

“Oliver” <redir@dreer.ch> wrote in message
news:redir-374E81.16325201042004@inn.qnx.com

In article <406AFCA2.2050906@csical.com>,
Rennie Allen <rallen@csical.com> wrote:

Some setups I’ve tried:
Qnx1 <-> 3com 100MBit Switch <-> Qnx2 : Very fast
Qnx1 <-> Netgear 100MBit Switch <-> Farallon 100MBit Switch <-> Qnx2 :
Medium fast
Qnx1 <-> Netgear 10/100MBit Dualspeed Hub <-> Qnx2 : Very slow


In config #3, is one of the QNX nodes at 10 Mbit and the other at 100?

Both are 100MBit/s. I just replaced the Farallon switch in config #2
with another Netgear switch, and the performance improved significantly.

Also, I have different NICs resulting in different ethernet drivers.
(I have devn-speed, -tulip and -rtl drivers)

I guess good Qnet performance does depend on the hardware involved.
I’ll try to use a network sniffer to find out more about traffic,
packet loss etc…

Maybe it has something to do with speed auto-negotiation.

Oliver

Oliver <redir@dreer.ch> wrote in message
news:redir-374E81.16325201042004@inn.qnx.com

I’ll try to use a network sniffer to find out more about traffic,
packet loss etc…

You can “cat /proc/qnetstats” to get some Qnet statistics. The most interesting
ones in this case are the L4 “retransmited/slowstart/fastrecover/crc” numbers.
If these numbers keep increasing, you probably want to look at the layers
below Qnet (driver/hardware…); until that is sorted out, your performance
won’t get any better.
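
One way to check whether the numbers keep increasing is to take two snapshots around a test copy and compare them. The sketch below is in Python and deliberately assumes nothing about the layout of /proc/qnetstats beyond it being readable text; the function names are made up for illustration.

```python
def read_stats(path="/proc/qnetstats"):
    """Return the current contents of the statistics file as text."""
    with open(path) as f:
        return f.read()

def changed_lines(before, after):
    """Return (old, new) pairs for lines that differ between two snapshots.

    Lines are compared by position; nothing is assumed about their format.
    """
    old_lines = before.splitlines()
    new_lines = after.splitlines()
    return [(o, n) for o, n in zip(old_lines, new_lines) if o != n]
```

Take one snapshot with `read_stats()`, copy a file across Qnet, take a second snapshot, and pass both to `changed_lines()`. The lines mentioning the L4 counters named above are the ones to watch.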

I just improved network performance significantly by forcing full duplex on
the NIC. It wasn’t a QNX installation, mind you, but you should be able to do
the same under QNX. Sometimes network cards have problems with
auto-negotiation. I just heard from one guy that devices connected to Cisco
equipment have problems with auto-negotiation and that you have to force full
duplex to get any decent throughput. If Cisco has problems, I’m sure other
devices show similar behavior.

Kevin

“Mario Charest” <postmaster@127.0.0.1> wrote in message
news:c4hhe9$8ll$1@inn.qnx.com

“Oliver” <redir@dreer.ch> wrote in message
news:redir-374E81.16325201042004@inn.qnx.com …
In article <406AFCA2.2050906@csical.com>,
Rennie Allen <rallen@csical.com> wrote:

Some setups I’ve tried:
Qnx1 <-> 3com 100MBit Switch <-> Qnx2 : Very fast
Qnx1 <-> Netgear 100MBit Switch <-> Farallon 100MBit Switch <-> Qnx2 :
Medium fast
Qnx1 <-> Netgear 10/100MBit Dualspeed Hub <-> Qnx2 : Very slow


In config #3, is one of the QNX nodes at 10 Mbit and the other at 100?

Both are 100MBit/s. I just replaced the Farallon switch in config #2
with another Netgear switch, and the performance improved significantly.

Also, I have different NICs resulting in different ethernet drivers.
(I have devn-speed, -tulip and -rtl drivers)

I guess good Qnet performance does depend on the hardware involved.
I’ll try to use a network sniffer to find out more about traffic,
packet loss etc…

Maybe it has something to do with speed auto-negotiation.


Oliver