Dear QNXers,
we have a robot which is controlled by 3 Pentium / AMD PCs. All PCs
are connected with CormanTech CT 120 FE 100 Mbit network cards on
logical net 2. Nodes 2 and 3 are diskless and are booted over this
network. When our software system is running, it sometimes hangs for
a few seconds and sometimes hangs forever. The error is not caused
by name server polling. Some debugging using a RAM disk on node 2 has
shown that the network connection to the second node seems to fail
sometimes. The console of node 2 shows the message
Proc: lost reply across net, freeing reply_blk local pid 0959
This message is printed repeatedly. To increase the number of
retries, the network driver is started with
Net.ct100tx -l 2 -T 100 -n 20 -N 20 &
on all nodes. But apparently the increased number of retries does
not solve the problem.
Does anybody have an idea about the origin of this problem? And does
it make sense to increase the number of retries even further?
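To judge whether a change in the retry settings has any effect, it may help to count how often the message appears before and after the change. The following is only a minimal sketch: it assumes the console output has been captured to a text file by some means not described here, and the function name is made up for illustration. The pid is kept as an opaque string because the post does not say which number base it is printed in.

```python
import re
from collections import Counter

# Pattern for the console message quoted above. The pid field is kept as an
# opaque token because its number base is not stated in the post.
LOST_REPLY = re.compile(
    r"Proc: lost reply across net, freeing reply_blk local pid (\S+)"
)


def count_lost_replies(lines):
    """Return a Counter mapping pid token -> number of lost-reply messages."""
    counts = Counter()
    for line in lines:
        match = LOST_REPLY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling `count_lost_replies(open("console.log"))` (the file name is hypothetical) for a run with the old settings and a run with the new ones gives two tallies that can be compared directly.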
Thanks for any comments and best regards,
-Rainer
–
*** ____ ****** . * . ******* Dipl.-Ing. Rainer Menzner ********************
( / \ /| /| Ruhr-Universitaet Bochum
/ | / | / | Institut fuer Neuroinformatik
/____/ / |/ | __ D-44780 Bochum, Germany
/ \ / ’ | ( / ----------------------------------------------
(/ _ o (/ | -/- o
********************* /–) ** Tel. +49-234/32-27978 ************************
eMail: Rainer.Menzner@neuroinformatik.ruhr-uni-bochum.de
WWW: http://www.neuroinformatik.ruhr-uni-bochum.de/ini/PEOPLE/rmz/top.html
Hello Rainer,
Have you looked at the output of “netinfo -l” and “netinfo” to see if there
are any obvious problems there? This stuff is rather cryptic, so if you’d
like some help interpreting the information, you might want to post the
output of these commands here (maybe from all 3 nodes). Also post the
output of “sin ver” from one of your systems.
In the absence of this information, I can only guess at the cause of the
trouble. What are you using to connect the 3 network cards? A hub or a
switch? If a switch, then your problems may be related to mismatched
half/full duplex settings, in which case you likely need an update to our
Net.ct100tx driver.
Regards,
Bert Menkveld
Engineer
Corman Technologies Inc.
bert@cormantech.com
Bert,
I have looked at the netinfo output but, as you mentioned, it is too
cryptic for me. Using “sin ver” I found that Net.ct100tx was
rather old (4.23). I have now downloaded the latest driver from the
CormanTech website but, alas, after installing it, nodes 2 and 3
refused to boot over the network. The boot procedure got stuck after
Starting to load os from node …
but before
Executing os
After hours of debugging, I now believe that the image has grown too
large and that it is only partially sent to the slave nodes. The image
is about 590K, but I have not been able to find a size limit in the
docs. However, before I can provide the requested information I will
have to get the system to boot properly. BTW, we are using a hub, so
no intelligent functionality of a switch can interfere with the
system.
Regards,
-Rainer
Just for testing, give Net.tulip a try.
Hello again, Rainer,
The latest Net.ct100tx is significantly larger than the older version you
were using before, so your belief that your boot image has gotten too large
may well be correct. Can you try leaving something else out of the boot
image to see if that makes things work again? You would need to eliminate
about 30K to even things out compared to the older Net.ct100tx.
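As a rough aid for deciding what to leave out, the files that go into the boot image can be listed by size and the largest non-essential ones picked until about 30K is freed. This is only a sketch under assumptions: the function names are invented for illustration, 1K is taken as 1024 bytes, and it looks at plain file sizes on disk, which may differ from the space a file occupies inside the built image.

```python
import os


def size_report(paths):
    """Return (total_bytes, [(path, bytes), ...]) with the largest file first."""
    sizes = [(p, os.path.getsize(p)) for p in paths]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sum(size for _, size in sizes), sizes


def candidates_to_drop(sizes, required, keep):
    """Pick the largest files not in `keep` until `required` bytes are freed.

    Returns (chosen_paths, freed_bytes). If freed_bytes is still below
    `required`, the files outside `keep` are not enough on their own.
    """
    chosen, freed = [], 0
    for path, size in sizes:
        if freed >= required:
            break
        if path in keep:
            continue
        chosen.append(path)
        freed += size
    return chosen, freed
```

With `total, sizes = size_report(files)`, a call such as `candidates_to_drop(sizes, 30 * 1024, keep=essential)` suggests a starting point, where `files` and `essential` are whatever lists apply to your own build file.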
Meanwhile, you might still want to post the “netinfo” and “netinfo -l”
output here – maybe there will be some useful clues to help us understand
what is happening.
Regards,
Bert Menkveld
Engineer
Corman Technologies Inc.
bert@cormantech.com