Socket, Server unreachable

Hello

At our company's customer site there is one server system (OS/2) and two
client systems (QNX 4.25, TCP/IP 4.25). They communicate via RPC. Normally
the clients first copy data from the server through NFS. Later the server
makes an RPC call ‘StartJob’ to the clients. The problem is that sometimes
not all data is copied, not every copy command is confirmed (also via RPC),
or the ‘StartJob’ call fails. When that happens, the only remedy for the
customer is to reboot the QNX systems.
Now I have a few questions:

a) We installed a network sniffer that recorded all traffic between the
systems. It showed that the QNX systems replied to the ‘StartJob’ call with
the TCP RST flag set. On one system the subsequent getport call succeeds; on
the other it does not. What causes this abnormal termination of the
connection? Does this give us a hint as to where the problem is?
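
A RST sent in answer to a connection attempt normally means that nothing was listening on the destination port at that moment; a RST sent in answer to data means the host has no state for that connection (for example, because the socket was closed). For an RPC service, either would fit a server process that has exited or is no longer registered on the port the caller used. This is only a hypothesis, not something the thread establishes. A minimal sketch for checking from another host whether a given port accepts connections, assuming a BSD/POSIX socket API; probe_tcp_port is a hypothetical helper, and inet_pton() may be missing on older stacks such as QNX 4 TCP/IP, where inet_addr() would be the older alternative.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try a plain TCP connect to ip:port.
 * Returns 0 if the connection was accepted, otherwise an errno value.
 * ECONNREFUSED means the peer answered our SYN with a RST, i.e. nothing
 * was listening on that port at that moment. */
int probe_tcp_port(const char *ip, unsigned short port)
{
    struct sockaddr_in sa;
    int fd, err = 0;

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return EINVAL;                    /* not a dotted-quad address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) < 0)
        err = errno;                      /* e.g. ECONNREFUSED, ETIMEDOUT */
    close(fd);
    return err;
}
```

A return of ECONNREFUSED for the port the ‘StartJob’ caller uses would point at the service side on the QNX host rather than at the network.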

I tried tcprt and tcptk 5.0 and found the following: compiling with the same
compiler options (-ms -3) led to the error message
“svc_run() exit, not enough memory”. Comparing the two versions in the
debugger, I saw that select() in TCP 5.0 performs a check that is not done
in 4.25.

b) What does select() in 5.0 check?

c) Can the compiler options, especially -ms (set in CFLAGS), cause the
problem mentioned above?

d) What else has to be considered when using TCP 5.0? What other tools are
there to pinpoint my problem?
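
The thread does not say which check select() performs in TCP/IP 5.0, so the following does not answer b). As one way to pinpoint it, a minimal sketch that calls select() directly and prints errno via strerror() instead of relying on the generic svc_run() message. It assumes a POSIX select(); wait_readable is a hypothetical helper, and the FD_SETSIZE bound check reflects the POSIX rule that FD_SET on a descriptor outside [0, FD_SETSIZE) is undefined, not a known 5.0 check.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: wait until fd is readable or timeout_ms elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error (errno is preserved).
 * On error the exact reason is printed rather than a generic message. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int rc;

    /* FD_SET on a descriptor outside [0, FD_SETSIZE) is undefined. */
    if (fd < 0 || fd >= FD_SETSIZE) {
        fprintf(stderr, "wait_readable: fd %d outside [0, %d)\n",
                fd, FD_SETSIZE);
        errno = EBADF;
        return -1;
    }

    do {
        /* select() may modify both the set and the timeout: rebuild them.
         * After a signal the full timeout restarts; fine for a diagnostic. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0) {
        int saved = errno;
        fprintf(stderr, "select(nfds=%d) failed: errno=%d (%s)\n",
                fd + 1, saved, strerror(saved));
        errno = saved;
        return -1;
    }
    return rc > 0 ? 1 : 0;
}
```

Running the same descriptor through this wrapper under both stacks would show whether the 5.0 failure is, for example, EBADF, EINVAL or ENOMEM.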

Thanks to everyone for any help.

Best regards

Markus Jauslin

“Markus Jauslin” <markus.jauslin@ch.mullermartini.com> wrote in message
news:ahlhoe$o1l$1@inn.qnx.com

I should note that when we connected OS/2 and QNX 4.25E + TCP/IP v5b, we had
a significant slowdown in TCP speed. Specifically, OS/2 received reply
packets after ~200 ms instead of 1-2 ms. At the application level everything
was correct. When I downgraded the QNX4 stack to TCP/IP 4.24, everything
started to work fine. This is an odd problem specific to communication
between OS/2 and the TCP/IP 5 stack.
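
The ~200 ms versus 1-2 ms figures can be checked by timing single request/reply exchanges on the application's own connection. A minimal sketch, assuming POSIX clock_gettime() with CLOCK_MONOTONIC (which may not exist on QNX 4, where another time source would have to be substituted) and a peer that echoes each byte it receives; rtt_ms is a hypothetical helper. Separate write and read descriptors are taken so the same function works for a socket (pass it twice) or a pipe pair.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>
#include <unistd.h>

/* Milliseconds between two CLOCK_MONOTONIC timestamps. */
static double diff_ms(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) * 1000.0
         + (double)(b->tv_nsec - a->tv_nsec) / 1e6;
}

/* Hypothetical helper: send a one-byte request on wfd, wait for a one-byte
 * reply on rfd, and return the elapsed time in ms (-1.0 on error).
 * Assumes the peer echoes every byte it receives. */
double rtt_ms(int wfd, int rfd)
{
    struct timespec t0, t1;
    const char req = 'x';
    char reply;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (write(wfd, &req, 1) != 1)
        return -1.0;
    if (read(rfd, &reply, 1) != 1)      /* blocks until the reply arrives */
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return diff_ms(&t0, &t1);
}
```

For a connected TCP socket s, rtt_ms(s, s) repeated a few times would show whether replies consistently take around 200 ms or only a few milliseconds.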

Ian Zagorskih
Novosibirsk Branch of TyumenPromGeophysics
R&D Department
Phone: +7-3832-334210, 331718

Hello

Thanks for your hint. I checked the response times in both directions
(OS/2 <-> QNX 4.25 + TCP/IP 5.0) but couldn’t see much of a delay;
everything was in the range of a few milliseconds.

The problem I have now is with mount_nfs. It always responds with
‘no such process’. What is the cause?

Thanks for any help.
Markus

“Ian Zagorskih” <NOSPAM-ianzag@mail.ru> wrote in message
news:ahlqdp$1v4$1@inn.qnx.com

I found the cause of the mount_nfs problem: I forgot to start NFSfsys.

But if the directory has too many entries, the ls command hangs for a while
and returns with the error message: “ls: readdir of ‘Lan’ failed (Host is down)”

Looking at the packets that NFSfsys sends to and receives from the remote
host shows:
V2 GETATTR Call
V2 GETATTR Reply
V2 STATFS Call
V2 STATFS Reply
V2 GETATTR Call
V2 GETATTR Reply
followed by a number of malformed frames

Starting NFSfsys with -B 1000 delivers only 41 directory entries, and with
-B 2000 only 82.

This looks like a bug in NFSfsys.
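
The two data points scale exactly with the buffer size (1000/41 = 2000/82, about 24.4 bytes per entry), which is consistent with ls only ever seeing the entries that fit into a single READDIR reply. A rough check against the NFS version 2 wire format (RFC 1094): each READDIR entry costs a "value follows" flag, the fileid, the name length, the name padded to a multiple of 4 bytes, and a 4-byte cookie, and the reply adds a status word, an end-of-list flag and an eof flag. Assuming that -B sets the READDIR count, that the count covers the whole result body, and that the names in ‘Lan’ are 5 to 8 characters long (none of which is stated in the thread), this minimal sketch reproduces 41 and 82; the function names are hypothetical.

```c
#include <stddef.h>

/* XDR rounds strings and opaque data up to a multiple of 4 bytes. */
size_t xdr_pad4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Wire size of one NFSv2 READDIR entry (RFC 1094):
 * value-follows flag + fileid + name length + padded name + 4-byte cookie. */
size_t nfs2_entry_size(size_t name_len)
{
    return 4 + 4 + 4 + xdr_pad4(name_len) + 4;
}

/* Number of entries with name_len-byte names that fit in a reply limited
 * to count bytes. ASSUMPTION: count covers the whole result body, whose
 * fixed overhead is the status word, end-of-list flag and eof flag. */
size_t nfs2_entries_that_fit(size_t count, size_t name_len)
{
    const size_t overhead = 4 + 4 + 4;

    if (count <= overhead)
        return 0;
    return (count - overhead) / nfs2_entry_size(name_len);
}
```

Under these assumptions, a client that correctly issued a follow-up READDIR with the last cookie would list the whole directory regardless of -B, so the fixed counts point at the continuation step.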

Markus

“Markus Jauslin” <markus.jauslin@ch.mullermartini.com> wrote in message
news:ai90c7$87i$1@inn.qnx.com
