Disconnecting/Reconnecting Ethernet Problems

Hello Group,

I have written two socket applications (a client and a server), and
disconnecting and then reconnecting the Ethernet cable causes problems
with the server. The client only connects and sends messages to the
server, at a rate of about one message per second. The server just
listens, forks, and waits for messages. There is no higher-level
communication protocol, so the client only writes and the server only
reads.
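The post doesn't show the readn() helper itself. For reference, a common form of it, along the lines of the one in W. Richard Stevens' UNIX Network Programming, is sketched below. It is an assumption that the author's version behaves the same way, in particular that a 0 return from read() is treated as end-of-file and that EINTR is retried.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly n bytes from fd unless EOF or an error intervenes.
 * Returns the number of bytes read (less than n only on EOF),
 * or -1 with errno set if read() fails. */
ssize_t readn(int fd, void *buf, size_t n)
{
    char *p = buf;
    size_t left = n;

    while (left > 0) {
        ssize_t got = read(fd, p, left);
        if (got < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;              /* real error, e.g. ECONNRESET */
        }
        if (got == 0)
            break;                  /* EOF: peer closed the connection */
        p += got;
        left -= (size_t)got;
    }
    return (ssize_t)(n - left);
}
```

With this shape, readn() can return the full requested count only if read() actually delivered that many bytes; EOF shows up as a short count and a socket error as -1.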

In my test scenario, two clients on separate computers are connected
to the server, each handled by its own child process. When I
disconnect the Ethernet cable, each client continues to write messages
successfully to the socket without knowledge of the disconnect (I’m
okay with that). The server, as expected, stops receiving messages.
The unexpected behavior occurs when I reconnect the cable: one of the
server’s child processes gets a SIGSEGV, and the other returns from a
blocking readn() call reporting that it read the expected number of
bytes from the socket. The problem is repeatable and happens every
time. The SIGSEGV address looks bogus and sometimes differs between
runs, though it is always bogus in the same way, e.g. 0007:3F800000.
I’ve also tried running a debugger on the child processes in hopes of
catching where the segmentation fault happens, but it doesn’t give me
any useful information.

This problem happens under QNX 4.25 using TCP/IP 5.0 with the latest
patches. The same code is also compiled and used under FreeBSD/OpenBSD,
where the behavior is completely different: the client continues to
send messages, and once the Ethernet connection is restored, its
write() fails with EPIPE. The server child process remains blocked
until readn() fails, returning -1 with errno set to ECONNRESET, after
its timeout period (I have SO_KEEPALIVE set).
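The BSD behavior described above depends on two settings. The post says SO_KEEPALIVE is set but doesn't show how; the sketch below assumes the usual setsockopt() call. For the client to see EPIPE from write() rather than being killed, SIGPIPE must also be ignored or handled, since its default action terminates the process; the sketch assumes it is ignored. The helper names enable_keepalive(), ignore_sigpipe() and send_message() are hypothetical, and whether QNX 4.25 with TCP/IP 5.0 honors these settings the same way is exactly what is in question, so treat this as a reference point rather than a fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Enable keepalive probes so a dead peer is eventually detected even
 * when no data is flowing. Returns 0 on success, -1 with errno set. */
int enable_keepalive(int sock)
{
    int on = 1;
    return setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);
}

/* Ignore SIGPIPE so that writing to a broken connection makes write()
 * fail with EPIPE instead of terminating the process. */
int ignore_sigpipe(void)
{
    struct sigaction sa;
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Send one message. Returns 0 if it was written in full, 1 if the
 * connection is gone (EPIPE), and -1 on any other error or short write. */
int send_message(int sock, const void *msg, size_t len)
{
    ssize_t n = write(sock, msg, len);
    if (n < 0)
        return errno == EPIPE ? 1 : -1;
    return (size_t)n == len ? 0 : -1;
}
```

In the client, ignore_sigpipe() would be called once at startup and enable_keepalive() on each socket right after connect(); the server child would call enable_keepalive() on the socket returned by accept().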

What is the expected behavior on either end of a socket connection
when the network connection is lost under QNX 4? Why doesn’t the read()
in my readn() function return -1 as it does under BSD, and why, in one
case, does readn() report that it read the expected number of bytes?
And why does one child process segfault while the other does not?
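One way to make the QNX and BSD runs directly comparable is to log exactly what readn() returned, together with errno, in each child process. The sketch below assumes a readn() with the conventional return values (-1 on error, a short count on EOF); classify_readn() and log_readn() are hypothetical helper names. ETIMEDOUT is included because that is the error a BSD-derived stack typically reports when keepalive probes go unanswered, while ECONNRESET indicates the peer answered with a reset.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Map a readn()-style result to a short label. `expected` is the byte
 * count that was requested; `err` is errno saved right after the call. */
const char *classify_readn(ssize_t n, size_t expected, int err)
{
    if (n < 0) {
        if (err == ECONNRESET)
            return "error: connection reset";
        if (err == ETIMEDOUT)
            return "error: timed out";   /* e.g. keepalive gave up */
        return "error: other";
    }
    if (n == 0)
        return "eof: peer closed";
    if ((size_t)n < expected)
        return "short: eof mid-message";
    return "ok: full message";
}

/* Print one diagnostic line to stderr, tagged with the child's pid.
 * Casts keep the format portable to compilers without %zd. */
void log_readn(ssize_t n, size_t expected, int err)
{
    fprintf(stderr, "pid %ld: readn returned %ld of %lu bytes (%s%s%s)\n",
            (long)getpid(), (long)n, (unsigned long)expected,
            classify_readn(n, expected, err),
            n < 0 ? ": " : "", n < 0 ? strerror(err) : "");
}
```

Passing errno captured immediately after readn() keeps the logging itself from disturbing it. Dumping the buffer in the "ok: full message" case may also help show whether the bytes that arrive after reconnecting are real client data.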

Any suggestions or insight would be appreciated.