Stack unresponsive after some time

QNX 6.2.1B on Realtek 2139 hardware, using
IEI JUKI hardware.

The QNX box connects to a server every couple of seconds,
writes a message and then disconnects.
Sequence is:
sock = socket(AF_INET, SOCK_STREAM, 0);
connect(sock, (struct sockaddr *)&server, sizeof(server));
write(sock, msg_buffer, msg_size);
read(sock, ack_buffer, sizeof(dcs_avc_ack_msg_t));
close(sock);

(With error checking omitted)

For some reason, at a seemingly random time, this
sequence fails. I’ve not been able to get to the
box to get details (what should one look for?),
but is there anything wrong with the above sequence?
Resetting the QNX box cures things, so the problem
is on the QNX side.

Rest of the application runs ok.
A thread is dedicated to the above sequence.
Thread continues to run ok.



There’s nothing wrong with the sequence.

Other than the obvious return value checking I’d start
by ‘ping -n <server_ip_addr>’ vs ‘ping -n 127.0.0.1’ and
looking at a sniff on the wire when trying to talk to
the server.
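
For reference, a minimal sketch of what the return value checking could look like, so that the log shows which call failed and with which errno. The function name send_one_message, the step enum and the report helper are made up for illustration; they are not from the original application. Every error path closes the descriptor, so a run of failures cannot exhaust the process's file descriptors.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical step identifiers: they only say where a cycle failed. */
enum step { STEP_OK, STEP_SOCKET, STEP_CONNECT, STEP_WRITE, STEP_READ, STEP_CLOSE };

/* Log the failure, release the descriptor if one is open, return the step. */
static enum step report(enum step s, const char *what, int err, int sock)
{
    fprintf(stderr, "%s: %s\n", what,
            err != 0 ? strerror(err) : "unexpected byte count");
    if (sock >= 0)
        close(sock);
    return s;
}

/* One connect/write/read/close cycle with every return value checked. */
enum step send_one_message(const struct sockaddr_in *server,
                           const void *msg, size_t msg_size,
                           void *ack, size_t ack_size)
{
    ssize_t n;
    int sock;

    assert(server != NULL && msg != NULL && ack != NULL);

    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock == -1)
        return report(STEP_SOCKET, "socket", errno, -1);

    if (connect(sock, (const struct sockaddr *)server, sizeof *server) == -1)
        return report(STEP_CONNECT, "connect", errno, sock);

    n = write(sock, msg, msg_size);
    if (n == -1)
        return report(STEP_WRITE, "write", errno, sock);
    if ((size_t)n != msg_size)
        return report(STEP_WRITE, "write", 0, sock);

    /* read() returning 0 means the peer closed; fewer bytes than asked for
       is legal on TCP, but this sketch reports both so they show in the log. */
    n = read(sock, ack, ack_size);
    if (n == -1)
        return report(STEP_READ, "read", errno, sock);
    if ((size_t)n != ack_size)
        return report(STEP_READ, "read", 0, sock);

    if (close(sock) == -1)
        return report(STEP_CLOSE, "close", errno, -1);
    return STEP_OK;
}
```

The dedicated thread would call send_one_message() once per cycle and log any result other than STEP_OK.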

Regards,

-seanb

On Tue, 12 Jul 2005 15:06:02 +0200, Sean Boudreau <seanb@qnx.com> wrote:

There’s nothing wrong with the sequence.

Other than the obvious return value checking I’d start
by ‘ping -n <server_ip_addr>’ vs ‘ping -n 127.0.0.1’ and
looking at a sniff on the wire when trying to talk to
the server.

It appears not to be a problem with the stack or driver, as
ftp & telnet access into the box still works.

I have enabled detailed logging of the status of the
socket/open/write/read/close calls on a lab setup,
and will post whatever errors are picked up when the failure occurs.

Alex/Systems 104 <acellarius@yah0o.lsd.com> wrote:

ping, ftp, & telnet still working.

I’d also check the other end. Is it properly/completely cleaning
up all of these sockets? Or, are they still in some non-finished
state?
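
As an illustration of what a complete cleanup of one connection can look like on either end, here is a small sketch. The helper name finish_connection is hypothetical, and nothing is known here about how the actual server is written. It half-closes the socket, waits for the peer's end-of-stream, then closes the descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: orderly teardown of a connected stream socket.
   Returns the number of bytes discarded while waiting for the peer's
   end-of-stream, or -1 if read() or close() failed. */
long finish_connection(int sock)
{
    char scratch[256];
    long discarded = 0;
    ssize_t n;

    assert(sock >= 0);

    /* Tell the peer we are done sending. A failure here (for example
       ENOTCONN after a reset) is ignored; close() below still runs. */
    (void)shutdown(sock, SHUT_WR);

    /* Drain until the peer closes its side (read() returns 0). This blocks
       if the peer never closes, so real code would want a timeout. */
    while ((n = read(sock, scratch, sizeof scratch)) != 0) {
        if (n == -1) {
            if (errno == EINTR)
                continue;
            close(sock);
            return -1;
        }
        discarded += n;
    }

    return close(sock) == -1 ? -1 : discarded;
}
```

If one side skips the close() on some path, its connections stay in a non-finished state, and netstat on that machine will show them piling up.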

QNX 6.2.1B on Realtek 2139 hardware, using
IEI JUKI hardware.

The QNX box connects to a server every couple of seconds,
writes a message and then disconnects.

Side question:

Is there a reason it connects & disconnects repeatedly? That looks
like a lot of extra overhead to me.

Sequence is:
sock = socket(AF_INET, SOCK_STREAM, 0);
connect(sock, (struct sockaddr *)&server, sizeof(server));
write(sock, msg_buffer, msg_size);
read(sock, ack_buffer, sizeof(dcs_avc_ack_msg_t));
close(sock);

(With error checking omitted)

For some reason, at a seemingly random time, this
sequence fails. I’ve not been able to get to the
box to get details (what should one look for?),
but is there anything wrong with the above sequence?
Resetting the QNX box cures things, so the problem
is on the QNX side.

Have you tried resetting the server? Does that also
cure the problem?

-David

David Gibbs
QNX Training Services
dagibbs@qnx.com

On Mon, 18 Jul 2005 19:33:09 +0200, David Gibbs <dagibbs@qnx.com> wrote:

I’d also check the other end. Is it properly/completely cleaning
up all of these sockets? Or, are they still in some non-finished
state?

Will ask

Side question:

Is there a reason it connects & disconnects repeatedly? That looks
like a lot of extra overhead to me.

The customer wants it that way, in case the link gets
terminated for whatever reason
(not a reliable link between the two).
Will ask again why it’s done this way.
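
For comparison, a rough sketch of the alternative: keep one connection open and reconnect only after a failure, which also copes with an unreliable link. The name send_on_link and the convention that *sock is -1 when the link is down are assumptions made for this example, not part of the existing application.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes; returns 0 on success, -1 on error or early EOF. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == 0)
            return -1;
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Write exactly len bytes; returns 0 on success, -1 on error. */
static int write_full(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical: *sock holds the open connection, or -1 when the link is down.
   On any failure the socket is closed and *sock is reset, so the next call
   reconnects. The caller should ignore SIGPIPE, since writing to a
   connection the peer has reset raises it by default. */
int send_on_link(int *sock, const struct sockaddr_in *server,
                 const void *msg, size_t msg_size,
                 void *ack, size_t ack_size)
{
    assert(sock != NULL && server != NULL);

    if (*sock == -1) {
        *sock = socket(AF_INET, SOCK_STREAM, 0);
        if (*sock == -1)
            return -1;
        if (connect(*sock, (const struct sockaddr *)server, sizeof *server) == -1)
            goto drop;
    }
    if (write_full(*sock, msg, msg_size) == 0 &&
        read_full(*sock, ack, ack_size) == 0)
        return 0;
drop:
    close(*sock);
    *sock = -1;
    return -1;
}
```

The thread would start with the descriptor set to -1 and call send_on_link() each cycle. A failed cycle costs one message, and the next cycle reconnects. Whether this suits the customer's server is an open question, since the server would have to accept several messages on one connection.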

Thanks for the hints, David