Serial Communication: Reply Messages From Dev32 Being Lost?

In our product we talk to two external devices via serial ports. Our
controlling processes each open their serial device for read/write in
standard blocking mode, and communicate using the standard read(), write(),
open(), close() functions. Here are the flags used in our open call (where
device is the path to the serial port device):

fd = open(device, O_RDWR | O_NOCTTY);

We have, on occasion, seen these processes independently (i.e., not
necessarily at the same time) remain REPLY-blocked on the Dev32 process
indefinitely while the Dev32 process (and the Dev32.ser process) are
RECEIVE-blocked on 0. This suggests to us that our process sent a message
to the Dev32 device (via the read() or write() function) and Dev32 received
it, but our process never received a reply to that message.

Here is the output of sin ver on the node where we most recently noticed this event:

PROGRAM               NAME         VERSION  DATE
/boot/sys/Proc32      Proc         4.25L    Feb 15 2001
/boot/sys/Proc32      Slib16       4.23G    Oct 04 1996
/boot/sys/Slib32      Slib32       4.24B    Aug 12 1997
/bin/Fsys             Fsys32       4.24Y    Apr 23 2002
/bin/Fsys.eide        eide         4.25G    Apr 15 2002
//35/bin/Dev32        Dev32        4.23G    Oct 04 1996
//35/bin/Dev32.ser    Dev.ser      4.25A    Feb 14 2003
//35/bin/Dev32.ser    Dev.ser      4.25A    Feb 14 2003
//35/bin/Dev32.ser    Dev.ser      4.25A    Feb 14 2003
//35/bin/Dev32.ser    Dev.ser      4.25A    Feb 14 2003
//35/bin/Dev32.ansi   Dev32.ansi   4.23H    Nov 21 1996
//35/bin/Dev32.pty    Dev32.pty    4.23G    Oct 04 1996
//35/bin/Pipe         Pipe         4.23A    Feb 26 1996
//35/bin/Net          Net          4.25E    Apr 24 2002
//35/bin/Net.ns83815  Net.ns83815  4.25C    Oct 08 2002
//35/*/usr/ucb/Tcpip  Tcpip        5.00A    Jan 26 2001

Does anyone have any suggestions as to why this might occur, and/or what we
can do to prevent it or lessen the impact?

Thanks,


Ryan J. Allen

This will occur when a process calls a blocking read() and there is no data to
be returned: the process will remain REPLY-blocked until data is available.
It can be avoided either by passing the O_NONBLOCK flag to open() or by
using the dev_read() function.
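A minimal POSIX sketch of the O_NONBLOCK route, assuming a plain POSIX environment (the QNX 4 specific dev_read() is not shown). The helper names set_nonblocking() and read_available() are hypothetical; the point is that a non-blocking read() returns -1 with errno EAGAIN instead of leaving the caller REPLY-blocked when nothing has arrived yet.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch an already-open descriptor to non-blocking
 * mode. Same effect as adding O_NONBLOCK to the open() flags. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: read whatever is available right now.
 * Returns the number of bytes read, 0 if nothing could be read at the
 * moment (EAGAIN) or on end-of-file, and -1 on any other error. */
static ssize_t read_available(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;   /* no data yet: the call returns instead of blocking */
    return n;
}
```

The trade-off is that the caller now has to decide what to do when nothing is there yet (retry later, or wait with select() and a timeout) rather than sitting blocked on the driver.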

It can also occur if the process calls write() while the serial port’s
outgoing data buffer is full and the driver is unable to transmit any data,
for instance because the serial port has been flow-controlled at the remote
end.

In both cases, this is normal and expected behaviour. In particular,
check the section of the read() documentation that begins a numbered
list with the phrase, “When attempting to read from a file (other than a
pipe or FIFO) that supports nonblocking reads and has no data currently
available:”, and the similar section in the write() documentation, for a
description of the expected behaviour.
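For the write() case, one way to keep a flow-controlled port from hanging the caller forever is to open it with O_NONBLOCK and wait for room with select() and a timeout. This is a sketch under that assumption; write_with_timeout() is a hypothetical name, and giving up after the timeout (returning a short count) is just one possible policy.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper. Intended for a descriptor opened with O_NONBLOCK.
 * Writes len bytes, waiting at most timeout_ms each time the driver's
 * output buffer is full. Returns the number of bytes written (less than
 * len if a wait timed out), or -1 on error. */
static ssize_t write_with_timeout(int fd, const void *buf, size_t len,
                                  long timeout_ms)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        if (n == -1 && errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;

        /* Output buffer full (e.g. the remote end has flow-controlled us):
         * wait until select() reports room for at least one more byte. */
        fd_set wset;
        FD_ZERO(&wset);
        FD_SET(fd, &wset);
        struct timeval tv = { .tv_sec = timeout_ms / 1000,
                              .tv_usec = (timeout_ms % 1000) * 1000 };
        int r = select(fd + 1, NULL, &wset, NULL, &tv);
        if (r == 0)
            break;              /* timed out: report partial progress */
        if (r == -1 && errno != EINTR)
            return -1;
    }
    return (ssize_t)done;
}
```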

-David

David Gibbs
QNX Training Services
dagibbs@qnx.com

David, can you confirm QNX’s implementation of select() for me? If select()
returns for a read set, it means there is at least one fd in the set that has
at least one byte available, correct? And if we use select() with a write set,
it means that at least one of the write fds (our serial port in this case)
can hold one more byte?

Is there any way to reset the serial port from code in the case where we
stop getting data from it?

Thanks

Larry

Lawrence R. Sweet <lsweet@fct.ca> wrote:

> David, can you confirm QNX’s implementation of select() for me? If select()
> returns for a read set, it means there is at least one fd in the set that
> has at least one byte available, correct?

Well, um, that’s what the rules say, yup.

It just so happens that there is a bug (PR 22565, to be exact), which I
found while doing some testing, in the devc-* drivers that can cause false
wakeups for read when there is no data available.

For this to occur:
OPOST and ONLCR must be set.
A \n must be in the middle of output processing (in particular, the \n must
have been encountered, but the ‘manufactured’ \r is still being processed
for transmission), and a notification request for read must come in at
this time.
In this case, a false wakeup for read occurs.

[This situation isn’t as uncommon as it might sound: write a short
command ending with a newline, do a printf() or something else that
takes a bit of time, then select() for input, and you could hit it.]

(Note also that, depending on terminal settings, you may not be woken
up as soon as a byte is available; e.g., in edit mode it may wait for a
full line.)
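Because of false wakeups like the one above, a sketch that does not trust select() blindly may help: wait with a timeout, then do a non-blocking read and treat EAGAIN as "nothing after all" instead of falling into a blocking read(). This assumes the descriptor was opened with O_NONBLOCK; read_with_timeout() is a hypothetical name.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper. Intended for a descriptor opened with O_NONBLOCK.
 * Waits up to timeout_ms for input, then reads it.
 * Returns bytes read; 0 on timeout, on a wakeup that turned out to have
 * no data, or on end-of-file; -1 on error. */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, long timeout_ms)
{
    fd_set rset;
    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    struct timeval tv = { .tv_sec = timeout_ms / 1000,
                          .tv_usec = (timeout_ms % 1000) * 1000 };

    int r = select(fd + 1, &rset, NULL, NULL, &tv);
    if (r <= 0)
        return r;       /* 0 = timed out, -1 = error */

    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;       /* false wakeup: select() said readable, no data */
    return n;
}
```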

> And if we use select with a write set
> it means that at least one of the write fds can hold one more byte (our
> serial port in this case)?

Can hold “at least one more byte”, yup.

> Is there any way to reset the serial port from code in the case where we
> quit getting data from it?

Query its current state before making any changes; save that away;
restore it before quitting.
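The save/restore advice maps naturally onto the POSIX termios calls. A minimal sketch, assuming the port's state of interest is its termios settings; struct saved_port, save_port_state() and restore_port_state() are hypothetical names. TCSANOW is used on restore because TCSADRAIN would wait for queued output to drain, which could hang on a port that is flow-controlled.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical: snapshot of the port's settings taken before any change. */
struct saved_port {
    int fd;
    struct termios attrs;
};

/* Query and save the current settings. Returns 0 on success, -1 on error
 * (for example, if fd is not a terminal/serial device). */
static int save_port_state(int fd, struct saved_port *s)
{
    s->fd = fd;
    return tcgetattr(fd, &s->attrs);
}

/* Put the saved settings back, e.g. just before close() or exit.
 * TCSANOW applies them immediately instead of waiting for output to drain. */
static int restore_port_state(const struct saved_port *s)
{
    return tcsetattr(s->fd, TCSANOW, &s->attrs);
}
```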

-David

P.S. I found the bug in 6.3.0 SP1 pre-release testing. It looks like a
fix was implemented on May 25, 2005. I don’t expect it will make the
6.3.0 SP2 release; if you need the fix, you may have to go through
your technical support rep to get a support patch. (Or I may have
given you enough information to work around the bug, should it be this
bug that is biting you.)
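Since the false wakeup described above needs both OPOST and ONLCR to be set, one possible workaround (an inference from those conditions, not a confirmed fix) is to clear ONLCR so the driver never manufactures a \r, and have the application write "\r\n" itself if the remote device expects it. The helper names without_onlcr() and disable_onlcr() are hypothetical.

```c
#include <termios.h>

/* Hypothetical: return a copy of the settings with ONLCR cleared, so the
 * driver stops inserting '\r' before each '\n' on output. OPOST is kept. */
static struct termios without_onlcr(struct termios t)
{
    t.c_oflag &= ~(tcflag_t)ONLCR;
    return t;
}

/* Hypothetical: apply that change to an open port.
 * Returns 0 on success, -1 on error. */
static int disable_onlcr(int fd)
{
    struct termios t;
    if (tcgetattr(fd, &t) == -1)
        return -1;
    t = without_onlcr(t);
    return tcsetattr(fd, TCSANOW, &t);
}
```

Clearing OPOST instead would also remove the condition, but it turns off all output post-processing, so clearing only ONLCR is the narrower change.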


David Gibbs
QNX Training Services
dagibbs@qnx.com

Thanks for the information, David. This is a QNX 4 problem, but we are
migrating, so this problem is good to know about.

Larry

Lawrence R. Sweet <lsweet@fct.ca> wrote:

> Thanks for the information David. This is a QNX 4 problem but we are
> migrating so this problem is good to know about.

Doh. Forgot which newsgroup I was in.

Ok, looking at the code in Dev:

For input (read):

if(ctlp->ibuf->size || (ctlp->flags & HUP_DETECTED) != 0) {
    set->flag |= _SEL_IS_INPUT;
    ++count;
}
else if(ctlp->cbuf->size != 0 || ctlp->cbuf->eol != ctlp->cbuf->buffer) {
    set->flag |= _SEL_IS_INPUT;
    ++count;
}

ibuf would be the input buffer, so if it holds one or more bytes, or if a
HUP was detected, it seems that the select would unblock.

cbuf is the canonical buffer; it looks like data in cbuf could also cause
an unblock. It appears to be a separate check, probably because bytes get
copied to cbuf, possibly leaving ibuf empty, so that case needs to be
handled separately.

For output (write):

if(ctlp->obuf->size < ctlp->obuf->length) {
    set->flag |= _SEL_IS_OUTPUT;
    ++count;
}

It looks even simpler: if there is room for one or more bytes, you get
unblocked.
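To make the two Dev checks above easy to reason about in isolation, here is a self-contained model of them. The structs are hypothetical stand-ins holding only the fields the checks read (the cbuf->eol pointer comparison is modelled as an offset that is non-zero when eol has moved past the start of the buffer); they are not Dev’s real data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one of Dev's buffers. */
struct model_buf {
    size_t size;    /* bytes currently held */
    size_t length;  /* capacity */
    size_t eol;     /* offset of eol from buffer start; 0 == eol == buffer */
};

/* Hypothetical stand-in for the per-port control structure. */
struct model_port {
    struct model_buf ibuf;  /* raw input */
    struct model_buf cbuf;  /* canonical (edit-mode) input */
    struct model_buf obuf;  /* output */
    bool hup_detected;
};

/* Read side: raw input, a detected hangup, or canonical data/eol. */
static bool input_ready(const struct model_port *p)
{
    if (p->ibuf.size != 0 || p->hup_detected)
        return true;
    return p->cbuf.size != 0 || p->cbuf.eol != 0;
}

/* Write side: room for at least one more byte. */
static bool output_ready(const struct model_port *p)
{
    return p->obuf.size < p->obuf.length;
}
```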

-David

David Gibbs
QNX Training Services
dagibbs@qnx.com