Serial Port Problem

My bad … I read the Modem Status register by mistake. The value for the
Line status register is:

0xE4

Bit 0 - Data Ready 0
Bit 1 - Overrun Error 0
Bit 2 - Parity Error 1
Bit 3 - Framing Error 0
Bit 4 - Break Interrupt 0
Bit 5 - Tx Holding Register 1
Bit 6 - Tx Empty 1
Bit 7 - Rx FIFO Error 1

Other than the parity error and the Rx FIFO error, would this indicate a
lost transmit interrupt?

Larry
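
The decoding above is mechanical, so it can be checked with a few lines of C. This is only an illustrative sketch: the names lsr_names and lsr_print are made up for this example, and how the raw value is obtained is an assumption. On a standard 16550 the LSR sits at base + 5, so on QNX4 with Watcom C something like inp(base + 5) from <conio.h> would presumably read it. Keep in mind that reading the LSR clears the latched error bits.

```c
#include <stdio.h>

/* Names of the 16550 Line Status Register bits, bit 0 first,
   using the same labels as the listing above. */
static const char *const lsr_names[8] = {
    "Data Ready", "Overrun Error", "Parity Error", "Framing Error",
    "Break Interrupt", "Tx Holding Register", "Tx Empty", "Rx FIFO Error"
};

/* Print one "Bit n - name value" line per bit and return the number
   of bits that are set. */
int lsr_print(unsigned value, FILE *out)
{
    int bit, set = 0;

    for (bit = 0; bit < 8; bit++) {
        int on = (int)((value >> bit) & 1u);
        fprintf(out, "Bit %d - %s %d\n", bit, lsr_names[bit], on);
        set += on;
    }
    return set;
}
```

Calling lsr_print(0xE4, stdout) reproduces the listing above (bits 2, 5, 6 and 7 set), while lsr_print(0x63, stdout) shows bits 0, 1, 5 and 6 set.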

“Evan Hillas” <evanh@clear.net.nz> wrote in message
news:db5eva$o2n$1@inn.qnx.com

Lawrence R. Sweet wrote:
We just caught the lockup, i.e. alarm(1) triggered when trying to send an
8 byte message. Looking at the Line Status Register shows 0x63.


Bit 0 - Data Ready
Bit 1 - Overrun Error
Bit 2 - Parity Error
Bit 3 - Framing Error
Bit 4 - Break Interrupt
Bit 5 - Tx Holding Register
Bit 6 - Tx Empty
Bit 7 - Rx FIFO Error

Umm, Data Ready is set, why is the receive buffer not empty? I presume
you didn’t have data coming in when you probed the LSR. And the Overrun
Error shouldn’t have occurred either, this indicates the driver is not
gathering the incoming data.

Your driver looks very confused. It doesn’t appear to be talking to the
comport at all.


We are not using any HW/SW flow control and we only have 3 wires for our
RS485 connection. Even so, is it possible that noise is being picked up on
one of the other modem control lines (pins) and the driver is flow
controlling by mistake?


Nope. 232 line receivers are quite hardy and true 485 doesn’t have any
external handshake.

And even if it did manage to pick up a spike it would be too short for the
driver to see. The only chance of such an event having any effect at all
is if the UART generates an interrupt on the MSR and, even though the
delta bit is set, the driver would see the line state as a go so should
continue transferring.



Evan

Lawrence R. Sweet wrote:

My bad … I read the Modem Status register by mistake. The value for the
Line status register is:

0xE4

Bit 0 - Data Ready 0
Bit 1 - Overrun Error 0
Bit 2 - Parity Error 1
Bit 3 - Framing Error 0
Bit 4 - Break Interrupt 0
Bit 5 - Tx Holding Register 1
Bit 6 - Tx Empty 1
Bit 7 - Rx FIFO Error 1

Other than the parity error and the Rx FIFO error, would this indicate a
lost transmit interrupt?

Don’t know. Clearly there is no data being sent but that’s most likely because the driver is no longer wanting to send data rather than it waiting for the Tx Empty interrupt. Could be that the driver has stopped transferring because of the receive error. I mean both the parity and FIFO errors will be real, maybe caused by the same problem.


Evan

Evan Hillas wrote:

parity and FIFO errors will be real, maybe caused by the same problem.

Correction: … maybe caused by the one problem.

Hmm, Rx FIFO error just means there is a byte in the FIFO that had a receive error, so one can assume that parity was it. The question then becomes why is the receive FIFO not empty? Have you told the driver to use the FIFO? If not then I guess you should, so such data is always flushed.


Evan

Lawrence R. Sweet wrote:

This is 485 and most of my command messages are 8 bytes.

Need to know more than that, though. What is the protocol?
Does it require an acknowledgement for each 8 byte packet?
Does it have a window? Is it asynchronous multidrop (i.e.
multiple outstanding requests)?

Any of these would mean that the app could alarm() because
of the extremely slow transmission rate.

If you write 8 bytes, and then must receive an ack, then you
need to verify what the tx kick timeout is (I don’t have a
QNX4 system handy).

Rennie

Where do I find the tx kick timeout value?


Lawrence R. Sweet wrote:

Where do I find the tx kick timeout value?

tx kick is 10*50ms (it’s tied to the -1 pseudo interrupt).

When a tx is requested, the character is loaded into the TX buffer, the
timeout count of 10 is set and then we’re done. If the pseudo handler
decrements the timeout count to zero it will kick the TX (re-enabling
the interrupts). When the TX interrupt occurs we disable the timeout
count, and then repeat.
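
The mechanism described above can be modelled as a small countdown. What follows is a simulation sketch only, not the Dev.ser source: the names tx_state, tx_start, tx_interrupt and tx_tick are hypothetical, and it assumes the timeout is simply left disarmed after a kick, which the description does not spell out.

```c
#include <stdbool.h>

#define KICK_TICKS 10  /* 10 ticks of 50 ms, i.e. 500 ms, per the post above */

struct tx_state {
    int timeout;  /* ticks left until a kick; 0 means the timer is off */
    int kicks;    /* number of kicks issued so far */
};

/* A character was loaded into the TX buffer: arm the timeout. */
void tx_start(struct tx_state *s)
{
    s->timeout = KICK_TICKS;
}

/* The TX interrupt arrived in time: disable the timeout. */
void tx_interrupt(struct tx_state *s)
{
    s->timeout = 0;
}

/* Called once per 50 ms tick by the (simulated) pseudo handler.
   Returns true when this tick caused a kick. */
bool tx_tick(struct tx_state *s)
{
    if (s->timeout == 0)
        return false;          /* nothing pending */
    if (--s->timeout > 0)
        return false;          /* still waiting for the interrupt */
    s->kicks++;                /* a real driver would re-enable the TX interrupt here */
    return true;
}
```

In this model a lost TX interrupt costs ten ticks before the kick fires, whereas a TX interrupt that arrives normally disarms the counter and no kick is ever issued.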


Cheers,
Adam

QNX Software Systems
[ amallory@qnx.com ]

With a PC, I always felt limited by the software available.
On Unix, I am limited only by my knowledge.
–Peter J. Schoenster <pschon@baste.magibox.net>

So can you speculate as to why the timeout seems not to be happening?

Larry


Lawrence R. Sweet wrote:

So can you speculate as to why the timeout seems not to be happening?

Have you established that? (check the tracelog) If so, then the kicks
aren’t occurring either because the timer tick interrupt isn’t counting
(thus not decrementing the timeout) or there isn’t a pending TX from the
driver’s point of view (I think the LSR dump suggests that as well).

It might also be useful to connect a serial breakout box/LED display
inline to see what the line status is when things are ‘correct’ versus
when you see the problem.

-Adam

Cheers,
Adam


Since the event happens so infrequently and at random times it is hard to
check the tracelog. The event might happen in the middle of the night and I
don’t check the log until the following day. I assume the tracelog would
have wrapped by then? Is there a way to configure the trace so that it only
traps serial events?

I assume that because my code blocks forever in the write call that the
timeout is not happening in the driver. Is this an incorrect assumption?

Larry


Did you try this?


“Pavol Kycina” <xkycina@microstep-hdo.sk> wrote in message
news:42d60c22$1@news.microstep-hdo.sk

Try this short program (I call it wakeup_serial):


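#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <conio.h>   /* outp() */

int main(int argc, char *argv[])
{
    int i, go = 1, base;

    if(argc < 2){
        printf("No argument given\n");
        return(-1);
    }

    /* Once a second, toggle the Interrupt Enable Register (base + 1)
       of every UART whose base I/O address was given as an argument. */
    while(go){

        for( i = 1; i < argc; i++){
            errno = EOK;
            base = strtol(argv[i], NULL, 0);
            if(base && errno == EOK){
                outp(base + 1, 0x00);   /* disable all UART interrupts */
                outp(base + 1, 0x0f);   /* re-enable all four sources */
            }
        }

        sleep (1);
    }

    return(0);
}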





“Lawrence R. Sweet” <lsweet@fct.ca> wrote in message
news:db1530$g3e$1@inn.qnx.com
I have a Diamond Systems Athena CPU running a control application that uses
one of the on-board serial ports. This serial port (16550) is configured for
raw mode and no hardware or software handshaking, 9600 baud, FIFOs not
enabled.

Occasionally we have a situation where a write to the serial port will block
forever. We added an alarm() function to pull us out of the write after 1
sec. We have determined that once we have timed out (and reset the alarm(0))
we can never talk to the port again. The only solution is to slay Dev.ser
for that port. Once we do this we continue to run for some random period of
time (8-10 hours) and then we will have another lockup.

Has anyone seen this type of behaviour before? The next time I experience
the lockup I was going to read some of the UART registers to see if I could
see something amiss. Is there any reason that the serial driver would stop
accepting/transmitting characters?

Thanks for your help.

Larry

Not yet. I was hoping to find out the root cause of the problem before
putting a bandaid on it. If I can’t track it down soon what I will do is
reset the UART as you suggest after my alarm(1) triggers.

Thank you for your help.

Larry


Lawrence R. Sweet wrote:

Since the event happens so infrequently and at random times it is hard to
check the tracelog. The event might happen in the middle of the night and I
don’t check the log until the following day. I assume the tracelog would
have wrapped by then? Is there a way to configure the trace so that it only
traps serial events?

tracelogger is your friend - it will log the traces to disk/file so you
can examine them later.

I assume that because my code blocks forever in the write call that the
timeout is not happening in the driver. Is this an incorrect assumption?

It’s likely, but it is also possible that kicking the UART is having no
effect or at least not the assumed effect of making forward progress.
On the other hand, if we never kick the UART, then it would seem that no
TX event is actually reaching the driver and getting stuck (at worst
eliminating a code path from future investigation).


Cheers,
Adam


We have trapped another event. I measured the IRQ4 pin on the PC/104 Bus
and it was high. I am checking with Diamond Systems to see what that means.
I also added the kick code to reset the UART IER and this allowed our
application to continue.

I have a tracelog file but I’m not sure what to look for.

Larry


Yes, I did try this and it allowed my application to continue running.
Thank you for the suggestion.

Larry


Lawrence R. Sweet wrote:

We have trapped another event. I measured the IRQ4 pin on the PC/104 Bus
and it was high. I am checking with Diamond Systems to see what that means.
I also added the kick code to reset the UART IER and this allowed our
application to continue.

The fact the kick code allowed the application to continue (does the TX
message actually make it to the other side??) seems to suggest that
there is a pending TX interrupt, we just can’t ‘see’ it.

I have a tracelog file but I’m not sure what to look for.

Using traceinfo, you can dump out the log in human-readable format. You
should look for “Serial Port …” entries, which should log parity,
frame/FIFO, overrun, tx timeout (kick), etc. events. Check out the
/etc/config/traceinfo file for more entries you may want to look at.

Cheers,
Adam


Adam Mallory wrote:

Lawrence R. Sweet wrote:

Where do I find the tx kick timeout value?


tx kick is 10*50ms (it’s tied to the -1 pseudo interrupt).

10*50ms == 500 ms per character * 8 byte messages == 4 seconds,
then the write is ripped out by the alarm() signal at 1 second.

Rennie

Because we are not using the FIFO we should get an interrupt on every single
byte transferred. So alarm(1) should be fine, no?

Larry


Here is the single entry that I found in the tracelog:

Jul 15 10:48:53 2 00002003 Serial port 03F8, Overrun error

Any idea what this means?

Larry

No, TX interrupts will only occur on a non-empty to empty transition. So
if the hardware fails to generate an interrupt you will never again be
notified that the hardware is ready to accept another character. This is
what the kick timer is “fixing”. It will force the driver to write a
character to the hardware (if characters are available). This will get
things rolling again as the hardware will transmit the character and
then generate an interrupt to notify of the non-empty to empty transition.

If the kick timer’s period is longer than the alarm() timeout, then your
code will fail before the driver has an opportunity to recover from this
hardware failure.
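
To put rough numbers on the kick period versus the alarm() timeout, here is a back-of-the-envelope sketch. The function names are made up for this example, and it assumes 10 bits on the wire per character (8N1); the thread does not state the framing, and with parity enabled it would be 11. It also assumes every lost TX interrupt costs exactly one 500 ms kick period.

```c
/* Time to shift one character out of the UART, in microseconds.
   bits_per_char counts start, data, parity and stop bits. */
long char_time_us(long baud, int bits_per_char)
{
    return (bits_per_char * 1000000L) / baud;
}

/* Worst-case time in milliseconds to send nchars characters when
   lost_irqs TX interrupts are lost and each one costs a kick period. */
long worst_case_ms(long baud, int bits_per_char, int nchars,
                   int lost_irqs, long kick_ms)
{
    long wire_ms = (nchars * char_time_us(baud, bits_per_char)) / 1000;

    return wire_ms + lost_irqs * kick_ms;
}
```

At 9600 baud an 8 byte message takes about 8 ms on the wire. Under these assumptions, one lost interrupt gives worst_case_ms(9600, 10, 8, 1, 500) = 508 ms, which still fits inside alarm(1). Two lost interrupts give 1008 ms, which does not, and a kick on every character gives 4008 ms, which matches the 4 second figure quoted earlier in the thread.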

Regards,

Joe
