Can anyone help explain what I am seeing here?

Hi,

I have created a program that sends and receives data from one of our custom boards across a serial interface via a polling mechanism. The packet of data going to the board is 16 bytes and the packet I receive back is 20 bytes. The hardware board was designed to return data in 12.5 ms including the time to transmit data in both directions (38400 speed).
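As a rough check of that 12.5 ms budget, the time the packets spend on the wire can be worked out from the baud rate. This is only a sketch: the helper name `wire_time_us` is made up for illustration, and 10 bits per byte assumes 8N1 framing (one start bit, eight data bits, one stop bit), which the post does not state.

```c
#include <stdint.h>

/* Microseconds needed to clock `bytes` bytes onto a serial line.
 * bits_per_byte = 10 corresponds to 8N1 framing, which is an
 * assumption here; the post only gives the baud rate. */
int64_t wire_time_us(int64_t bytes, int64_t baud, int64_t bits_per_byte)
{
    return bytes * bits_per_byte * 1000000 / baud;
}
```

Under that assumption, `wire_time_us(16 + 20, 38400, 10)` comes to 9375, so roughly 9.4 ms of the 12.5 ms is pure transmission time and about 3 ms is left for the board to process the request.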

My program consists of 2 threads both running at priority 10:

Thread 1 sends to the board. It wakes up every 20ms, takes the mutex, checks to see if the last packet sent was replied to (and notes if it was not), increments some packet counters and then sends the next packet to the board and finally releases the mutex.

Thread 2 is the receiving thread. When it gets characters from the serial driver it takes the mutex, adds the characters to the receive buffer and checks to see if a complete packet was received. If a complete packet was received with no errors it marks the reply packet from the board as being successful. The mutex is then released.
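The bookkeeping shared by the two threads might look roughly like the sketch below. It is not the actual program: the names `link_state`, `tx_tick` and `rx_feed` are hypothetical, the real serial writes and reads are left out, and all packet validation is omitted.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define REPLY_LEN 20  /* reply packet size from the post */

/* State shared by the sender and receiver threads (hypothetical layout). */
typedef struct {
    pthread_mutex_t lock;
    unsigned char buf[REPLY_LEN];
    size_t have;        /* bytes of the current reply collected so far */
    unsigned sent;      /* polls sent */
    unsigned received;  /* complete replies seen */
    unsigned missed;    /* polls that never got a complete reply */
    int reply_pending;  /* set when a poll goes out, cleared on a full reply */
} link_state;

/* Sender side, called once per 20 ms tick just before writing the poll.
 * Returns 1 if the previous poll was never answered. */
int tx_tick(link_state *s)
{
    pthread_mutex_lock(&s->lock);
    int missed = s->reply_pending;
    if (missed)
        s->missed++;
    s->sent++;
    s->reply_pending = 1;
    pthread_mutex_unlock(&s->lock);
    return missed;
}

/* Receiver side, called with whatever bytes the serial read returned.
 * Returns the number of replies completed by this chunk. */
int rx_feed(link_state *s, const unsigned char *data, size_t n)
{
    int completed = 0;
    pthread_mutex_lock(&s->lock);
    while (n > 0) {
        size_t room = REPLY_LEN - s->have;
        size_t take = n < room ? n : room;
        memcpy(s->buf + s->have, data, take);
        s->have += take;
        data += take;
        n -= take;
        if (s->have == REPLY_LEN) {  /* full reply assembled */
            s->have = 0;
            s->received++;
            s->reply_pending = 0;
            completed++;
        }
    }
    pthread_mutex_unlock(&s->lock);
    return completed;
}
```

A `link_state` would be set up with `lock` initialised to `PTHREAD_MUTEX_INITIALIZER` and everything else zeroed. Note that a missed poll does not reset `have`, so a partial reply keeps accumulating across ticks.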

Note there is a lot more going on in terms of validating packet data etc but that’s outside the scope of what I am seeing.

When I run my program, all is well. I’m using about 1-2% of the CPU and I’m not missing any packets. So far so good.

Now I added some stats gathering to my code and every 1000 packets I print out some data to a console under Photon (using the enhanced eide driver from EvanH but one that still does everything via CPU). When the stats print out, everything is fine until the data in the console scrolls. At that point, I print out 3-4 error messages about not receiving packets from my board. What I see confuses me and I hope someone can help explain what might be going on at the time the console scrolls.

In the messages I see below, I print out the number of packets I have sent from the sending thread, the number I have received in the receiving thread, the nanosecond time in the current second and the number of bytes I currently have for the next packet (remember I expect to get 20 bytes) from the serial driver.

processPollingTimeout() - Sent 4000 packets. Missing reply packet 4000 from board at time 440992799. Reply buffer has 10 bytes
processPollingTimeout() - Sent 4001 packets. Missing reply packet 4000 from board at time 450991269. Reply buffer has 12 bytes
processPollingTimeout() - Sent 4002 packets. Missing reply packet 4000 from board at time 465988974. Reply buffer has 14 bytes

So what I see is that I missed 3 packets. But what really puzzles me is that I see that 30ms has elapsed. So my timer fired 3 times and I sent 3 packets from the sending thread. In the meantime all I got back was 4 bytes (10 to 14) for the 1st of the 3 missing packets. This seems really strange to me as I can’t figure out why the serial driver is not sending data for this time period to my receiving thread.
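To put those 4 bytes in perspective, the number of bytes the line could have carried in that window can be estimated from the printed timestamps. This is a sketch only: `bytes_possible` is a made-up helper, and 10 bits per byte again assumes 8N1 framing, which the post does not state.

```c
#include <stdint.h>

/* Upper bound on the bytes a serial line can deliver between two
 * nanosecond timestamps taken within the same second.
 * bits_per_byte = 10 assumes 8N1 framing (an assumption). */
int64_t bytes_possible(int64_t start_ns, int64_t end_ns,
                       int64_t baud, int64_t bits_per_byte)
{
    int64_t elapsed_ns = end_ns - start_ns;
    return elapsed_ns * baud / (bits_per_byte * 1000000000LL);
}
```

Going by the first and last timestamps in the messages above (440992799 and 465988974, about 25 ms apart), `bytes_possible(440992799, 465988974, 38400, 10)` is 95. Seeing only 4 bytes in a window that could have carried several complete replies hints that the bytes may not have been on the wire at all, rather than just being delivered late.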

Can anyone offer an idea of what might be happening during this 30 ms time frame and what if anything I can do about it? Note, I tried upping the priority of my receive thread to 11 but that makes no difference as I still miss data when the screen scrolls.

In the long run, the console won’t be on my final system but I am worried that when I add other processes their time slices might cause a similar effect on my system.

TIA,

Tim

Well it really sounds like the code that scrolls the screen is running at a higher priority than your serial threads. Either that, or the driver is naughty and it turns off interrupts when scrolling. I would check the priorities carefully.

By enhanced eide driver do you really mean svga driver? (or vesa maybe?)

Photon drivers run at a high priority, normally 17, so if it is using CPU cycles to scroll then your app is going to be starved.

Only a few ideas:

  • If you are using standard 16550 UARTs, try to switch on hardware FIFO to avoid character loss under heavy load.
  • Use a separate thread for printing messages (requires internal message queue)
  • Run the reader thread at high priority (21 or higher) and maintain an internal queue of received packets
  • Run the program in text console mode (no photon) and see if the problem persists
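For the reader-thread priority suggestion, one way to request a fixed priority with the standard POSIX thread attribute calls is sketched below. The helper name `make_prio_attr` is hypothetical, and the choice of `SCHED_RR` is an assumption; the priority value would be whatever is appropriate (21 in the suggestion above).

```c
#include <pthread.h>
#include <sched.h>

/* Prepare `attr` so that a thread created with it uses an explicit
 * round-robin policy at priority `prio` instead of inheriting the
 * creator's scheduling. Returns 0 on success or an errno-style code.
 * SCHED_RR is an assumption; pick the policy the system calls for. */
int make_prio_attr(pthread_attr_t *attr, int prio)
{
    struct sched_param param = { .sched_priority = prio };
    int rc;

    if ((rc = pthread_attr_init(attr)) != 0)
        return rc;
    if ((rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED)) != 0)
        return rc;
    if ((rc = pthread_attr_setschedpolicy(attr, SCHED_RR)) != 0)
        return rc;
    return pthread_attr_setschedparam(attr, &param);
}
```

The resulting attribute would then be passed as the second argument to `pthread_create()` for the reader thread. Whether the process is actually allowed to run a thread at that priority depends on the system and its privileges.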

Btw, is it safe to use a character device from two different threads?

Regards,
Albrecht

CBurgess,

I believe I have the VESA driver. It’s been a long while since I set up my machine. At the time I installed 6.3, I downloaded the VESA driver that EvanH re-wrote to speed up screen updates/scrolling since my vid card was not supported (it’s called vesatweak).

I understand that the video driver can starve my app. What I don’t understand is that both my send and receive threads run at the same priority (10). I can see that the send thread woke up 3 times in a row (3 prints) without having gotten a complete reply packet from the serial driver. Since I printed out the number of characters I did receive and that it’s increasing I can see I am getting some CPU time in the receive thread. Just either not enough to get a complete packet (which makes no sense) or the serial driver is not actually getting the bytes on the serial line.

Albrecht,

I’m going to try turning on the hardware FIFO and see if that helps. It sounds like the heavy load of the photon driver is causing loss of bytes or else the driver is turning off interrupts as Maschoen suggested.

Tim

I’m really not sure what is going on here. The serial driver usually runs at prio 24, it can be adjusted on the command line. It also maintains an internal receive buffer, so even if your receive thread is not fast enough it should get a bunch of characters when scrolling is done. I am not sure if the drivers mask only their own interrupts or switch off interrupts altogether. Maybe the problem is elsewhere.

Regards,
Albrecht

Albrecht,

The driver runs at prio 24 but floats down to your app’s priority when you connect to it. So it really is running at prio 10 (observed when I connect to it and then run pidin). The docs say that it processes characters at prio 24 or interrupt time, which I assume to mean characters from the UART and not characters I send/receive.

You’re also right about the internal buffer being 2K for send and 2K for receiving so it should indeed buffer anything I don’t have time to grab during my time slice. All of which makes what I see quite strange unless interrupts are truly disabled by the VESA driver in which case characters will definitely be lost.

I’m going to spend some time this weekend doing further investigation on this to see if I can narrow down what’s going on.

Tim

P.S. How do you switch on the hardware FIFO? I didn’t see an option for that anywhere in the docs for devc-ser8250.

devc-ser8250 - Serial driver for 8250’s

devc-ser8250 [options] [port[^shift][,irq]] &
Options:
-b number Define initial baud rate (default 57600)
-c clk[/div] Set the input clock rate and divisor
-C number Size of canonical input buffer (default 256)
-e Set options to “edit” mode
-E Set options to “raw” mode (default)
-I number Size of raw input buffer (default 2048)
-f Enable hardware flow control (default)
-F Disable hardware flow control
-O number Size of output buffer (default 2048)
-s Enable software flow control
-S Disable software flow control (default)
-t number Enable receive FIFO and set receive FIFO trigger level
-T number Enable transmit FIFO and set transmit FIFO size
-u unit Set serial unit number (default 1)

After spending a few hours on the weekend troubleshooting this I finally found my answer.

It has to do with how the serial interface to the hardware boards works. I wasn’t aware we were using uni-directional RS-485 (the RS-232 serial cable goes into a uni-directional 485 converter that connects to the board) until I chatted to one of the board designers this weekend. So for the communication to work properly, obviously only one side can send at a time. In normal operation (no screen scrolling) the 20 ms between polling sends is plenty of time for a packet to be sent and a response to be received. But once the VESA driver enters the equation and gobbles up CPU time, my sends get out of whack because they run at prio 10 and can be starved. The result is that when I do get CPU time, I may attempt to send while the board is trying to reply, and nothing gets through.
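The timing behind that collision can be sketched with the two figures from this thread, the 20 ms poll period and the 12.5 ms round trip. The function name `next_poll_collides` is hypothetical, and the model is simplified: it assumes the timer keeps firing on its nominal schedule, that only the starved send is late, and that the board uses its full 12.5 ms.

```c
#include <stdbool.h>
#include <stdint.h>

#define POLL_PERIOD_US 20000  /* sender wakes every 20 ms (from the post) */
#define ROUND_TRIP_US  12500  /* board's send-plus-reply time (from the post) */

/* True if a poll that went out `late_us` after its scheduled tick would
 * still have its reply on the wire when the next on-time poll is sent. */
bool next_poll_collides(int64_t late_us)
{
    return late_us + ROUND_TRIP_US > POLL_PERIOD_US;
}
```

Under those assumptions there is only 7.5 ms of slack: `next_poll_collides(7500)` is false, but any delay beyond that puts the next poll on the line while the board is still replying.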

In the final product we won’t have a GUI but it does tell me that I need to make sure the priority of my sends is high enough that it never gets interrupted by something else.

Tim