UART Driver performance

Hi,

We are working on benchmarking the performance of some QNX drivers for a Cortex-A7 based system.
For the serial UART driver, the throughput we measure when transmitting a large chunk of bytes is somewhat lower than expected.
We find that our driver introduces some delay in the total time taken to transfer the bytes.
Here are the numbers we see while transferring 40000 bytes.

Size Time
40000 3.45 secs

On the same board Linux gives better performance in transmitting 40000 bytes.
Size Time
40000 2.90 secs

The system is clocked at 600 MHz and the UART line settings are 115200 8-N-1 in both test scenarios.

Any suggestions to improve the performance of the UART driver on a QNX system would be of help to us.
Can somebody also share reference performance numbers for the serial drivers on QNX systems?

Regards
Anant Pai

How are you getting data to your serial driver? Are you reading the 40K bytes from disk or are they already in RAM and ready to pass to the driver? If they are not in RAM you might consider that the disk I/O could be a factor.

Also, what other parameters did you give your serial driver? The default driver (probably not the one you are using) has a 2K input buffer and a 2K output buffer. You might try increasing those to 32K or even 40K (see the example command line below the doc link).
qnx.com/developers/docs/6.3. … r8250.html
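For example, if you are starting a devc-ser8250-style driver yourself, the buffer sizes can be set on the command line. I believe -I and -O set the input/output buffer sizes and -b the initial baud rate, but double-check against the use message of the devc-ser* variant you actually run or the doc link above; <port,irq> stands for your board-specific values:

devc-ser8250 -b 115200 -I 40000 -O 40000 <port,irq> &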

Tim

Hi Tim,

Thanks for your reply.

The 40K byte array is located in DDR and is allocated statically as a global buffer in the test application.
The test application calls write() on the file descriptor to transfer the bytes.
The default ibuf and obuf sizes were 2K. I tried increasing them to 10K but the numbers didn't change.
I will give it a try with 32K and 40K.

One more observation: the delay increases or decreases as a function of the baud rate, so at 57600 baud the delay is larger than at 115200 baud.
In an ideal scenario the time taken to transmit 40K bytes at 57600 baud should be around 5.7 secs, but we observe 6.9 seconds with our driver.
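For reference, here is a quick back-of-the-envelope sketch of the theoretical wire time; note that it assumes 8-N-1 framing costs 10 bit times per byte (1 start + 8 data + 1 stop), so it gives a slightly higher floor than an 8-bits-per-byte estimate:

#include <stdio.h>

/* Sketch: theoretical wire time for 40000 bytes, assuming 8-N-1 framing
 * costs 10 bit times per byte (1 start + 8 data + 1 stop). */
int main(void)
{
    const double wire_bits = 40000.0 * 10.0;

    printf("115200 baud: %.2f secs\n", wire_bits / 115200.0);  /* ~3.47 */
    printf(" 57600 baud: %.2f secs\n", wire_bits / 57600.0);   /* ~6.94 */
    return 0;
}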

Regards
Anant Pai

A few comments.

You should check to make sure that the driver is making proper use of the UART FIFO. The point in the FIFO at which an interrupt occurs is programmable, and it could be that the driver's default is to minimize latency instead of maximizing throughput.
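To make that concrete, here is a rough sketch of what maximizing transmit throughput looks like on a 16550-compatible part: load up to the FIFO depth on each transmit-empty interrupt instead of a single byte. The register offsets, the FIFO depth, and the in8()/out8() accessors are assumptions about your hardware (memory-mapped ARM UARTs usually need mmap_device_io() and a register shift), not something taken from your actual driver:

#include <hw/inout.h>
#include <stdint.h>

#define REG_THR        0      /* transmit holding register (write)    */
#define REG_LSR        5      /* line status register                 */
#define LSR_THRE       0x20   /* transmit holding register empty      */
#define TX_FIFO_DEPTH  16     /* typical 16550 transmit FIFO depth    */

/* Called from the driver's transmit path when a TX interrupt fires.
 * Returns how many bytes were loaded so the caller can advance its buffer. */
static int tx_burst(uintptr_t base, const uint8_t *buf, int len)
{
    int sent = 0;

    if (in8(base + REG_LSR) & LSR_THRE) {
        /* FIFO is empty: burst up to its full depth in one go. */
        while (sent < len && sent < TX_FIFO_DEPTH) {
            out8(base + REG_THR, buf[sent++]);
        }
    }
    return sent;
}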

Make sure that all unused tty features are turned off.

Do not use putc() or putchar(). Instead use either fwrite() or, better, try using the unbuffered open()/write() calls.
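For example, something along these lines (a minimal sketch; the device name and buffer size are placeholders):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Minimal sketch: one unbuffered write() of the whole buffer instead of
 * 40000 putc() calls. /dev/ser1 is a placeholder for the actual device. */
static char gTxBuf[40000];

int main(void)
{
    int fd = open("/dev/ser1", O_WRONLY);

    if (fd == -1) {
        perror("open /dev/ser1");
        return EXIT_FAILURE;
    }
    if (write(fd, gTxBuf, sizeof(gTxBuf)) == -1) {
        perror("write");
    }
    close(fd);
    return EXIT_SUCCESS;
}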

That said, it might help to know how QNX does things differently from Linux. With Linux, the write buffer is probably passed to the driver as a pointer, so data can be loaded directly into the hardware. With QNX, the data is passed to the driver inside a message. My intuition is that at 115200 baud with a 600 MHz processor this might be an issue, but your report that the delay is worse at 57600 baud indicates otherwise.

I’d be interested to know what you find.

One other thing to ask about.

How are you connected to your board? Do you have a keyboard and monitor directly connected to it, or are you sending/receiving data and keystrokes over a serial connection (i.e. remote debugging)? If it's the latter, that connection to your board could be slowing things down slightly.

Tim

Tim, Maschoen,

Our target board is connected to the host through an FTDI serial-over-USB connection.
We are running the test application on the target, and it transmits the 40K bytes continuously to the host machine.

The FIFO is enabled, and with different watermark level settings the performance does not show any difference.

As suggested, we increased the ibuf/obuf size to 10K; with this change we did not see any difference in the measured performance.

For all the time measurements we are using the time shell utility provided by ksh.
We run the performance test application from the shell with the following command to measure the time taken by the test app:
Ex: time uart-perf

One interesting observation is that if we use the clock_gettime API to measure the time taken by the write call to transmit the buffer,
the time taken is about 2.6 seconds.
Below is the code snippet from the test application that we use to measure the performance.
struct timespec start, stop;
double accum;

if( clock_gettime( CLOCK_REALTIME, &start ) == -1 ) {
    perror( "clock gettime" );
    return EXIT_FAILURE;
}
/* Single unbuffered write of the whole 40K buffer. */
write( fd, gTxBuf, 40000 );
if( clock_gettime( CLOCK_REALTIME, &stop ) == -1 ) {
    perror( "clock gettime" );
    return EXIT_FAILURE;
}
/* Elapsed time in seconds (BILLION is 1e9). */
accum = ( stop.tv_sec - start.tv_sec )
      + (double)( stop.tv_nsec - start.tv_nsec ) / (double)BILLION;
printf( "\n\n\n Time taken is %lf\n", accum );

This way we see that the time taken is about 2.6 seconds,
but the time shell command continues to report 3.45 seconds as the time taken to execute the test application.
This difference of about 0.9 seconds is what we are not able to account for. The delay also depends on the baud rate: it increases when we reduce the baud rate.
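One refinement we have not tried yet (just a sketch): since write() can return once the data has been queued in the driver's output buffer, adding a tcdrain() call before taking the stop timestamp would make the measurement also cover the time the driver needs to push out the last bytes:

#include <termios.h>
#include <time.h>
#include <unistd.h>

/* Sketch only: times write() plus the time for the driver's output buffer
 * to drain. fd and the buffer are assumed to be set up as in the test app. */
static double timed_send(int fd, const void *buf, size_t len)
{
    struct timespec start, stop;

    clock_gettime(CLOCK_REALTIME, &start);
    write(fd, buf, len);
    tcdrain(fd);    /* block until everything queued has been transmitted */
    clock_gettime(CLOCK_REALTIME, &stop);

    return (stop.tv_sec - start.tv_sec)
         + (stop.tv_nsec - start.tv_nsec) / 1e9;
}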

To rule out the possibility of the delay being caused by the test application, we replaced the write call with a 500 ms delay; in that case the time shell utility value and the clock_gettime value matched almost exactly.

Interestingly, a similar test carried out on Linux gives nearly identical values from the time shell command and the clock_gettime API.

Regards
Anant

Anant,

I'm suspicious about using the "time" command here. The obvious question is how "time" measures CPU time vs. real time. Though there may be one, I've never seen a QNX interface that provides the CPU time used by a process. There are a number of complications in what this would mean. For example, a process could have multiple threads on a multi-core system and use more CPU time than real time. Also, with QNX there is the question of how to account for a server providing a service. On a Linux system, when a user application makes a system call it jumps into the kernel, but I think the meter continues to run for the user. On QNX this can't happen.

Here are two simple ways to figure out if this is the issue. The first is to write a simple program that does the following:

get-starting-time
run test-program
get-ending-time
report total time
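
A minimal sketch of such a wrapper (CLOCK_MONOTONIC is used for elapsed wall-clock time; the path to the test program is a placeholder):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sketch of the wrapper described above; "/path/to/uart-perf" is a
 * placeholder for the actual test program. */
int main(void)
{
    struct timespec start, stop;
    int status;

    clock_gettime(CLOCK_MONOTONIC, &start);
    status = system("/path/to/uart-perf");    /* run the test program */
    clock_gettime(CLOCK_MONOTONIC, &stop);

    printf("exit status %d, total time %.3f secs\n", status,
           (stop.tv_sec - start.tv_sec)
         + (stop.tv_nsec - start.tv_nsec) / 1e9);
    return EXIT_SUCCESS;
}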

The second way: increase the amount of data you send in the test by a factor of 10, then use a stop watch.

Anant,

I hope you understand that when you run "time myProgram" from the shell, the time that gets reported is the total execution time of myProgram. This includes the time needed to load the program into memory, the time needed to execute it, and the time to exit.

If your program resides on a slow physical device (USB, floppy) or even a medium-speed physical device (hard drive), it's going to report slower times than a fast physical device (SSD, or better yet a RAM drive). Is it possible that your Linux machine loads your program much faster than your board does?

Also, is your program printing anything to the screen? That might add more time.

The clock_gettime() call is the most accurate way to show how long just the write() call took. Since it reports 2.6 seconds, I'd say that's how long it's taking. The other 0.9 seconds is probably load time from the physical medium. You could verify whether this is the case by creating a RAM drive:

  1. Include devb-ram in your boot image.
  2. Create a 5 MB RAM drive with 'devb-ram capacity=10000 &'.
  3. Mount the RAM drive with 'mount -t qnx4 /dev/hd1t77 /fs/ram'.
  4. Copy your program to the RAM drive: 'cp myProgram /fs/ram'.
  5. Run from the RAM drive with 'time /fs/ram/myProgram'.

Tim