I sent this message to the support people at qnx.com, but I’m also
posting it here in case anyone has some insights.
My group is trying to test the performance of QNX. We intend to use QNX
in a PPC-based microcontroller that will communicate with a number of
attached devices over ethernet. We do not currently have such a
configuration ready for test, so the testing is as follows:
I have a 400 MHz Pentium II desktop machine and my co-worker has a 1.2
GHz Celeron desktop machine. These are running QNX 6.2. Each machine has
a Netgear 83815 ethernet card. The two machines are connected by an
ethernet crossover cable between the two Netgear cards.
We have written two fairly simple C programs: the User App (UA) and the
Data Query Engine (DQE). The UA presents a small menu of commands to the
user. When a command is selected, the UA sends a message (using
MsgSend()) to the DQE to do something. One such command is to tell the
DQE to send a packet over the ethernet to the other machine. The DQE on
the other machine receives the packet, and immediately re-sends the
packet back to the first machine.
In order to test timing, we pass timing info along in the ethernet
packet. At the very beginning of program execution, the DQE calls
ClockPeriod() to be able to measure time with the finest granularity
possible, which is 9219 nanoseconds. (We call ClockPeriod with a value
of 10000 nanoseconds as per the documentation, but the actual value that
it gets set to is 9219.) When the DQE on the first machine sends a
packet to the other machine, it calls ClockTime() to store the current
time in the packet. When it gets the packet back from the other machine,
it can compare that time with the current time to measure the round-trip
time. We also do similar timing of the message sending; the UA stores
the current time in the message it sends to the DQE, and the DQE then
measures how long it took for the message to arrive.
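
One possible explanation for the 9219 figure, offered as an assumption
rather than anything taken from the QNX documentation: on a PC the
system timer is typically driven by a 1,193,182 Hz clock, so only whole
counts of roughly 838 ns are available, and 11 counts is the most that
fits in 10000 ns. The sketch below works through that arithmetic. PIT_HZ
and both function names are illustrative, and rounding the count down is
assumed.

```c
#include <stdint.h>

/* Assumption: nominal input clock of the PC's 8254 timer, in Hz. */
#define PIT_HZ 1193182ULL

/* Whole timer counts that fit in the requested period (rounded down). */
uint64_t pit_counts(uint64_t requested_ns)
{
    return requested_ns * PIT_HZ / 1000000000ULL;
}

/* Period those counts actually produce, rounded to the nearest ns. */
uint64_t actual_period_ns(uint64_t requested_ns)
{
    uint64_t counts = pit_counts(requested_ns);
    return (counts * 1000000000ULL + PIT_HZ / 2) / PIT_HZ;
}
```

For a request of 10000 ns, pit_counts() gives 11 and actual_period_ns()
gives 9219, which matches the value we see.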
The time measurements are accumulated, and we keep track of min, max and
average times, as well as a simple histogram of times. We have separate
sets of data for message passing times and packet round trip times.
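
A minimal sketch of this kind of accumulation, for anyone who wants to
reproduce the test. It is not our actual source: the struct, the
function names, the bin count, and the choice of one histogram bin per
9219 ns clock period are all assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

#define TICK_NS   9219u  /* clock period reported above */
#define HIST_BINS 32     /* assumption: last bin collects all longer times */

struct timing_stats {
    uint64_t count;
    uint64_t min_ns;
    uint64_t max_ns;
    uint64_t sum_ns;
    uint64_t hist[HIST_BINS];  /* hist[n] counts samples of n clock periods */
};

void stats_init(struct timing_stats *s)
{
    memset(s, 0, sizeof *s);
    s->min_ns = UINT64_MAX;    /* so the first sample becomes the minimum */
}

void stats_add(struct timing_stats *s, uint64_t elapsed_ns)
{
    uint64_t bin = elapsed_ns / TICK_NS;
    if (bin >= HIST_BINS)
        bin = HIST_BINS - 1;   /* clamp outliers into the last bin */

    s->count++;
    s->sum_ns += elapsed_ns;
    if (elapsed_ns < s->min_ns) s->min_ns = elapsed_ns;
    if (elapsed_ns > s->max_ns) s->max_ns = elapsed_ns;
    s->hist[bin]++;
}

/* Integer average in ns; 0 if nothing has been recorded yet. */
uint64_t stats_avg_ns(const struct timing_stats *s)
{
    return s->count ? s->sum_ns / s->count : 0;
}
```

One struct timing_stats would be kept for message passing times and a
second one for packet round trip times.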
Here’s the problem: the results are not consistently fast enough for our
needs. We want the packet round-trip time to be consistently less than
100 usec (100000 nsec). We have 2 commands in the UA that send timed
packets. One sends 1 timed packet, and the other sends 10,000 timed
packets in a for loop, with a call to nap() following each send. We have
tested the latter command with nap() values of both 1 and 10 msec, with
similar results.
When using the command that sends only one packet, the round trip time
is usually 212037 ns (~212 us). When using the command that sends 10,000
packets, most of the packets get back in only 82971 ns (~83 us).
However, somewhere around 24-27 of the 10,000 packets take between 100
and 200 usec to arrive.
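
It may be worth noting that both of these figures are exact multiples
of the 9219 ns clock period (23 periods and 9 periods), so each
measurement is presumably only good to about one period. The helpers
below just do that division; the names are made up for this post.

```c
#include <stdint.h>

#define TICK_NS 9219ULL  /* clock period reported by ClockPeriod() */

/* Number of whole clock periods contained in a measured time. */
uint64_t whole_ticks(uint64_t elapsed_ns)
{
    return elapsed_ns / TICK_NS;
}

/* Remainder after removing whole clock periods; 0 means an exact multiple. */
uint64_t leftover_ns(uint64_t elapsed_ns)
{
    return elapsed_ns % TICK_NS;
}
```

Both 212037 ns and 82971 ns leave a remainder of 0.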
I am wondering why sending one packet takes longer than most packets
sent in the loop of 10,000. My boss suspects it has something to do with
QNX scheduling time between other tasks with the same priority. We first
did this test with QNX running the full Photon environment, and got
worse results than those described above. Sometimes the packet would
take as long as ~700 usec to get back, and on rare occasions we would
get a max time of over 1 msec (1000 usec)! After switching to using a
text mode environment on both machines, we get the described results.
I tried to do some quick and dirty experiments with raising the priority
of the DQE, but as soon as I launch the DQE at a higher priority than
the command line shell, I have no control of my machine and have to
reboot.
Also, we get a similar problem with the message send speed, but on a
smaller scale. The messages usually get sent in 1 or 2 clock periods
(9219 or 18438 ns), but the max time for 10,000 messages is often 3 or 4
clock periods.
So here are my questions:
Assuming our test code isn’t doing anything obviously stupid (which
might be a big assumption), where should we be looking to improve
response time?
My boss suggested that the problem is that QNX is preempting the DQE
process before it sends the packet, which keeps it from being sent
exactly when we want it to. He further suggested that when we’re running
on the real intended hardware, which is a PPC 8245 microcontroller card
running at 66 MHz, we could use the hardware timer to schedule the
packet sends in an interrupt handler rather than a for loop, and that
this would avoid the problem. Does this seem to make sense?
Thanks in advance for any insights.