QNX response time issues

Karl_von_Laudermann · December 16, 2002, 9:15pm

I sent this message to the support people at qnx.com, but I’m also
posting it here in case anyone has some insights.

My group is trying to test the performance of QNX. We intend to use QNX
in a PPC-based microcontroller that will communicate with a number of
attached devices over ethernet. We do not currently have such a
configuration ready for test, so the testing is as follows:

I have a 400 MHz Pentium II desktop machine and my co-worker has a 1.2
GHz Celeron desktop machine. These are running QNX 6.2. Each machine has
a Netgear 83815 ethernet card. The two machines are connected by an
ethernet crossover cable between the two Netgear cards.

We have written two fairly simple C programs: the User App (UA) and the
Data Query Engine (DQE). The UA presents a small menu of commands to the
user. When a command is selected, the UA sends a message (using
MsgSend()) to the DQE to do something. One such command is to tell the
DQE to send a packet over the ethernet to the other machine. The DQE on
the other machine receives the packet, and immediately re-sends the
packet back to the first machine.

In order to test timing, we pass timing info along in the ethernet
packet. At the very beginning of program execution, the DQE calls
ClockPeriod() to be able to measure time with the finest granularity
possible, which is 9219 nanoseconds. (We call ClockPeriod with a value
of 10000 nanoseconds as per the documentation, but the actual value that
it gets set to is 9219.) When the DQE on the first machine sends a
packet to the other machine, it calls ClockTime() to store the current
time in the packet. When it gets the packet back from the other machine,
it can compare that time with the current time to measure the round-trip
time. We also do similar timing of the message sending; the UA stores
the current time in the message it sends to the DQE, and the DQE then
measures how long it took for the message to arrive.
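The packet timestamping scheme described above can be sketched in C. This is only an illustration under stated assumptions: the `timed_packet` layout and the function names are hypothetical (the real packet format isn't shown in the thread), and the current time is passed in as a nanosecond count so the sketch doesn't depend on any particular clock call.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical packet layout: the originator writes its own clock reading
 * into the packet and the echoing machine sends it back unchanged. */
struct timed_packet {
    uint64_t sent_ns;   /* originator's clock when the packet left */
    uint32_t seq;       /* sequence number for matching echoes */
    uint8_t  data[64];  /* application payload (size chosen arbitrarily) */
};

/* Originator: stamp the packet just before handing it to the driver. */
void stamp_packet(struct timed_packet *p, uint32_t seq, uint64_t now_ns)
{
    p->seq = seq;
    p->sent_ns = now_ns;
}

/* Echoing side: copy the packet back byte for byte; it never interprets
 * sent_ns, so its own clock does not matter. */
void echo_packet(const struct timed_packet *in, struct timed_packet *out)
{
    memcpy(out, in, sizeof *out);
}

/* Originator, on receipt of the echo: both readings come from the same
 * clock, so the difference is a meaningful round-trip time. */
uint64_t round_trip_ns(const struct timed_packet *p, uint64_t now_ns)
{
    return now_ns - p->sent_ns;
}
```

Because the stamp and the subtraction both happen on the originating machine, the round-trip figure is valid even though the two machines' clocks are unrelated.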

The time measurements are accumulated, and we keep track of min, max and
average times, as well as a simple histogram of times. We have separate
sets of data for message passing times and packet round trip times.
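A minimal sketch of that bookkeeping, assuming nanosecond samples and a fixed-width histogram. The 50 us bin width, the bin count, and all names are illustrative choices, not taken from the actual test programs.

```c
#include <stdint.h>
#include <string.h>

#define HIST_BIN_NS 50000u   /* 50 us per bin (arbitrary choice) */
#define HIST_BINS   21u      /* bins 0..19 cover 0-1 ms; bin 20 is overflow */

struct timing_stats {
    uint64_t count;
    uint64_t min_ns;
    uint64_t max_ns;
    uint64_t sum_ns;              /* used for the average */
    uint64_t hist[HIST_BINS];
};

void stats_init(struct timing_stats *s)
{
    memset(s, 0, sizeof *s);
    s->min_ns = UINT64_MAX;       /* so the first sample becomes the minimum */
}

void stats_add(struct timing_stats *s, uint64_t ns)
{
    uint64_t bin = ns / HIST_BIN_NS;
    if (bin >= HIST_BINS)
        bin = HIST_BINS - 1;      /* clamp very slow samples into the overflow bin */
    s->hist[bin]++;
    s->count++;
    s->sum_ns += ns;
    if (ns < s->min_ns) s->min_ns = ns;
    if (ns > s->max_ns) s->max_ns = ns;
}

uint64_t stats_avg_ns(const struct timing_stats *s)
{
    return s->count ? s->sum_ns / s->count : 0;
}

/* Samples at or above a threshold that is a multiple of the bin width,
 * e.g. how many round trips missed a 100 us target. */
uint64_t stats_count_at_or_above(const struct timing_stats *s, uint64_t threshold_ns)
{
    uint64_t n = 0;
    for (uint64_t b = threshold_ns / HIST_BIN_NS; b < HIST_BINS; b++)
        n += s->hist[b];
    return n;
}
```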

Here’s the problem: the results are not consistently fast enough for our
needs. We want the packet round-trip time to be consistently less than
100 usec (100000 nsec). We have 2 commands in the UA that send timed
packets. One sends 1 timed packet, and the other sends 10,000 timed
packets in a for loop, with a call to nap() following each send. We have
tested the latter command with nap() values of both 1 and 10 msec, with
similar results.

When using the command that sends only one packet, the round trip time
is usually 212037 ns (~212 us). When using the command that sends 10,000
packets, most of the packets get back in only 82971 ns (~83 us).
However, somewhere around 24-27 of the 10,000 packets take between 100
and 200 usec to arrive.
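A quick consistency check on these numbers: with ClockTime() readings quantized to the 9219 ns clock period, every measured interval should be a whole number of ticks, and both figures are: 82971 = 9 × 9219 and 212037 = 23 × 9219. The sketch below does that arithmetic. It assumes (this is not stated in the thread) that each reading is the start of the tick it falls in; under that assumption a measured difference of k ticks means the true interval lies strictly between k-1 and k+1 ticks. The function names are illustrative.

```c
#include <stdint.h>

#define TICK_NS 9219u   /* clock period reported after ClockPeriod(10000 ns) */

/* Number of whole ticks in a measured interval. */
uint64_t ticks_of(uint64_t ns) { return ns / TICK_NS; }

/* 1 if the measurement is an exact multiple of the tick, else 0. */
int is_whole_ticks(uint64_t ns) { return ns % TICK_NS == 0; }

/* If each reading is the start of the tick it falls in, a difference of
 * k ticks means the true interval was strictly between (k-1) and (k+1)
 * ticks; lo_ns/hi_ns receive those bounds. */
void true_interval_bounds(uint64_t measured_ns, uint64_t *lo_ns, uint64_t *hi_ns)
{
    uint64_t k = ticks_of(measured_ns);
    *lo_ns = k ? (k - 1) * TICK_NS : 0;
    *hi_ns = (k + 1) * TICK_NS;
}

/* Smallest measurable value strictly greater than a target. */
uint64_t first_tick_above(uint64_t target_ns)
{
    return (target_ns / TICK_NS + 1) * TICK_NS;
}
```

Under that assumption the "~83 us" result could be anything from about 74 to 92 us, and the 100 us budget falls between ticks 10 and 11, so the smallest reading that counts as a miss is 101409 ns.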

I am wondering why sending one packet takes longer than most packets
sent in the loop of 10000. My boss suspects it has something to do with
QNX scheduling time between other tasks with the same priority. We first
did this test with QNX running the full Photon environment, and got
worse results than those described above. Sometimes the packet would
take as long as ~700 usec to get back, and on rare occasions we would
get a max time of over 1 msec (1000 usec)! After switching to using a
text mode environment on both machines, we get the described results.

I tried to do some quick and dirty experiments with raising the priority
of the DQE, but as soon as I launch the DQE at a higher priority than
the command line shell, I have no control of my machine and have to
reboot.

Also, we get a similar problem with the message send speed, but on a
smaller scale. The messages usually get sent in 1 or 2 clock ticks
(9219 or 18438 ns), but the max time for 10000 messages is often 3 or 4
clock ticks.

So here are my questions:

Assuming our test code isn’t doing anything obviously stupid (which
might be a big assumption), where should we be looking to improve
response time?

My boss suggested that the problem is that QNX is preempting the DQE
process before it sends the packet, which keeps it from being sent
exactly when we want it to. He further suggested that when we’re running
on the real intended hardware, which is a PPC 8245 microcontroller card
running at 66 MHz, we could use the hardware timer to schedule the
packet sends in an interrupt handler rather than a for loop, and that
this would avoid the problem. Does this seem to make sense?

Thanks in advance for any insights.

Rennie_Allen2 · December 16, 2002, 9:15pm

Karl von Laudermann wrote:

I am wondering why sending one packet takes longer than most packets
sent in the loop of 10000. My boss suspects it has something to do with
QNX scheduling time between other tasks with the same priority.

How are you sending these packets? You aren’t using TCP/IP, are you?

I tried to do some quick and dirty experiments with raising the priority
of the DQE, but as soon as I launch the DQE at a higher priority than
the command line shell, I have no control of my machine and have to
reboot.

If you want to get good results you have to run the program at high
priority. If you want your keyboard back, just have the test run
a fixed number of cycles, and then quit.
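A sketch of such a bounded test loop in portable C. It assumes POSIX nanosleep() in place of the nap() call mentioned earlier, and the `test_step_fn` callback (with the `count_step` example) is a hypothetical stand-in for "send one timed packet".

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical per-iteration action; in the real test this would send one
 * timed packet. Returns 0 on success. */
typedef int (*test_step_fn)(unsigned iteration, void *ctx);

/* Run exactly `iterations` steps, sleeping `gap_ns` between them, then return.
 * nanosleep() blocks, so lower-priority threads (such as the shell) get the
 * CPU during each gap, and the loop ends on its own even at high priority. */
int run_fixed_test(unsigned iterations, long gap_ns, test_step_fn step, void *ctx)
{
    for (unsigned i = 0; i < iterations; i++) {
        if (step(i, ctx) != 0)
            return -1;
        struct timespec req = { gap_ns / 1000000000L, gap_ns % 1000000000L };
        while (nanosleep(&req, &req) == -1 && errno == EINTR)
            ;   /* resume the remaining sleep if a signal interrupted it */
    }
    return 0;
}

/* Example step for trying the loop: just counts how many times it ran. */
int count_step(unsigned iteration, void *ctx)
{
    (void)iteration;
    ++*(unsigned *)ctx;
    return 0;
}
```

For example, `run_fixed_test(10000, 1000000L, step, ctx)` would mirror the 10,000-packet command with a 1 ms gap.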

My boss suggested that the problem is that QNX is preempting the DQE
process before it sends the packet, which keeps it from being sent
exactly when we want it to.

If you’re not running the DQE at the highest priority, you are telling
QNX that it is free to pre-empt you anytime a higher priority thread
wants to run.

He further suggested that when we’re running
on the real intended hardware, which is a PPC 8245 microcontroller card
running at 66 MHz, we could use the hardware timer to schedule the
packet sends in an interrupt handler rather than a for loop, and that
this would avoid the problem. Does this seem to make sense?

Well, it would “avoid the problem”. I think a better solution would be
to set the priorities correctly so that resorting to unschedulable
entities (i.e. interrupts) is not necessary.

David_Gibbs1 · December 16, 2002, 10:13pm

Karl von Laudermann <karl@nospam.ueidaq.com> wrote:

I sent this message to the support people at qnx.com, but I’m also
posting it here in case anyone has some insights.

In order to test timing, we pass timing info along in the ethernet
packet. At the very beginning of program execution, the DQE calls
ClockPeriod() to be able to measure time with the finest granularity
possible, which is 9219 nanoseconds. (We call ClockPeriod with a value
of 10000 nanoseconds as per the documentation, but the actual value that
it gets set to is 9219.)

Ouch. That introduces considerable interrupt overhead right there.
You are asking for a timer interrupt every 10 usecs, and doing the
work involved in that interrupt far more frequently than is normal.

You might try using ClockCycles() for your timestamping – on
x86 it executes an rdtsc instruction to read a free-running counter. This
is far less intrusive to system performance.
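A minimal timestamp helper along those lines. The QNX branch uses ClockCycles() and SYSPAGE_ENTRY(qtime)->cycles_per_sec, as discussed later in this thread; the non-QNX branch is an assumption added only so the sketch also builds and runs on a desktop system, where it substitutes clock_gettime(CLOCK_MONOTONIC) in nanoseconds. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>
#ifdef __QNXNTO__
#include <sys/neutrino.h>   /* ClockCycles() */
#include <sys/syspage.h>    /* SYSPAGE_ENTRY(qtime)->cycles_per_sec */
#endif

/* Read a free-running counter without reprogramming the system tick.
 * On QNX this is ClockCycles(); elsewhere it falls back to CLOCK_MONOTONIC
 * expressed in nanoseconds. */
uint64_t timestamp_cycles(void)
{
#ifdef __QNXNTO__
    return ClockCycles();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Counter rate in counts per second, for converting differences to time. */
uint64_t timestamp_rate(void)
{
#ifdef __QNXNTO__
    return SYSPAGE_ENTRY(qtime)->cycles_per_sec;
#else
    return 1000000000u;
#endif
}
```

As noted in the reply below, such readings are only comparable when both are taken on the same machine.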

When the DQE on the first machine sends a
packet to the other machine, it calls ClockTime() to store the current
time in the packet. When it gets the packet back from the other machine,
it can compare that time with the current time to measure the round-trip
time. We also do similar timing of the message sending; the UA stores
the current time in the message it sends to the DQE, and the DQE then
measures how long it took for the message to arrive.

Assuming the UA and DQE are on the same machine, this could also
be done with ClockCycles. If the UA and DQE are on different machines,
neither the time value nor the ClockCycles() value would likely
be synchronized closely enough to be of use for comparison.

-David

QNX Training Services
http://www.qnx.com/support/training/
Please followup in this newsgroup if you have further questions.

system · December 16, 2002, 10:16pm

Karl von Laudermann <karl@nospam.ueidaq.com> wrote:

My boss suggested that the problem is that QNX is preempting the DQE
process before it sends the packet, which keeps it from being sent
exactly when we want it to. He further suggested that when we’re running
on the real intended hardware, which is a PPC 8245 microcontroller card
running at 66 MHz, we could use the hardware timer to schedule the
packet sends in an interrupt handler rather than a for loop, and that
this would avoid the problem. Does this seem to make sense?

There are a couple of things to note.

- Make sure the clocks on both machines are synchronized. The PC
  hardware is notoriously horrible for drift on the RTC.
- Photon is not realtime. Do not put realtime / time-critical
  code in a Photon app.

You can’t make kernel calls from within an ISR, and doing a MsgSend()
is a kernel call, so your boss’ solution won’t work.

One of the trickiest things in building a real-time system is
getting the priorities of all the processes right, so that the right
things get the CPU at the right time and the right things pre-empt the
less critical parts.

I would suggest the following approach:

- create a resmgr that handles the DQE portion and has no user interface
  whatsoever
- do the same for the other side of the connection
- create a set of devctl() structures that can read the timing information
  from a buffer within your programs
- create a UI or GUI program that uses the devctl() calls to obtain the
  timing information in near real-time (as opposed to real-time)
- run the communications programs at a higher priority than 10
- run the UI/GUI at a priority of 10

Cheers,
Camz.

Mario_Charest1 · December 17, 2002, 1:10pm

I tried to do some quick and dirty experiments with raising the priority
of the DQE, but as soon as I launch the DQE at a higher priority than
the command line shell, I have no control of my machine and have to
reboot.

That indicates your program is busy waiting, and that’s bad. When you send
messages over ethernet, a program will actually spend quite a bit of time
doing nothing, so the shell should have plenty of CPU time left to run.

If you can, post the code of your test program.
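The difference described here is between spinning on a condition and blocking until it becomes true. A minimal sketch of the blocking side using POSIX poll(); the function name is illustrative, and since the thread doesn't say how the DQE waits for incoming packets, the file descriptor (a socket, a pipe, or similar) is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <unistd.h>   /* pipe(), write() for trying it out */

/* Wait until `fd` is readable or `timeout_ms` elapses (-1 = wait forever).
 * The thread sleeps inside poll() instead of spinning, so lower-priority
 * work, such as the shell, keeps running in the meantime.
 * Returns 1 if readable, 0 on timeout, -1 on error or hangup. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int n = poll(&p, 1, timeout_ms);
    if (n <= 0)
        return n < 0 ? -1 : 0;
    return (p.revents & POLLIN) ? 1 : -1;
}
```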

Karl_von_Laudermann · December 17, 2002, 5:02pm

David Gibbs wrote:

Ouch. That introduces considerable interrupt overhead right there.
You are asking for a timer interrupt every 10 usecs, and doing the
work involved in that interrupt far more frequently than is normal.

You might try using ClockCycles() for your timestamping – on
x86 it executes an rdtsc instruction to read a free-running counter. This
is far less intrusive to system performance.

Interesting. I wouldn’t have thought that the overhead from the clock
would interfere. I will change it to use ClockCycles instead, thanks.

Assuming the UA and DQE are on the same machine, this could also
be done with ClockCycles. If the UA and DQE are on different machines,
neither the time value nor the ClockCycles() value would likely
be synchronized closely enough to be of use for comparison.

Yes, the UA and DQE are on the same machine. This is why the message
passing can be timed one way (from UA to DQE), but the packet travel
must be timed for a two-way trip, since the time difference can only be
calculated on the machine it originated from.

Karl_von_Laudermann · December 17, 2002, 5:06pm

camz@passageway.com wrote:

I would suggest the following approach:

- create a resmgr that handles the DQE portion and has no user interface
  whatsoever
- do the same for the other side of the connection
- create a set of devctl() structures that can read the timing information
  from a buffer within your programs
- create a UI or GUI program that uses the devctl() calls to obtain the
  timing information in near real-time (as opposed to real-time)
- run the communications programs at a higher priority than 10
- run the UI/GUI at a priority of 10

Ok, I’ll look into this, thanks.

Right now, the DQE isn’t a resmgr, but it has no interface, which is why
there is the separate UA program. And MsgSend() is the way they communicate.

Karl_von_Laudermann · December 17, 2002, 5:15pm

Rennie Allen wrote:

I am wondering why sending one packet takes longer than most packets
sent in the loop of 10000. My boss suspects it has something to do
with QNX scheduling time between other tasks with the same priority.

How are you sending these packets? You aren’t using TCP/IP, are you?

I’m not intimately familiar with the networking code, since it’s a more
generalized piece that was preexisting. But I believe it’s just plain
ethernet with our raw data inside, not using TCP/IP.

If you want to get good results you have to run the program at high
priority. If you want your keyboard back, just have the test run
a fixed number of cycles, and then quit.

Yeah, the priority thing seems to be a common theme. I’ll definitely
experiment with boosting the priority.

If you’re not running the DQE at the highest priority, you are telling
QNX that it is free to pre-empt you anytime a higher priority thread
wants to run.

Doing a ps -e brings up lots of processes that are at the same
priority that the DQE runs at by default (10), but only a couple that
are higher priority than that. So I would guess that increasing the
priority even by just 1 will give me some useful info. And, currently,
it will lock up my machine as well.

Karl_von_Laudermann · December 17, 2002, 5:17pm

Mario Charest wrote:

That indicated you program is busy waiting and that’s bad. When you send
messages over ethernet a program will actally spend quite a bit of time
doing nothing. Thus shell should have plenty of CPU time left to run.

Yeah, it’s looking like the way to go is to experiment with priorities,
after figuring out where the program is busy waiting.

If you can, post the code of your test program

Unfortunately, the actual networking code is preexisting code that
contains company confidential bits. Extracting out the relevant stuff
that’s used only for this test and hiding the details would take some work.

Brown_Richard · December 17, 2002, 5:49pm

Doing a ps -e brings up lots of processes that are at the same
priority that the DQE runs at by default (10), but only a couple that
are higher priority than that. So I would guess that increasing the
priority even by just 1 will give me some useful info. And, currently,
it will lock up my machine as well.

I believe you can increase/decrease the priority of processes using slay -P
pid or nice. So before changing the priority of your apps, boost the priority
of your shell. Then boost the priority of your apps, being sure NOT to boost
them above your shell.

David_Gibbs1 · December 17, 2002, 7:50pm

Karl von Laudermann <karl@nospam.ueidaq.com> wrote:

David Gibbs wrote:
Ouch. That introduces considerable interrupt overhead right there.
You are asking for a timer interrupt every 10 usecs, and doing the
work involved in that interrupt far more frequently than is normal.

You might try using ClockCycles() for your timestamping – on
x86 it executes an rdtsc instruction to read a free-running counter. This
is far less intrusive to system performance.

Interesting. I wouldn’t have thought that the overhead from the clock
would interfere. I will change it to use ClockCycles instead, thanks.

At a 1ms tick, you are getting 1000 interrupts a second. At a
10usec ticksize (clock resolution) you get 100,000 interrupts
a second. That is a 100-fold increase in the interrupt rate from
the timer interrupt. Yes, this will impose more overhead. If you
are performing an operation that takes, say, 100 us, then with a
1ms ticksize, approximately every tenth operation will get interrupted
(once only), with the overhead of checkpointing that operation,
switching to and handling the interrupt, then resuming. With a 10 us
tick, the operation will get interrupted about 10 times (maybe more,
as the operation is extended by the interruptions), with the requisite
overhead of save/restore state (or restart operation) each time.

So, I would definitely expect higher timer rates to slow down system
throughput and increase both the average and the maximum latency of operations.
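The estimate above can be written out as arithmetic. This is a simplified model (an assumption, not something stated in the thread): interrupts arrive exactly once per tick, and the extra time the interruptions add to the operation is ignored, so the expected counts are lower bounds. The expected count is returned in thousandths to keep everything in integer arithmetic; the function names are illustrative.

```c
#include <stdint.h>

/* Timer interrupts per second for a given tick size in nanoseconds. */
uint64_t interrupts_per_sec(uint64_t tick_ns)
{
    return 1000000000u / tick_ns;
}

/* Expected timer interrupts landing inside one operation of length op_ns,
 * in thousandths (1000 = one interrupt per operation on average).
 * Ignores the stretching caused by the interruptions themselves. */
uint64_t expected_hits_milli(uint64_t op_ns, uint64_t tick_ns)
{
    return op_ns * 1000u / tick_ns;
}
```

For the 100 us operation this gives 100 thousandths (every tenth operation) at a 1 ms tick and 10000 thousandths (about 10 hits) at a 10 us tick. The 9219 ns period actually obtained earlier in the thread gives 108471 interrupts per second and about 10.8 hits per operation.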

-David

Karl_von_Laudermann · December 17, 2002, 8:23pm

David Gibbs wrote:

You might try using ClockCycles() for your timestamping – on
x86 it executes an rdtsc instruction to read a free-running counter. This
is far less intrusive to system performance.

Aaaargh! This isn’t working!

I wrote a test app to use ClockCycles(), which does the following: It
prints the cycles per second specified in
SYSPAGE_ENTRY(qtime)->cycles_per_sec (this is where the ClockCycles()
documentation says to get it from). The value thus printed is 401029900
cycles per second. It then calls ClockCycles() to get the start time,
sleeps for a second, then calls ClockCycles() again to get the end time.
It then prints the time difference in microseconds using:

(newTime - oldTime) * 1000000 / cyclesPerSec;

This all works fine. I consistently get a time difference of ~1001300
microseconds.
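That formula is fine as long as every operand is a 64-bit unsigned type. A slightly more defensive sketch (the function name is illustrative) splits the difference into whole seconds and a remainder. At 401029900 cycles per second the plain `diff * 1000000` would only overflow a uint64_t after roughly 12.8 hours of elapsed counter time, but the split form pushes that limit out of practical reach, and unsigned subtraction still gives the right difference if the counter wraps between readings.

```c
#include <stdint.h>

/* Elapsed microseconds between two counter readings. Unsigned subtraction
 * stays correct across a counter wrap; splitting into whole seconds and a
 * remainder keeps the multiply from overflowing for long intervals. */
uint64_t cycles_to_usec(uint64_t old_cycles, uint64_t new_cycles, uint64_t cycles_per_sec)
{
    uint64_t diff = new_cycles - old_cycles;
    uint64_t secs = diff / cycles_per_sec;
    uint64_t rem  = diff % cycles_per_sec;
    return secs * 1000000u + rem * 1000000u / cycles_per_sec;
}
```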

Then when I changed my actual UA and DQE test programs to use
ClockCycles() to test timing, I’m getting seemingly random values for
the start and end times. I mean, I’m measuring time differences of less
than one second, but the start and end times vary by a couple of orders
of magnitude more than 401029900.

I’m reading the start and end times from different processes. This
shouldn’t matter if ClockCycles() just reads a hardware clock. It
shouldn’t be process or thread specific, right? And the start times
aren’t all similar to each other, nor are the end times, which is what
would happen if there were separate process-specific clocks.

Grrr. What am I doing wrong?

Igor_Kovalenko2 · December 17, 2002, 8:30pm

Yup, this is a very bad sign. If something in your system can’t run with
higher than shell priority without locking the system, that something does
bad things ™. Still, you can run yet another shell on another console
with even higher priority so you retain access to the system. I have a habit
of always running a shell with priority 63 on one of the consoles when
doing such experiments (on -t/dev/conN -p63 sh), just in case.

I would also try to use UDP instead of QNET and compare the behavior.

“Mario Charest” postmaster@127.0.0.1 wrote in message
news:atn76i$97s$1@inn.qnx.com…

I tried to do some quick and dirty experiments with raising the priority
of the DQE, but as soon as I launch the DQE at a higher priority than
the command line shell, I have no control of my machine and have to
reboot.

That indicates your program is busy waiting, and that’s bad. When you send
messages over ethernet, a program will actually spend quite a bit of time
doing nothing, so the shell should have plenty of CPU time left to run.

If you can, post the code of your test program.

Igor_Kovalenko2 · December 17, 2002, 8:38pm

I don’t remember exactly, but some of that stuff is 64-bit, AFAIR. One common
problem with those things is not using 64-bit operands and/or printf format
strings where needed.

Karl_von_Laudermann · December 17, 2002, 8:49pm

Karl von Laudermann wrote:

David Gibbs wrote:

You might try using ClockCycles() for your timestamping – on x86 it
executes an rdtsc instruction to read a free-running counter. This
is far less intrusive to system performance.

Aaaargh! This isn’t working!

Never mind, I found the problem. It was user error, of course.

The stupid compiler doesn’t warn you when you call a function that isn’t
declared. I wrapped the ClockCycles() call in another function, and the
.h file that declared it wasn’t #included everywhere that needed it. So
the compiler assumed the function was returning an int rather than a
uint64_t.

Sigh
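Why a missing prototype produces "random" values: in C89 an undeclared function is assumed to return int, so on 32-bit x86 the caller typically keeps only the low 32 bits of the uint64_t result and then sign-extends them when storing into a 64-bit variable. Calling through a mismatched declaration is undefined behavior, so the sketch below models the effect with explicit conversions rather than reproducing the bug; the function names are illustrative.

```c
#include <stdint.h>

/* Model of what the caller saw: only the low 32 bits, read as a signed int,
 * then widened back to uint64_t. Converting an out-of-range value to int32_t
 * is implementation-defined; GCC and most compilers wrap modulo 2^32. */
uint64_t as_seen_through_int(uint64_t real_value)
{
    int32_t as_int = (int32_t)(uint32_t)real_value;   /* low 32 bits only */
    return (uint64_t)(int64_t)as_int;                  /* sign-extended back */
}

/* Seconds of counter time after which the low 32 bits wrap around. */
double wrap_period_sec(uint64_t cycles_per_sec)
{
    return 4294967296.0 / (double)cycles_per_sec;
}
```

At 401029900 cycles per second the low 32 bits wrap about every 10.7 seconds, so values seen through int bear no simple relation to elapsed time. Declaring the function before use, and compiling with warnings such as gcc's -Wall (which includes -Wimplicit-function-declaration), catches the mistake.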

Igor_Kovalenko2 · December 18, 2002, 7:56am

Try -Wall (it enables -Wimplicit-function-declaration); I think you will get
a warning for missing prototypes then.

Armin_Steinhoff1 · December 18, 2002, 11:41am

Karl von Laudermann wrote:

I sent this message to the support people at qnx.com, but I’m also
posting it here in case anyone has some insights.

[ clip …]
I tried to do some quick and dirty experiments with raising the priority
of the DQE, but as soon as I launch the DQE at a higher priority than
the command line shell, I have no control of my machine and have to
reboot.

If I understood right … the DQE works as a server process.
Such processes should wait, blocked in a receive call, for messages
… but your DQE seems to do some kind of very ‘busy waiting’ (polling),
which seems to be the problem.

In order to increase the priority of the message-oriented DQE …
just increase the priority of your clients. The DQE will inherit the
priority of its clients as long as you don’t change the default behavior
of the DQE server’s channels.

But (as discussed in another thread) … your threads shouldn’t
pre-empt QNET or the network driver.

Armin

Karl_von_Laudermann · December 18, 2002, 9:26pm

Ok, we’ve made some progress in our test programs since I first posted.
I took David Gibbs’ suggestion of using ClockCycles() instead of
ClockTime() to measure times, and this resulted in a performance
improvement.

Since several people mentioned priorities, we turned our attention to
this issue next. We figured that the reason we were seeing longer packet
travel times when telling the UA to tell the DQE to send only one packet
vs. sending 10,000 packets was that the scheduler was taking time away
from the DQE to give it to the UA, and that running the DQE at a higher
priority than the UA would solve this. I first tested this hypothesis by
merely adding a nap(1) call to the UA right after it issues the “send
one packet” command to the DQE, and this worked. The packet trip time
measured shorter.

Mario Charest wrote:

I tried to do some quick and dirty experiments with raising the priority
of the DQE, but as soon as I launch the DQE at a higher priority than
the command line shell, I have no control of my machine and have to
reboot.

That indicates your program is busy waiting, and that’s bad. When you send
messages over ethernet, a program will actually spend quite a bit of time
doing nothing, so the shell should have plenty of CPU time left to run.

We found where the DQE was busy waiting, and fixed it. We can now run
DQE at a higher priority than the shell without losing control. Then we
run the UA at normal priority, and we get improved packet trip times vs.
when we were running the DQE at normal priority.

However, I don’t understand why this works, because of another problem:
The UA communicates with the DQE by using MsgSend() to send command
messages. The main thread of the DQE has a “while (1)” loop which
contains a call to MsgReceive() to receive the messages. The problem is,
as soon as the DQE receives the first message from the UA, its priority
drops to match that of the UA. This is revealed by calling ps from the
command line both before and after sending the first message from the UA
to the DQE.

Questions:

Why does the priority of the DQE drop when it receives a message? Can
this be prevented?
Why does the performance improve when we run DQE at a higher priority
and then let it drop, vs. running it at normal priority in the first
place? (I suspect the answer is that only the main thread drops
priority, and the other two don’t. How do I verify this?)

Thanks in advance for any help. And a big thank you to the wonderful
online QNX community for all of the help you’ve given me so far!

Igor_Kovalenko2 · December 19, 2002, 6:34am

“Karl von Laudermann” <karl@nospam.ueidaq.com> wrote in message
news:atqoko$ao7$1@inn.qnx.com…

However, I don’t understand why this works, because of another problem:
The UA communicates with the DQE by using MsgSend() to send command
messages. The main thread of the DQE has a “while (1)” loop which
contains a call to MsgReceive() to receive the messages. The problem is,
as soon as the DQE receives the first message from the UA, its priority
drops to match that of the UA. This is revealed by calling ps from the
command line both before and after sending the first message from the UA
to the DQE.

Questions:

Why does the priority of the DQE drop when it receives a message? Can
this be prevented?

It happens due to the priority inheritance on message passing (which is
supposed to help you to deal with the issue of priority inversion). You can
disable it by setting _NTO_CHF_FIXED_PRIORITY flag on the channel (see
ChannelCreate()).

Why does the performance improve when we run DQE at a higher priority
and then let it drop, vs. running it at normal priority in the first
place? (I suspect the answer is that only the main thread drops
priority, and the other two don’t. How do I verify this?)

By using pidin instead of ps (it will show you all threads by default). Yes,
only the receiver thread will inherit the priority of senders.

– igor

Armin2 · December 19, 2002, 2:48pm

Karl von Laudermann wrote:

Questions:

Why does the priority of the DQE drop when it receives a message? Can
this be prevented?

Assign your client at least a prio which is as high as the prio of DQE.

Why does the performance improve when we run DQE at a higher priority
and then let it drop, vs. running it at normal priority in the first
place? (I suspect the answer is that only the main thread drops
priority, and the other two don’t. How do I verify this?)

The network interface inherits the prio of DQE …

As Igor already said … pidin or spin.

Armin
