QNX priorities and timing accuracy

Hello.

I have a system of many threads running at different priorities. For some
time now I have been trying to tune the priorities in order to get a
deterministic execution order and deterministic timings for my application
threads.

When I run a kind of benchmark and examine the cycle times achieved by my
application parts at higher priority (30, FIFO), it becomes obvious that
these cycle times are not constant. During the benchmark, one thread sleeps
for e.g. 10 ms and then sends a pulse to the following thread in order to
wake it up. When the second thread is finished, it forwards the pulse to the
third thread, and so on, until the cycle starts again with the first thread
(which sleeps for 10 ms again).

The time at which each new thread gets started is measured with the
ClockCycles() function. These times are relatively constant, dominated by
the time a pulse takes to be communicated, in effect about 4 microseconds.
But sometimes the execution time of some or all threads within the cycle
rises far beyond the expected value, occasionally to hundreds of
microseconds.
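
(A minimal sketch of one stage of such a pulse chain, for reference; the
names stage_loop, coid_next and CYCLE_PULSE are illustrative, not from the
original code:)

#include <stdint.h>
#include <stdio.h>
#include <sys/neutrino.h>
#include <sys/syspage.h>

#define CYCLE_PULSE (_PULSE_CODE_MINAVAIL + 1) /* illustrative pulse code */

/* One stage of the chain: block on a pulse, timestamp, work, forward. */
void stage_loop(int chid, int coid_next)
{
    struct _pulse pulse;
    uint64_t cps = SYSPAGE_ENTRY(qtime)->cycles_per_sec;

    for (;;) {
        MsgReceivePulse(chid, &pulse, sizeof pulse, NULL); /* wait for wake-up */
        uint64_t start = ClockCycles();

        /* ... this stage's work ... */

        printf("stage time: %.1f us\n",
               (ClockCycles() - start) * 1e6 / cps);
        MsgSendPulse(coid_next, -1, CYCLE_PULSE, 0); /* wake the next stage */
    }
}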

pidin does not show any threads with higher priority that would explain
this kind of behavior.

Does anybody know what the reason for this spontaneous rise in execution
time could be? What can I do to ensure a CONSTANT cycle time over the
entire application run time?

Regards.

Nnamdi

Nnamdi Kohn wrote:

During the benchmark, one thread sleeps for e.g. 10 ms …

No … you can’t rely on that duration. It is always possible that your
process will wait 11 ms (tick size 1 ms).

Have a look at: http://www.qnx.com/developers/articles/article_834_1.html
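
(The current tick size can be read, and with care changed, via
ClockPeriod(); a minimal sketch:)

#include <stdio.h>
#include <sys/neutrino.h>

int main(void)
{
    struct _clockperiod cur = { 0, 0 };

    ClockPeriod(CLOCK_REALTIME, NULL, &cur, 0); /* NULL new period = just read */
    printf("current tick size: %lu ns\n", cur.nsec);

    /* A sleeping thread can wake up to one full tick late, so a 10 ms
       sleep with a 1 ms tick may take anywhere up to ~11 ms. */
    return 0;
}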

and then sends a pulse to the following thread in order to wake it up.

A pulse event also has a priority … check whether that priority is 30.
If not, the receiving process will probably start at a lower priority.
It’s also possible to switch off ‘sliding priority’ for the receiving
process.
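
(Both the pulse priority and the channel flags are relevant here; a sketch,
with coid standing in for an existing connection:)

#include <sys/neutrino.h>

/* Send the wake-up pulse explicitly at priority 30 instead of relying
   on whatever priority would otherwise apply. */
void wake_next(int coid)
{
    MsgSendPulse(coid, 30, _PULSE_CODE_MINAVAIL, 0);
}

/* A channel created with _NTO_CHF_FIXED_PRIORITY keeps the receiver at
   its own priority instead of floating to the sender's. */
int make_fixed_priority_channel(void)
{
    return ChannelCreate(_NTO_CHF_FIXED_PRIORITY);
}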

When the second thread is finished, it forwards the pulse to the third
thread, and so on, until the cycle starts again with the first thread
(which sleeps for 10 ms again).

The time at which each new thread gets started is measured with the
ClockCycles() function. These times are relatively constant, dominated by
the time a pulse takes to be communicated, in effect about 4 microseconds.
But sometimes the execution time of some or all threads within the cycle
rises far beyond the expected value, occasionally to hundreds of
microseconds.

pidin does not show any threads with higher priority that would explain
this kind of behavior.

If your threads are scheduled RR and one of them consumes more than 5 ms
(the timeslice), it will be suspended until no other thread at the same
priority level is runnable. That simply costs time, even if other threads
at the same priority level aren’t requesting the CPU.
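
(To rule out timeslicing, the threads can be created explicitly with
SCHED_FIFO; a minimal sketch using the POSIX thread attributes:)

#include <pthread.h>
#include <sched.h>

/* Start fn in a thread at priority 30 under SCHED_FIFO, so it is never
   timesliced by peers at the same priority. */
int start_fifo_thread(pthread_t *tid, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    struct sched_param param = { .sched_priority = 30 };

    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &param);
    return pthread_create(tid, &attr, fn, arg);
}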

Does anybody know what the reason for this spontaneous rise in execution
time could be? What can I do to ensure a CONSTANT cycle time over the
entire application run time?

You could use the real-time clock (RTC, IRQ 8) … sources have been
posted here recently.

Regards

Armin Steinhoff

http://www.steinhoff-automation.com





Hello Armin, thanks for your help so far.

I understood the bit about the tick size. But for our application (in real
life) we don’t use sleep() or anything else that relies on the QNX timer.
Our “tick” is generated by hardware (our communication hardware produces a
125 µs tick). So this should ensure a DEFINITE cycle time of 125 µs.

And we definitely use FIFO scheduling for our threads at a priority of 30.

Using the real-time clock would make the time coupling with our
communication system (which has its own timing) rather difficult. The cycle
in which the processes (threads) have to complete their work must
correspond to the communication cycle. So we cannot use the QNX real-time
clock.

Are there maybe other effects that limit the predictability of execution
times and context-switch times?

Nnamdi

Nnamdi Kohn wrote:

I understood the bit about the tick size. But for our application (in real
life) we don’t use sleep() or anything else that relies on the QNX timer.
Our “tick” is generated by hardware (our communication hardware produces a
125 µs tick). So this should ensure a DEFINITE cycle time of 125 µs.

At least for your timer … but how do you suspend your threads for 10 ms?

And we definitely use FIFO scheduling for our threads at a priority of 30.

Using the real-time clock would make the time coupling with our
communication system (which has its own timing) rather difficult. The cycle
in which the processes (threads) have to complete their work must
correspond to the communication cycle. So we cannot use the QNX real-time
clock.

OK … the use of the RTC was only proposed as an alternative to the
standard time services.

Are there maybe other effects that limit the predictability of execution
times and context-switch times?

What about the priorities of the pulse events?

// initialize notification event
SIGEV_PULSE_INIT(&puls_event, coid, getprio(0), _PULSE_CODE_MINAVAIL, 0);
//                                  ^^^ is this priority really 30 here?
How do you handle the timer interrupt of your hardware clock?
How big is the overhead created by the interrupt handler?

Please keep in mind that 125 µs is a short time …
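
(For reference, a common low-overhead pattern is to attach only an event to
the IRQ and do the work in a high-priority thread; a sketch, where IRQ_1394
is a placeholder for the board’s real interrupt number:)

#include <sys/neutrino.h>
#include <sys/siginfo.h>

#define IRQ_1394 11 /* placeholder, not the real number */

/* Attach a pulse event to the IRQ so a priority-30 thread, not an ISR,
   does the per-tick work. Requires ThreadCtl(_NTO_TCTL_IO, 0) first.
   InterruptAttachEvent() masks the IRQ on delivery; the handling thread
   must call InterruptUnmask() when it is done. */
int attach_tick_irq(int chid)
{
    struct sigevent ev;
    int coid = ConnectAttach(0, 0, chid, _NTO_SIDE_CHANNEL, 0);

    SIGEV_PULSE_INIT(&ev, coid, 30, _PULSE_CODE_MINAVAIL, 0);
    return InterruptAttachEvent(IRQ_1394, &ev, _NTO_INTR_FLAGS_TRK_MSK);
}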

Armin


Nnamdi Kohn wrote:

Are there maybe other effects that limit the predictability of execution
times and context-switch times?

Yes, IRQ sharing.

Read this thread: http://www.openqnx.com/PNphpBB2-viewtopic-t1798-.html

Evan Hillas wrote:

Yes, IRQ sharing. Read this thread:
http://www.openqnx.com/PNphpBB2-viewtopic-t1798-.html

To save you getting bogged down in that conversation: what can happen is
that when the other device fires its IRQ, its handler thread leaves your
IRQ masked until it is satisfied there is nothing more to service, and only
then lets you have your turn.

This behaviour appears to be standard practice for QSS-written drivers.
Not that IAE() can provide a reliable way of sharing anyway.
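
(A sketch of why this matters: with InterruptAttachEvent() the kernel masks
the IRQ line when the event fires, so every device sharing the line waits
until the handling thread unmasks it again; illustrative code, not from any
QSS driver:)

#include <sys/neutrino.h>

/* Handler thread for a shared IRQ line. While this device is being
   serviced the line stays masked, so other devices sharing it are
   stalled too -- hence: service quickly and unmask as soon as possible. */
void irq_thread(int chid, int irq, int iid)
{
    struct _pulse pulse;

    for (;;) {
        MsgReceivePulse(chid, &pulse, sizeof pulse, NULL);

        /* ... acknowledge the device, grab its data, defer the rest ... */

        InterruptUnmask(irq, iid); /* let the other sharers have a turn */
    }
}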

Evan Hillas wrote:


To save you getting bogged down in that conversation: what can happen is
that when the other device fires its IRQ, its handler thread leaves your
IRQ masked until it is satisfied there is nothing more to service, and only
then lets you have your turn.

This behaviour appears to be standard practice for QSS-written drivers.

As often seen, the QSS drivers do a lot of work in the ISR. That
sometimes adds huge latencies, which isn’t acceptable for the real-time
behavior of an RTOS.

When we talk about network drivers, keep in mind that a lot of DMA
transfers can steal a huge amount of PCI bus and CPU cycles.

Regards

Armin


Armin Steinhoff wrote:



As often seen, the QSS drivers do a lot of work in the ISR. That
sometimes adds huge latencies, which isn’t acceptable for the real-time
behavior of an RTOS.

When we talk about network drivers, keep in mind that a lot of DMA
transfers can steal a huge amount of PCI bus and CPU cycles.

None of that has to prevent other drivers from getting access to their
hardware. Long processing paths clearly don’t need short latency,
particularly when there is DMA/buffering involved, so such a driver should
free up the IRQ for others to share while it is still churning through its
pending data.

It’s the same principle as “Don’t use InterruptDisable() for long
periods of time”.

Armin Steinhoff wrote:

As often seen, the QSS drivers do a lot of work in the ISR. That
sometimes adds huge latencies, which isn’t acceptable for the real-time
behavior of an RTOS.

Oops, sorry, I wasn’t reading your input too well. :wink:

Agreed, it’s entirely unacceptable IMHO, which is why I keep hammering on
the topic.

Evan Hillas wrote:

None of that has to prevent other drivers from getting access to their
hardware.

I didn’t say that other drivers don’t get access to their hardware. But
to get access to the hardware you first need the CPU and then the PCI bus
… if both are only available to a limited extent, the execution speed of
your process will slow down.

This happens because most PCI network boards work as DMA and bus-master
devices.

Long processing paths clearly don’t need short latency, particularly when
there is DMA/buffering involved, so such a driver should free up the IRQ
for others to share while it is still churning through its pending data.

It’s the same principle as “Don’t use InterruptDisable() for long
periods of time”.

OK … well known.

Armin

Armin Steinhoff wrote:


I didn’t say that other drivers don’t get access to their hardware. But
to get access to the hardware you first need the CPU and then the PCI bus
… if both are only available to a limited extent, the execution speed of
your process will slow down.

True. So we need more info from Mr Kohn …

Sorry for the delay, guys, I’ve been far away in China… please let us
get this subject finished, if you are not already fed up with it. Thanks
for your input so far.


Armin Steinhoff wrote (29.11.2004):

What about the priorities of the pulse events?

The “pulses” are currently not sent as EVENTs but as MESSAGEs, via
MsgSend(coid, NULL, 0, NULL, 0), with no data in either direction. So the
priorities should simply depend on the current thread priority, which is
30 by definition. As far as I know, the waiting thread (operating as a
server) inherits the priority of the message (pulse) sender as it
receives the pulse.
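
(A minimal sketch of such a zero-byte send and the receiving side, with
illustrative names; on a default channel the receiver’s priority floats to
the sender’s while it handles the message:)

#include <sys/neutrino.h>

/* Distributor side: a zero-byte "go" message; the caller stays
   REPLY-blocked until the worker calls MsgReply(). */
void trigger_worker(int coid)
{
    MsgSend(coid, NULL, 0, NULL, 0);
}

/* Worker side: receive the empty message, do one cycle of work at the
   inherited priority, then unblock the distributor. */
void worker_loop(int chid)
{
    for (;;) {
        int rcvid = MsgReceive(chid, NULL, 0, NULL);

        /* ... one cycle of application work ... */

        MsgReply(rcvid, 0, NULL, 0);
    }
}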

How do you handle the timer interrupt of your hardware clock?
How big is the overhead created by the interrupt handler?

These two questions I’d like to consider later, when there is more clarity
about the pulse transmission timing. Currently I don’t handle any timer
interrupt myself. The interrupt generated by the communication system
(IEEE 1394) and fed into the computer is currently handled by a software
layer from an external company. But this interrupt is not the problem at
this particular time; it is just the start signal for a new cycle on the
computer. From this starting point, the pulses for “switching” between the
different threads are transferred (within the same cycle). The problem is
that these pulses seem to have a mind of their own, causing the individual
pulse transmission times to vary over a huge range.

Thanks.

Nnamdi

Nnamdi Kohn wrote:

The “pulses” are currently not sent as EVENTs but as MESSAGEs, via
MsgSend(coid, NULL, 0, NULL, 0), with no data in either direction.

That’s a BIG difference … because sending a pulse is a non-blocking
operation, while sending a message leads to a REPLY-blocked state.
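
(The contrast in a sketch, with illustrative names: MsgSendPulse() returns
immediately, while MsgSend() keeps the caller blocked until the receiver
replies:)

#include <sys/neutrino.h>

void compare_ipc(int coid)
{
    /* Pulse: fire-and-forget; the caller continues immediately. */
    MsgSendPulse(coid, 30, _PULSE_CODE_MINAVAIL, 0);

    /* Message: the caller is SEND- and then REPLY-blocked here until
       the server calls MsgReply() -- a synchronous rendezvous. */
    MsgSend(coid, NULL, 0, NULL, 0);
}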


You shouldn’t talk about ‘pulses’ if you are using messages!

Armin


The “pulses” are currently not sent as EVENTs but as MESSAGEs, via
MsgSend(coid, NULL, 0, NULL, 0), with no data in either direction.

That’s a BIG difference … because sending a pulse is a non-blocking
operation, while sending a message leads to a REPLY-blocked state.

Sorry for the confusion. I think I got it completely wrong.

We use “pulse messages” in our distributing process, as described in Rob
Krten’s first QNX book. These messages are sent (blocking) via MsgSend() to
the desired application process (all application processes are blocked on
MsgReceive()). After finishing its work, the application process does a
MsgReply() back to the distributing process. Then the distributing process
chooses the next process to receive the empty message.
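
(A minimal sketch of that distributor loop under the stated assumptions;
coids[], holding one connection per application process, is illustrative:)

#include <sys/neutrino.h>

/* Trigger each application process in turn with an empty message;
   each MsgSend() returns only after that process has finished its
   work for this cycle and called MsgReply(). */
void distribute_cycle(const int coids[], int n)
{
    for (int i = 0; i < n; i++) {
        MsgSend(coids[i], NULL, 0, NULL, 0);
    }
}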

Nnamdi

Nnamdi Kohn wrote:

We use “pulse messages” in our distributing process, as described in Rob
Krten’s first QNX book. These messages are sent (blocking) via MsgSend() to
the desired application process (all application processes are blocked on
MsgReceive()).

So you have several application servers and a single client for the
distribution of requests?

It’s also possible to have one server and many clients which are ‘reply
blocked’ … that means the server distributes processing time by
replying to the blocked clients.
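
(A sketch of that inverted structure, with illustrative names: each worker
is a client that stays REPLY-blocked, and the server hands out processing
turns by replying:)

#include <sys/neutrino.h>

/* Worker as client: each MsgSend() blocks until the server decides it
   is this worker's turn; the work happens after the reply releases it. */
void worker_client(int coid)
{
    for (;;) {
        MsgSend(coid, NULL, 0, NULL, 0); /* REPLY-blocked here */
        /* ... one cycle of application work ... */
    }
}

/* Server: release the blocked clients one by one, in the desired order,
   by replying to their saved rcvids (collection of rcvids omitted). */
void release_workers(const int rcvids[], int n)
{
    for (int i = 0; i < n; i++) {
        MsgReply(rcvids[i], 0, NULL, 0);
    }
}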

After finishing its work, the application process does a MsgReply() back
to the distributing process. Then the distributing process chooses the next
process to receive the empty message.

That means all application processes (servers) will run at the priority of
the distribution process after receiving a message from the distribution
process (client). Is that what you want?

Armin



We use “pulse messages” in our distributing process, as described in Rob
Krten’s first QNX book. These messages are sent (blocking) via MsgSend() to
the desired application process (all application processes are blocked on
MsgReceive()).

So you have several application servers and a single client for the
distribution of requests?

right.

It’s also possible to have one server and many clients which are ‘reply
blocked’ … that means the server distributes processing time by
replying to the blocked clients.

Interesting… but what advantages would this have, compared to the
one-client-many-servers structure, regarding the processing speed of the
distributing process? The number of messages to be transmitted would stay
the same, but I do see some differences:

  • the number of QNX channels is reduced to one
  • there would be no need for an availability check on the
    distributing server’s side, because receiving a message from
    a client would automatically imply a correct connection

Does this make a significant performance difference at all?

After finishing its work, the application process does a MsgReply() back
to the distributing process. Then the distributing process chooses the next
process to receive the empty message.

That means all application processes (servers) will run at the priority of
the distribution process after receiving a message from the distribution
process (client). Is that what you want?

That is correct. I tried it with different priorities (sometimes the
servers at lower, sometimes at higher priority), but if anything this had a
negative effect on performance.

Nnamdi