Priorities, io-net, and QNET

I’m getting about 100ms of extra delay across the network on QNET when
there’s a window scrolling on the local machine, which of course makes
that machine compute-bound. (And no, I don’t want to hear about the
“tweaked VESA driver”.) This occurs even when all the user
processes involved in the communication are at priority 16.
What’s the bottleneck? Do we need to run io-net at a higher
priority, or give some option to it somewhere?

This question was asked a few months ago by someone else
on OpenQNX at

http://www.openqnx.com/PNphpBB2-viewtopic-t2265-.html

but was not answered there. Thanks.

(QNX 6.21PE, x86).

John Nagle
Team Overbot

The PC architecture is not realtime-friendly (nor should it be, probably).
Even with a realtime OS, access to hardware can create non-deterministic
effects.

If you do something that causes the video card to generate interrupts,
they have to be serviced. Try to make sure the network card uses a
higher-priority IRQ. Also, it is not unusual for the video card and NIC to
share an interrupt in modern PCs, so make sure the NIC gets one exclusively.
Even then, a typical x86 PCI arbiter will still try to balance bus requests
from all PCI devices ‘fairly’.

Watch out for the priorities too. A lot of drivers/io-managers have the
nasty habit of inheriting the priority of incoming pulses (such as those
coming from an ISR). In that case it is impossible to set their priority;
it will be reset on the next pulse.

– igor


Igor,

your statements are all OK … but John is talking about 100ms.
I can’t imagine that arbitration at the hardware or ISR level could
create such a big delay. IMHO … there must be a general problem in
the networking system.

Armin




io-net itself never initiates CPU load. You’re interested
in the priorities of the various drivers / protocols. Most
drivers have a ‘priority’ option for servicing interrupts.
The tcpip stack has (not sure if there’s something similar for
QNET):

timer_pulse_prio - Priority to run pure timeout operations at. Default is 20.

rx_pulse_prio - Priority to run pure input packet processing at. Default is 21.

Servicing of user IO* requests in the tcpip stack is done at the
priority of the requester. So you can do something like:

io-net -d speedo priority=X -ptcpip rx_pulse_prio=Y,timer_pulse_prio=Z


You might also want to post which graphics driver you’re using.

-seanb



That’s a bit more helpful.

We’re seeing these 100+ ms delays with QNET, but NOT with
UDP. If the sending process has a reasonably high priority
(15 or so), it can reliably send UDP with < 10ms latency even
when there are compute-bound tasks running at priority 10-12.

But that doesn’t seem to work with QNET. We’re getting
100ms+ stalls when the screen is scrolling, even though
all the processes involved are at priority 16, and the
printing is done from a priority 9 thread.

This really matters for us, because we’re doing
real-time control within our robot vehicle using
a cluster of three networked machines.

As for the graphics driver, we’re just running the stock
x86 6.21 VESA driver. The one that goes compute-bound when
scrolling.

John Nagle
Team Overbot
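The UDP side of the comparison John describes can be approximated with a small harness. This is an illustrative sketch over loopback on one machine, so it will not reproduce the cross-machine numbers above; the function name and datagram count are arbitrary:

```python
import socket
import time

def worst_udp_latency_ms(n=200):
    """Send n small UDP datagrams over loopback and report the worst
    send-to-receive latency in milliseconds. A rough stand-in for the
    measurement described above; real numbers depend on the NIC, the
    stack configuration, and competing load."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))          # let the OS pick a free port
    port = rx.getsockname()[1]
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    worst = 0.0
    for i in range(n):
        payload = i.to_bytes(4, "big")
        t0 = time.monotonic()
        tx.sendto(payload, ("127.0.0.1", port))
        data, _ = rx.recvfrom(64)      # blocks until the datagram lands
        dt_ms = (time.monotonic() - t0) * 1000.0
        assert data == payload         # same datagram came back around
        worst = max(worst, dt_ms)
    tx.close()
    rx.close()
    return worst
```

Running the equivalent check under a compute-bound load, first over UDP and then over a QNET path, is how the 100ms+ stalls above were isolated to QNET.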


When you “mount -Tio-net npm-qnet.so”, try raising the priority, e.g.
“on -p 16 mount -Tio-net npm-qnet.so”, and see if that helps.

-xtang



Try PVM (the Parallel Virtual Machine) … it is UDP-based.
http://www.csm.ornl.gov/pvm/pvm_home.html

It is available for 6.21 at http://www.sf.net/projects/openqnx.

Regards

Armin




That’s a good migration path to Linux, but I was hoping to
get QNX messaging to work right.

John Nagle
Team Overbot


Yes. QNET needs rx_pulse_prio and whatever other pulses are created to
be controllable; otherwise it is not usable in a real-time system.

Rennie

John Nagle wrote:


That’s a good migration path to Linux,

I don’t believe that the combination of PVM and Linux is an alternative
to PVM + QNX. The event-driven scheduling of QNX is the key to
performance …

but I was hoping to get QNX messaging to work right.

I hope this is possible … any comments from QSSL?

Regards

Armin



Xiaodan Tang wrote:

When you “mount -Tio-net npm-qnet.so”, try raising the priority, e.g.
“on -p 16 mount -Tio-net npm-qnet.so”, and see if that helps.

-xtang

That works. Thanks. With that mount at priority 16,
GUI activity no longer stalls QNET. I’m not missing a
single time-critical response now, no matter what I do at user
level.

Another item to put in the system administration manual.

John Nagle
Team Overbot

I’ll add it to the queue. Thanks for the suggestion.


Steve Reid stever@qnx.com
TechPubs (Technical Publications)
QNX Software Systems

Yes. It’s worth checking and documenting that TCP/IP
apparently runs with the priority of the process that
opened the connection, but QNET runs at the priority of
“npm-qnet.so”. At least in QNX 6.21. It may be different
in the new “QNET lite” in 6.3.

In any case, all of that needs to be documented. It’s
hard for us users who don’t have access to the source code
to figure these things out.

John Nagle
Team Overbot


Xiaodan Tang wrote:

When you “mount -Tio-net npm-qnet.so”, try raising the priority, e.g.
“on -p 16 mount -Tio-net npm-qnet.so”, and see if that helps.

Interesting solution after a looong thread :-) Thanks a lot!

Regards

Armin





Please explain why the priority of the mount command has an effect on the
priority that io-net runs QNET at?

Bill Caroselli wrote:

Please explain why the priority of the mount command has an effect on the
priority that io-net runs QNET at?

Presumably the mount command is starting the threads that
actually do the work, and so they inherit the priority of the
mount command.

Someone with access to the source code should answer this one.

It’s not clear whether it should happen this way or whether it’s
a bug. But it’s real.

John Nagle
Team Overbot


It’s simple: resource managers inherit the priority of their clients.
The mount command sends an IO_MOUNT to io-net; the thread handling it
floats up, so qnet’s init routine runs at that priority, and in turn
the threads it creates inherit that priority. Basically a work-around
for a missing (and critical) feature that happens to work due to
resmgr side effects. :-)

chris
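The chain Chris describes (client priority propagating through the resmgr handler into the threads qnet creates) can be modeled in a few lines. This is a toy model, not the real io-net code; all class and method names here are invented for illustration:

```python
# Toy model (not io-net source) of the priority-floating chain:
# the mounter's priority propagates through the resmgr handler
# thread into the workers that qnet's init routine spawns.

class Thread:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority

class IoNet:
    def __init__(self):
        self.handler = Thread("resmgr handler", 10)   # arbitrary base
        self.qnet_threads = []

    def handle_io_mount(self, client_priority):
        # Resource managers run requests at the client's priority:
        # the handler thread "floats up" to match the mounter.
        self.handler.priority = client_priority
        # qnet's init runs on that thread, so the workers it spawns
        # inherit the floated priority.
        self.qnet_threads.append(Thread("qnet worker", self.handler.priority))

io_net = IoNet()
io_net.handle_io_mount(client_priority=16)   # "on -p 16 mount ..."
print(io_net.qnet_threads[0].priority)       # prints 16: workaround in action
```

This is why running the mount command itself at priority 16 fixed the stalls: the priority rides in with the IO_MOUNT request.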

Chris McKillop <cdm@killerstuff.net> wrote:

CM > … and in turn, the threads it creates will inherit that priority.

(Ah! Here’s the magic.)

There are some places inside QNET where a thread wakes up for an
internal event, and we need to pick a priority for that thread.

For example, the timer pulse: when the timer goes off, a pulse is
fired and the thread adjusts to the pulse priority, but which priority
should that be?

Most of the other code (stack, io-net) was fixed to allow the user to
specify this kind of “internal event priority” from a command-line
option.

The 6.2.1 QNET happened to choose the “base priority”, which is
basically the priority it was initialized at, as this internal priority.
By raising the mount priority, the io-net thread that handles the mount
request inherits that priority and loads/initializes QNET with it.

I wouldn’t call this behavior a bug, but I agree a command-line option
(to adjust this priority) would be more obvious to users.

-xtang



John Nagle <nagle@downside.com> wrote:

Agreed.

It definitely needs to be documented. We spent about a week
chasing down the performance problem. The default priority of
QNET is so low that screen scrolling can block it.

The UDP/TCP/IP stack behavior in this area should be documented
too.

The stack options are documented in the upcoming patch.

-seanb