QSSL should change its policy

John Nagle wrote:

Andrew Thomas wrote:

Armin Steinhoff wrote:

That’s why middleware packages like PVM and MPI were created. They do
message passing on top of the socket library.



Or, if you have legacy QNX4 code, or just like the kernel-mediated
message passing, proxies, etc. of QNX, you could use the SRRIPC module
for Linux. It is basically QNX4 message passing, proxies, timers, and
to some degree user-space interrupts for Linux.
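For readers unfamiliar with the model: QNX4-style messaging is a synchronous rendezvous, where Send() blocks until the receiver has done a Receive() and then a Reply(). Here is a toy user-space emulation of that rendezvous (illustrative only; the names mirror the QNX4 calls, but SRRIPC implements the same idea in the kernel across processes):

```python
# Toy emulation of QNX4-style Send/Receive/Reply rendezvous semantics.
# Illustrative only: runs with threads in one process, whereas SRRIPC
# does this in the kernel between separate processes.
import queue
import threading

class Channel:
    def __init__(self):
        self._requests = queue.Queue()   # holds (message, reply_box) pairs

    def Send(self, msg):
        # Sender blocks until the receiver Replies -- the QNX4 rendezvous.
        reply_box = queue.Queue(maxsize=1)
        self._requests.put((msg, reply_box))
        return reply_box.get()           # SEND-blocked, then REPLY-blocked

    def Receive(self):
        # Receiver blocks until a message arrives.
        return self._requests.get()

    @staticmethod
    def Reply(reply_box, msg):
        reply_box.put(msg)               # unblocks the sender

def server(chan):
    msg, reply_box = chan.Receive()
    Channel.Reply(reply_box, msg.upper())   # echo back, uppercased

chan = Channel()
threading.Thread(target=server, args=(chan,), daemon=True).start()
print(chan.Send("hello"))                # prints "HELLO"
```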


None of those do as good a job of message passing as QNX,
because the message passing and scheduling aren’t as well
coupled. SRRIPC takes two copies and an extra trip through
the scheduler to do what QNX does with one copy and one context switch.

SRRIPC was done for the 2.2.x Linux kernel, and it’s not
clear how well it works with the 2.6 kernel. It never really
got out of beta, either. You definitely
want a 2.6 Linux kernel for anything even vaguely real-time. That’s
the one with the low-latency fixes.

Still, it’s important to have a migration path from
QNX available. We have to be realistic about the future of QNX
since the acquisition.
John Nagle

Hi John,

SRRIPC has been running in production systems for years on 2.4 kernels,
and has been stable on 2.6 kernels for quite some time. People have
successfully ported large, complex QNX4 systems to Linux using the
SRRIPC module with almost no code change.

It does indeed do two copies - once into kernel space, and once back out
again - with every message pass. At some point it becomes memory
bandwidth limited, but this point occurs between 1 and 2 GB per second
on a typical machine. There was a patch floating around for single-copy
messaging in the SRRIPC module, but it was against a quite old version,
so we never incorporated it.
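To put the two-copy cost in perspective, here is a back-of-the-envelope sketch. The 1.5 GB/s effective copy bandwidth is an assumed round figure sitting in the middle of the 1-2 GB/s range quoted above, not a measurement:

```python
# Back-of-the-envelope cost of two-copy message passing.
# Assumed figure: 1.5 GB/s effective copy bandwidth, mid-way in the
# 1-2 GB/s range cited in the discussion.
COPY_BW = 1.5e9          # bytes/second, assumed
COPIES_PER_PASS = 2      # user -> kernel, then kernel -> user

def copy_time_us(msg_bytes):
    """Microseconds spent copying data per message pass."""
    return COPIES_PER_PASS * msg_bytes / COPY_BW * 1e6

for size in (64, 1024, 65536):
    print(f"{size:6d} bytes: {copy_time_us(size):8.2f} us of copying per pass")
```

For small messages the copy cost is negligible next to the fixed per-pass overhead; only for large messages does the memory-bandwidth limit bite.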

I’m not sure you’re correct about the two trips through the scheduler.
What makes you say that? It looks to me like there are the same number
of scheduling passes as in QNX.

In any case, as you imply, Linux is not truly real-time. There is no
priority inheritance in the SRRIPC module and memory access in the
kernel can cause paging. That’s hardly the point. For a stock Linux
kernel, there is no faster way to do message passing, and no other way
to cleanly get the QNX4 message passing semantics. To give you an idea,
we tested a couple of TCP-based approximations of QNX4 messaging and
they were up to 5000 times slower than the SRRIPC module. They did not
do a good job with proxies (especially proxies on timers), did not
deliver task death notification, and did not offer any help in interrupt
handling.

The SRRIPC module routinely posts performance figures showing that it runs
at about half the speed of QNX4 messaging on the same hardware. Blame the
two-copy message passing.

Cheers,
Andrew

That’s helpful. It’s good to know there’s a migration path in place.

I’d appreciate it if you’d try this with SRRIPC:

– Run two processes, A and B, which intercommunicate at a high rate
via SRRIPC. (Obviously you have a test like that.)
– Run A and B again, but with a third compute-bound process, C,
also running. Report the results.

The scheduling question is whether a message pass sends you to the end of
the line for CPU time (in which case A and B will slow down enormously) or
puts you at the head of the CPU queue (in which case C will slow down
enormously). If A-B and C performance are roughly equal, scheduling and
message passing are working together properly.
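On a stock kernel the shape of that experiment can be sketched with a plain socketpair standing in for the SRRIPC message pass. This is a hypothetical harness, not the SRRIPC test suite, and the message counts are kept modest:

```python
# Sketch of the A/B/C scheduling test: A ping-pongs with B over a
# socketpair (standing in for SRRIPC Send/Receive/Reply) while C, a
# compute-bound process, optionally competes for the CPU.
import socket
import time
from multiprocessing import Process

def echo_server(sock, n_msgs):
    # B: receive each message and send it straight back (the "reply").
    for _ in range(n_msgs):
        sock.sendall(sock.recv(64))

def busy_loop(seconds):
    # C: compute-bound competitor; terminated by the parent when done.
    deadline = time.monotonic() + seconds
    x = 0
    while time.monotonic() < deadline:
        x += 1

def ping_pong_rate(n_msgs, with_busy):
    a, b = socket.socketpair()
    procs = [Process(target=echo_server, args=(b, n_msgs))]
    if with_busy:
        procs.append(Process(target=busy_loop, args=(30,)))
    for p in procs:
        p.start()
    t0 = time.monotonic()
    for _ in range(n_msgs):
        a.sendall(b"ping")      # A: send ...
        a.recv(64)              # ... and block until the echo comes back
    elapsed = time.monotonic() - t0
    for p in procs:
        p.terminate()
        p.join()
    a.close()
    b.close()
    return n_msgs / elapsed

if __name__ == "__main__":
    alone = ping_pong_rate(20000, with_busy=False)
    loaded = ping_pong_rate(20000, with_busy=True)
    print(f"A-B alone:       {alone:10.0f} msgs/sec")
    print(f"A-B with busy C: {loaded:10.0f} msgs/sec")
```

Comparing the two rates, and C's wall-clock progress with and without the ping-pong, shows where the scheduler puts a process after a message pass.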

If you can get that to work right, you can use message passing freely, as
a slightly more expensive subroutine call, even in compute-bound work.
That’s quite useful. If it can be made to work right, it would be
desirable to get it into the mainstream Linux kernel, and maybe
into OpenOffice, which tries to use a very slow CORBA ORB.

John Nagle
Team Overbot

Belatedly, here is an answer to that question:

Two programs: tserver, which is part of the SRRIPC test suite, and
busy, which is a simple busy loop with a timer.

Run only: tserver -t 10 -T m
Output shows the message-passing frequency averaged over a 10-second
interval.

srr: 4 bytes 1039710 times 10 sec 103971 Hz 831768 bytes/sec
srr: 1 bytes 1040133 times 10 sec 104013 Hz 208026 bytes/sec
srr: 0 bytes 1157715 times 10 sec 115771 Hz 0 bytes/sec

Run only: busy
Output shows the time spent on 1 billion addition operations.
The first number is just a sequence number; the second is seconds.
1 took 4.008095
2 took 3.996774
3 took 4.103687

Run together:

busy:
1 took 10.945134
2 took 11.312189

tserver:
srr: 4 bytes 623551 times 10 sec 62355 Hz 498840 bytes/sec
srr: 1 bytes 626469 times 10 sec 62646 Hz 125292 bytes/sec
srr: 0 bytes 757522 times 10 sec 75752 Hz 0 bytes/sec

The busy loop took about 2.75 times longer. That is about what you would
expect: the SRRIPC test involves two processes, so each of the three should
ideally get 1/3 of the CPU. The CPU splits about evenly:

From “top”:
22288 andrew 25 0 1352 312 1328 R 33.9 0.1 0:05.24 busy
22290 andrew 25 0 1576 428 1416 R 33.2 0.1 0:03.87 tserver
22289 andrew 15 0 1580 500 1416 S 28.3 0.1 0:03.39 tserver
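As a quick sanity check, the reported figures can be compared against the ideal 1/3-per-process CPU split:

```python
# Sanity-check the benchmark figures above against an ideal
# 1/3-per-process CPU split (three runnable processes).
solo_busy, loaded_busy = 4.008095, 10.945134  # busy-loop seconds, alone vs loaded
solo_hz, loaded_hz = 103971, 62355            # 4-byte message rates, alone vs loaded

busy_slowdown = loaded_busy / solo_busy  # ideal would be ~3x
rate_retained = loaded_hz / solo_hz      # fraction of the solo message rate kept

print(f"busy-loop slowdown: {busy_slowdown:.2f}x (ideal ~3x)")
print(f"message rate kept:  {rate_retained:.0%} of solo")
```

The busy loop slows by roughly 2.7x and the message rate holds at about 60% of its solo value, both close to the even three-way split the "top" output shows.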

I hope that helps.

Andrew

That’s helpful in defining a migration path. Thanks.

John Nagle