“John Nagle” <nagle@downside.com> wrote in message
news:e8ma7s$7q1$1@inn.qnx.com…
> A big advantage of synchronous message passing is that MsgReply
> is non-blocking. So when your low-priority non-realtime process makes
> some request of a higher-priority process, the higher priority process
> can’t get stuck at MsgReply waiting for the lower priority caller
> to get some CPU time.
That is not an “advantage” of sync message passing; async would not block
either. If you think about it, synchronous message passing is nothing more
than a special case of an async one, where the queue is limited to one item of
size min(send_buffer, recv_buffer) and is provided by either the sender or the
receiver. Which means either the sender or the receiver has to block.
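The send/receive/reply rendezvous can be sketched with two one-slot queues. This is a loose Python analogy of the semantics, not QNX’s actual API; the message strings and log are made up for illustration:

```python
import threading
from queue import Queue

# Sync-as-degenerate-async view: each "channel" holds at most one item,
# so either the sender or the receiver must block on it.
request = Queue(maxsize=1)
reply = Queue(maxsize=1)

log = []

def client():
    # MsgSend analogue: enqueue the request, then block for the reply.
    request.put("read 4096 bytes")
    log.append("client: send-blocked")
    result = reply.get()             # REPLY-blocked until the server answers
    log.append(f"client: got {result!r}")

def server():
    msg = request.get()              # MsgReceive analogue: block for a message
    log.append(f"server: received {msg!r}")
    reply.put("4096 bytes of data")  # MsgReply analogue: queue is empty, so
                                     # this put does not block

c = threading.Thread(target=client)
s = threading.Thread(target=server)
c.start(); s.start()
c.join(); s.join()
print(log)
```

Note the reply queue is guaranteed empty when the server replies, which is the property Nagle is describing: the reply side never waits on the low-priority client.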
The trick is integrating this with the scheduler so you can avoid priority
inversion and take advantage of the fact that you block. Your blocking on
send/recv provides implicit synchronisation. That is why QNX message passing
is faster than a shared-memory queue with flow control via semaphores (you
have 3 kernel calls per transaction with SRR vs 4 with semaphores). Of
course the gain is at the expense of flexibility - there’s still no free
lunch, QNX or not.
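The 3-vs-4 count can be made concrete by instrumenting the semaphore scheme. This is an illustrative sketch (class and slot names are mine), counting each semaphore operation that would cross into the kernel in a real shared-memory queue:

```python
import threading

calls = []  # record of the operations that would be kernel calls

class CountingSemaphore:
    """Counting semaphore that logs each wait/post, to tally the
    kernel crossings per transaction in a two-semaphore queue."""
    def __init__(self, value, name):
        self._sem = threading.Semaphore(value)
        self._name = name
    def acquire(self):
        calls.append(f"sem_wait({self._name})")
        self._sem.acquire()
    def release(self):
        calls.append(f"sem_post({self._name})")
        self._sem.release()

space = CountingSemaphore(1, "space")  # free slots in the queue
items = CountingSemaphore(0, "items")  # filled slots
slot = [None]

def producer():
    space.acquire()      # 1: wait for a free slot
    slot[0] = "message"
    items.release()      # 2: signal data available

def consumer():
    items.acquire()      # 3: wait for data
    msg = slot[0]
    space.release()      # 4: free the slot

p = threading.Thread(target=producer)
c = threading.Thread(target=consumer)
p.start(); c.start(); p.join(); c.join()
print(len(calls))  # 4 crossings per message
```

Four operations per message, against three (MsgSend, MsgReceive, MsgReply) for a full round trip with SRR - and the SRR count already includes the reply.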
Other systems have used this idea too. It is incorrect to say ‘they got it
wrong’. They had different design goals. Mach for example was an academic
exercise and very advanced at that. Their design goals included the ability to
run unmodified BSD binaries, VM external to the kernel, sharing memory
across the network, etc. QNX on the other hand had limited goals and targeted
mostly embedded systems, where all that sophistication is not needed, nor
are legacy 3rd-party apps much of a concern. So they are fast, but it’s not
the message passing that they got right. It is the balance of complexity and
features. So I will point out that QNX can’t run its own binaries from older
releases, let alone BSD binaries. No free lunch.
What QNX realized is that it’s not the copying that kills you (as many naive
opponents of message passing tend to assume). It is the context switches. So
they made it (the kernel) simple, which made it much easier to make context
switches cheap. I am sure the Mach people knew that too, but they could not
make it that simple given their design goals. So they tried to optimize
message passing using copy-on-write, but that did not help all that much,
since copying does not hurt you that much in the first place.
What QNX failed to see (or chose to ignore) is that in a system where
high-bandwidth data has to travel through multiple memory-isolated
subsystems, let alone where some of them use async abstractions built on top
of sync ones (which really should be the other way around), performance will
be miserable. Disk I/O and TCP/IP performance … cough, cough …
> I really came to appreciate all this when doing the Overbot
> software. We had a lot of stuff going on in one CPU: mid
> level servoloop control, LIDAR data processing, video processing,
> map building, and planning. QNX could meet the real time constraints
> consistently. And we were checking; if updates didn’t get done in
> time, emergency hardware timers tripped and the brakes slammed on.
> Even at 80% CPU utilization, it all worked.
Yes, the ability of a system to avoid priority inversion is one of the keys
to that. If you have ever seen tools that do RMS analysis/simulation, it is
amazing to see how the picture changes when you click the ‘use priority
inheritance’ button. All of a sudden you need a much less powerful CPU to
meet your deadlines…
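The effect those tools show falls out of the textbook rate-monotonic test (the Liu & Layland bound) extended with a per-task blocking term. Priority inheritance bounds that term to one critical section; without it the blocking can balloon. The task numbers below are made up purely to illustrate the flip:

```python
def rms_schedulable(tasks):
    """Rate-monotonic schedulability test with blocking.
    tasks: list of (C, T, B) = (worst-case exec time, period, blocking
    from lower-priority tasks), sorted by period ascending
    (i.e. rate-monotonic priority order)."""
    for i in range(len(tasks)):
        # Utilization of this task and everything higher priority...
        util = sum(c / t for c, t, _ in tasks[: i + 1])
        # ...plus the fraction of the period lost to blocking.
        util += tasks[i][2] / tasks[i][1]
        # Liu-Layland bound for i+1 tasks: (i+1) * (2^(1/(i+1)) - 1)
        bound = (i + 1) * (2 ** (1 / (i + 1)) - 1)
        if util > bound:
            return False
    return True

# Same task set; only the blocking term for the high-priority task differs.
tasks_no_pi   = [(1, 4, 3.5), (2, 10, 0)]  # B = a whole long critical section
tasks_with_pi = [(1, 4, 0.2), (2, 10, 0)]  # B bounded to one short section
print(rms_schedulable(tasks_no_pi), rms_schedulable(tasks_with_pi))
```

Identical CPU, identical workload; flipping the blocking bound is the difference between failing and passing the test, which is exactly what the ‘use priority inheritance’ button does in those tools.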
Linux is getting those capabilities, albeit slowly, because Linus does not
feel like encumbering the kernel with stuff that will benefit 10% of users at
the expense of the other 90%. But MontaVista got enough money and momentum to
keep pushing so far (of course we’re providing a good chunk of that money,
lol).
– igor