High speed realtime in QNX

Isn’t that old news? Aug 21, 2000

Also, how does it handle CPUs older than the Pentium III?


Bill Caroselli – 1(626) 824-7983
Q-TPS Consulting
QTPS@EarthLink.net


“Mario Charest” <goto@nothingness.com> wrote in message
news:a2mt5p$1ff$1@inn.qnx.com

“Rennie Allen” <rallen@csical.com> wrote in message
news:3C4EEB8D.8030104@csical.com…
Mario Charest wrote:



http://www.qnx.com/news/pr/aug21_00-pent.html

I don’t know if this refers to tricks so much as access to some
undocumented features of the PIII.

Doesn’t matter; that tells me the kernel is tuned for a particular (most
probably for each) architecture.

Mario Charest wrote:

“Rennie Allen” <rallen@csical.com> wrote in message
news:3C4EEB8D.8030104@csical.com…

Mario Charest wrote:



http://www.qnx.com/news/pr/aug21_00-pent.html

I don’t know if this refers to tricks so much as access to some
undocumented features of the PIII.


Doesn’t matter; that tells me the kernel is tuned for a particular (most
probably for each) architecture.

No. It tells you that if a particular manufacturer chooses to let QSSL
in on some undocumented capabilities of the processor, that QSSL will
take advantage of them.

It does not mean that QSSL will tune the entire O/S (including
executable model) to squeeze every last scrap of performance out of one
processor to the exclusion of processor independence (which is what was
done with QNX4).

I am not advocating the QNX4 philosophy; I am simply saying that one
can’t expect an equivalent application on QNX6 to perform as well as
the same application on QNX4 on equivalent hardware.

What are the actual differences? I don’t know. Intuitively, I know
that there is a significant performance difference. It is definitely
telling that on the QNX website they quote numbers for QNX6 for all
processors except x86 (so that you can’t do a 1:1 comparison with
QNX4); however, the context switch time quoted for QNX4 on a 133MHz PI is
1.95 usec, while the context switch time for a 207MHz SA-1110 (which
one would expect to be significantly faster than a 133MHz P1) is 1.8 usec.

Igor, new and delete simply call malloc/free. The malloc library supplied
with QNX6 is designed to be memory efficient, not speed efficient. If fast
new/delete cycles are important, then I suggest that you either download
some other malloc library or write your own that you can tune for the usage
profile of your application.

Regards,

Colin

Igor Levko <no_spam@nihrena.net> wrote:

After doing some benchmarks, I can say that the code quality
generated by gcc is not bad at all. Sometimes it’s simply
better than Watcom, especially in floating point arithmetic.
But the system libraries seem not to be optimized at all.
For example, try a simple new/delete loop. In my case the exe
produced by gcc was 4 times slower than Watcom’s!

For those who are interested in benchmarking there is Bench++
suite which is free.

cheers,
Igor


I’m not thrilled with RTP performance.

Me neither; QNX4 definitely feels much faster.
But that’s from day-to-day usage as a development machine.
gcc doesn’t help give an impression of speed, and Phindows
(which I use all the time) is much slower on QNX6 as well.

I have been thinking of running a benchmark that consists of assembly
code and makes no system or library calls at all. That would take
the compiler and libraries out of the equation.


cburgess@qnx.com

I know that new/delete could end up calling malloc/free, though it’s not
mandatory. I guess that malloc/free could also be wrappers for some kernel
calls. But it doesn’t matter.
Colin, I just don’t get it. You wrote “designed to be memory efficient,
not speed efficient”; in other words, memory allocation/deallocation has
been designed in such a way that it became more memory efficient but 4
times slower!
I always thought that one of the important aspects of a function’s
efficiency is how fast it can run.
Am I right that now, in order to make memory allocation fast as well, one
has to introduce something like one’s own memory pool implementation
to minimize kernel calls?

cheers,
Igor

“Colin Burgess” <cburgess@qnx.com> wrote in message
news:a2ncr4$mou$1@nntp.qnx.com


You can optimize for time or space but rarely both. What Colin is saying is
that we have optimized in such a way as to make efficient use of a small
amount of memory (avoiding fragmentation, etc) as is appropriate on an
embedded system. Essentially it boils down to the fact that benchmarking
allocation/deallocation for speed is not necessarily appropriate in our
case. If you were to benchmark them for space efficiency, you might see a
difference when you compare to other, faster, algorithms.

That’s not to say that there isn’t room for improvement, it’s just that our
goal was different.

cheers,

Kris

“Igor Levko” <no_spam@nihrena.net> wrote in message
news:a2p9ra$o7a$1@inn.qnx.com


Igor Levko wrote:

Am I right that now, in order to make memory allocation fast as well, one
has to introduce something like one’s own memory pool implementation
to minimize kernel calls?

That is an obvious way in which time and size can be traded off.

This seems to be the “correct” approach for either an embedded or a
real-time allocator (since real-time designs should only allocate memory
once at initialization, space efficiency is more useful than per-call
time efficiency). It also seems reasonable that QNX’s default behavior
should be the one most consistent with embedded and real-time systems.

I am sure there are other (less obvious) allocator optimizations that
could favor space efficiency over time efficiency.

Okay, I’ve held out for long enough; here goes.

Was your (QNX) goal to ruin your very reputation? I fail to see any logic in
choosing such a tradeoff. Memory is getting cheaper and cheaper, yet QNX is
trading a 4x factor in speed (it used to be 10x with earlier NTO versions)
for a dubious gain (very modest at best) in memory usage. Just what was
the person making that decision thinking? Who the heck is even going to
notice that space efficiency? But people are quick to notice speed
inefficiency, as you can see. In real life QNX actually needs more memory
than other OSes because of the memory-inefficient design of some major
subsystems. Wanna improve memory usage? Make your I/O subsystem use a
unified buffer cache; that will help much more than your current
‘spending a dollar to save a penny’ approach.

Now, about NTO being portable and not optimized per CPU: that is a lame
argument, to say the least. I don’t see why a company with 20 years of x86
experience should throw that experience away. I’ve heard x86 is actually at
the end of the list of supported platforms now in terms of revenue
generated. That is understandable, but it is still number 1 in terms of
public perception. Isn’t it a little short-sighted to ignore that? Also,
portability is a virtue for an OS vendor, but usually not for customers,
particularly when the kernel source is not available anyway. I understand
it costs a lot of money to port to a new CPU, but I also understand QNX
only does that for really big contracts. Such contracts should pay enough
to afford decent optimisations per CPU. You do have a separate kernel for
each one anyway, don’t you? So don’t waste time on excuses; write the
critical parts in goddamn assembler if you have to, but don’t make new
versions slower than old ones. That ruins one of the few arguments your
advocates have always had to convince someone.

That said, I know that the x86 kernel has CPU-specific optimisations, and
I know the PPC kernel does too. I know there is work being done to improve
efficiency in other parts as well. Unfortunately, every time an issue is
brought up, there is a bunch of excuses instead of constructive discussion
from QNX staff. This is probably because it is not really comfortable for
them to do it in a public forum. Nobody wants to be held responsible, and
an excuse is the safest statement one can make, short of silence ;-)

Harris, you might wanna think about how to improve this situation. There
should be someone authorized to say something like ‘yes, we know of the
problem and here is what we’re doing about it’, when that is the case, of
course. It all really comes down to the strategy QNX as a company uses to
move forward. So far it has always been a strategy of hiding in the dark
to make a fast move when opportunity comes. That is a workable strategy
for a small underdog, and it got you this far, but if you ever want to
become something more than that in the public eye, you’ve got to make some
open moves: defining market tendencies instead of following them. Some
roadmaps would not hurt. Now I am daydreaming ;-)

-- igor

“Kris Warkentin” <kewarken@qnx.com> wrote in message
news:a2pfcs$8vv$1@nntp.qnx.com

\

I completely agree.


Bill Caroselli – 1(626) 824-7983
Q-TPS Consulting
QTPS@EarthLink.net


“Igor Kovalenko” <kovalenko@home.com> wrote in message
news:a2pjkg$27o$1@inn.qnx.com


“Igor Kovalenko” <kovalenko@home.com> wrote in message
news:a2pjkg$27o$1@inn.qnx.com

Okay, I held up for long enough, here it goes.


Followed up in advocacy