Tricks of the trade - Optimisation

We’re also “paying” customers, and we use CW beta 4 with NTO2.0C as well as
QNXRTP. There is some uncertainty about this product, especially in light of
the new C++ libraries that will be released later this year:
http://www.qnx.com/news/pr/sep12_00-Dnkw.html

I personally have put CW aside while I’m cutting the C++ code in question,
which is unfortunate since our company paid a fortune for the CW licenses.
When we decided to buy CW we never questioned the ability to compile
templates (I guess these days you just expect those sorts of things).

Let’s see what comes of Dinkumware and CodeWarrior… I enjoy coding in CW,
and now that I’ve temporarily gone self-hosted I appreciate its nice IDE.
The current gcc in CW has to be upgraded though…


Karsten Hoffmann, MBS-GmbH
Karsten.Hoffmann@mbs-software.de
Roemerstrasse 15, D-47809 Krefeld
Phone : +49-2151-7294-38
Fax   : +49-2151-7294-50
Mobile: +49-172-3812373

Miguel Simon wrote:

Hi…

Does anyone know if there is such a thing as a two (2) processor
PC104??

We have a solution based on a dual-ported memory
board (a PC104 board with two
separate bus interfaces, 32Kb memory, and mailbox
interrupts) … two (or more) CPUs can communicate
through the DPM board (a Net.dpm driver must
still be ported … but that is not a big issue).

Armin

http://www.steinhoff.de

ditto :slight_smile:


In qdn.public.porting Armin Steinhoff <A-Steinhoff@web_.de> wrote:

ditto :slight_smile:

You may actually see crossposting disabled on QDN in the near future.

In article <8uh0os$nv$1@nntp.qnx.com>, pete@qnx.com says…

In qdn.public.porting Armin Steinhoff <A-Steinhoff@web_.de> wrote:

ditto :slight_smile:

You may actually see crossposting disabled on QDN in the near future.

Please do so; there seems to be a lot of unnecessary cross-posting, as well as
posting in obviously incorrect newsgroups (e.g., news).

In article <8uh0os$nv$1@nntp.qnx.com>, pete@qnx.com says…

In qdn.public.porting Armin Steinhoff <A-Steinhoff@web_.de> wrote:

ditto :slight_smile:

You may actually see crossposting disabled on QDN in the near future.

Pete,

Cross-posting is a good tool … as long as it is used sensibly.
Is it possible to limit the number of addressed groups to two?

Armin

In qdn.public.porting Armin Steinhoff <A-Steinhoff@web_.de> wrote:


Miguel Simon wrote:

Hi…

Does anyone know if there is such a thing as a two (2) processor
PC104??

We have a solution based on a dual-ported memory
board (a PC104 board with two
separate bus interfaces, 32Kb memory, and mailbox
interrupts) … two (or more) CPUs can communicate
through the DPM board (a Net.dpm driver must
still be ported … but that is not a big issue).

I’m not sure.

It’s just that I mentioned the recent rash of crossposts to the people who
maintain the newsgroups, and they said they were already thinking about
curtailing the ability to crosspost.

pete@qnx.com wrote:

In qdn.public.porting Armin Steinhoff <A-Steinhoff@web_.de> wrote:

Miguel Simon wrote:

Hi…

Does anyone know if there is such a thing as a two (2) processor
PC104??

We have a solution based on a dual-ported memory
board (a PC104 board with two
separate bus interfaces, 32Kb memory, and mailbox
interrupts) … two (or more) CPUs can communicate
through the DPM board (a Net.dpm driver must
still be ported … but that is not a big issue).

I’m not sure.

Why do you think so? I have already done it for
another system …

It’s just that I mentioned the recent rash of crossposts to the people who
maintain the newsgroups, and they said they were already thinking about
curtailing the ability to crosspost.

And what’s the reason for your cross posting ??

Armin

I’ve been away for a week and am just now starting to read the newsgroups.
So I haven’t even finished reading this thread yet, but an obvious question
comes to mind.

Can the Watcom C/C++ compiler be made to produce code that will link in with
the gcc object modules?

I’ve always been very impressed with its generated code when optimized,
even though I often couldn’t understand what the heck they were doing.

Mario Charest <mcharest@zinformatic.com> wrote in message
news:8ubgn0$2r8$1@inn.qnx.com

“Dan” <none@no.spam> wrote in message news:8ub3jg$jlr$1@inn.qnx.com…
For those not familiar with my problem: I have a process that uses a lot of
matrix algebra, for which we used the MTL implementation. The code was slow,
about 18ms per iteration. I needed to get the cycle time down to 2-3 ms (max
about 5, I guessed at the time). Thanks to all those who replied in the
previous thread suggesting that we rewrite the code, because that’s what was
done, to spectacular results.

The code was recompiled with a different matrix algebra class and a lot of
overhead ripped out, so now I’m happy because we got the cycle time down to
1.5ms! That is more than 10 times faster than the original MTL code!

We also compiled the code using Microsoft C++ v6 and got very interesting
results. The MS code runs twice (!) as fast on NT as the qcc code on
QNXRTP!!! Both are on the same machine (just swap the HD), a P-MMX 200MHz.
Even when I pump up the priority it runs in the same time, since there is
not much else running.


Visual C++ is a very good compiler. Obviously GCC has a different agenda
than Visual C++. Personally it saddens me, because Neutrino is IMHO a
synonym for performance, but it’s all undermined by GCC. But that’s a
subject for qdn.public.qnxrtp.advocacy.


This raises a few questions:

  1. Why is MTL so slow on gcc 2.95.2? Is it a problem of handling
    templates?

  2. Why does Microsoft produce faster code? Or better, why does gcc
    produce slow code?

  3. Will additional optimiser options speed it up? (I’m using O2 and
    inline.)

Also I wonder if anyone has had a similar experience, or is this specific
to my code?

I had the same experience as you; the difference between Visual C++ and GCC
was on the order of 30% (that’s still a lot).

Of course two examples don’t prove anything, but is there a case to
check the compiler and benchmark it properly against others? It may be a
significant issue if you need to go from a 200MHz to a 450MHz processor to
achieve the same thing in real time.
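For anyone wanting to repeat this kind of comparison, here is a minimal sketch of a per-iteration timing harness around a fixed-size matrix multiply. It is not Dan's code and not MTL: the names `Matrix`, `multiply` and `mean_ms_per_iteration` are made up for illustration, and it assumes a modern C++11 toolchain with `std::chrono`, which the gcc 2.95.2 discussed in this thread does not provide.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical fixed-size matrix type: storage is inline, so the inner
// loops involve no heap allocation and no abstraction layers.
template <std::size_t N>
using Matrix = std::array<std::array<double, N>, N>;

// Plain triple loop computing c = a * b.
template <std::size_t N>
void multiply(const Matrix<N>& a, const Matrix<N>& b, Matrix<N>& c) {
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < N; ++k) {
                sum += a[i][k] * b[k][j];
            }
            c[i][j] = sum;
        }
    }
}

// Calls work() `iterations` times and returns the mean wall-clock time
// per call in milliseconds, measured with a monotonic clock.
template <typename Work>
double mean_ms_per_iteration(Work work, int iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const std::chrono::duration<double, std::milli> elapsed =
        clock::now() - start;
    return elapsed.count() / iterations;
}
```

To compare compilers, build the same source with each one, call `mean_ms_per_iteration` with a lambda that runs `multiply` on realistic matrix sizes, and make sure the result matrix is actually used afterwards so the optimiser cannot discard the work.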

“Bill at Sierra Design” <BC@SierraDesign.com> wrote in message
news:8upass$9kq$1@inn.qnx.com

I’ve been away for a week and am just now starting to read the newsgroups.
So I haven’t even finished reading this thread yet, but an obvious question
comes to mind.

Can the Watcom C/C++ compiler be made to produce code that will link in with
the gcc object modules?

I’ve always been very impressed with its generated code when optimized,
even though I often couldn’t understand what the heck they were doing.

Watcom isn’t much better than GCC at code generation (although it compiles
MUCH faster).
It’s Visual C++ for x86 that has quite an advantage.




I would be curious if the original poster could post some numbers for his
stuff compiled with Watcom on QNX4.

Mario Charest <mcharest@nosmap.com> wrote in message
news:8uph15$fod$1@inn.qnx.com

“Bill at Sierra Design” <BC@SierraDesign.com> wrote in message
news:8upass$9kq$1@inn.qnx.com…
Can the Watcom C/C++ compiler be made to produce code that will link in with
the gcc object modules?

I’ve always been very impressed with its generated code when optimized,
even though I often couldn’t understand what the heck they were doing.


Watcom isn’t much better than GCC at code generation (although it compiles
MUCH faster).
It’s Visual C++ for x86 that has quite an advantage.

I’d like to, but we do not have QNX4. We started working with QNX products
(Neutrino) in February this year. We took a certain amount of risk by
plunging straight into a new OS, but we never looked back. It’s perfect for
our application.
There will always be issues to sort out, new drivers to port etc., but full
credit to the people at QNX: they started with a great kernel and built it
into a very good platform for real-time apps.

I think it’s a little unfair to compare gcc with MS Visual C++ because gcc is
a much more generic compiler. MS targets only one class of processors and can
afford to use a wider range of instructions, including the specialised
instructions found in MMX etc.

Dan

“Bill at Sierra Design” <BC@SierraDesign.com> wrote in message
news:8ups3u$qv5$1@inn.qnx.com

I would be curious if the original poster could post some numbers for his
stuff compiled with Watcom on QNX4.


“Dan” <none@no.spam> wrote in message news:8usiv0$l28$1@inn.qnx.com

I’d like to, but we do not have QNX4. We started working with QNX products
(Neutrino) in February this year. We took a certain amount of risk by
plunging straight into a new OS, but we never looked back. It’s perfect for
our application.
There will always be issues to sort out, new drivers to port etc., but full
credit to the people at QNX: they started with a great kernel and built it
into a very good platform for real-time apps.

I think it’s a little unfair to compare gcc with MS Visual C++ because gcc is
a much more generic compiler. MS targets only one class of processors and can
afford to use a wider range of instructions, including the specialised
instructions found in MMX etc.

I agree that it may be unfair, but it’s a fact; the end result is there.
The compiler on QRTP generates slower code as the price of supporting
multiple architectures. GCC is awesome at doing lots of things, but not at
producing the fastest code.







“Dan” <none@no.spam> wrote in message news:8usiv0$l28$1@inn.qnx.com

I’d like to, but we do not have QNX4. We started working with QNX products
(Neutrino) in February this year. We took a certain amount of risk by
plunging straight into a new OS, but we never looked back. It’s perfect for
our application.
There will always be issues to sort out, new drivers to port etc., but full
credit to the people at QNX: they started with a great kernel and built it
into a very good platform for real-time apps.

I agree, QNX6 is an outstanding O.S. and I’m willing to put up with a few
bumps as it grows to its full potential in order to be in full motion when
it matures.

I think it’s a little unfair to compare gcc with MS Visual C++ because gcc is
a much more generic compiler. MS targets only one class of processors and can
afford to use a wider range of instructions, including the specialised
instructions found in MMX etc.

Dan

I think that 95% of my projects initially will be x86 architecture. Perhaps
in the future we’ll make a move to other CPUs for some embedded systems, but
we also have a few projects requiring BIG server class machines with huge
memories and SMP which for now dictates x86. I for one would be willing to
buy a 3rd party compiler targeted specifically at the x86 architecture if it
could be shown to take full advantage of all of the different features
available on various Pentium CPUs. The second, and perhaps equally large,
issue is all of the QNX6-supplied libraries. It doesn’t really do you much
good to have a monster compiler if all of the libraries are compiled to run
on a 386 and are all optimized for a small footprint at the expense of
execution speed. I liked the Watcom compiler, and I’ve never used the
MicroSchlock compiler (although I’ve heard it’s one of the few things
they’ve done an outstanding job with) because I do all my development under
QNX :slight_smile: I’ve got vi engrained on my nervous system so deeply that it would
probably kill me to have to wrestle with an IDE. But if the MS compiler
could be made to produce objects and executables that would work with RtP, I
would at least take a serious look at it (no matter how much I want to nuke
all MS software from our entire office).

-Warren

Warren Peece wrote:

I think that 95% of my projects initially will be x86 architecture. Perhaps
in the future we’ll make a move to other CPUs for some embedded systems, but
we also have a few projects requiring BIG server class machines with huge
memories and SMP which for now dictates x86.

Just like we are. I’ve heard however about SMP PowerPC (G4) and that
sounded very nice, especially given the message-passing optimizations
QNX has for G4.

I for one would be willing to
buy a 3rd party compiler targeted specifically at the x86 architecture if it
could be shown to take full advantage of all of the different features
available on various Pentium CPUs. The second and perhaps equally as large
issue is all of the QNX6 supplied libraries. It doesn’t really do you much
good to have a monster compiler if all of the libraries are compiled to run
on a 386 and are all optimized for a small footprint at the expense of
execution speed. I liked the Watcom compiler, and I’ve never used the
MicroSchlock compiler (although I’ve heard it’s one of the few things
they’ve done an outstanding job with) because I do all my development under
QNX :slight_smile: I’ve got vi engrained on my nervous system so deeply that it would
probably kill me to have to wrestle with an IDE. But if the MS compiler
could be made to produce objects and executables that would work with RtP, I
would at least take a serious look at it (no matter how much I want to nuke
all MS software from our entire office).

Metrowerks was supposed to be that compiler. Bang on their door…

- igor

“Igor Kovalenko” <kovalenko@home.com> wrote in message
news:3A120E14.D74CBA1C@home.com

Warren Peece wrote:

I think that 95% of my projects initially will be x86 architecture. Perhaps
in the future we’ll make a move to other CPUs for some embedded systems, but
we also have a few projects requiring BIG server class machines with huge
memories and SMP which for now dictates x86.

Just like we are. I’ve heard however about SMP PowerPC (G4) and that
sounded very nice, especially given the message-passing optimizations
QNX has for G4.

What G4 desktop/server class systems will run QNX6, Igor? I’d love to try
one but I thought they weren’t supported yet. Could you point me towards
where these G4 optimizations are mentioned or is it one of those
self-extracted bits of information?

I for one would be willing to
buy a 3rd party compiler targeted specifically at the x86 architecture if it
could be shown to take full advantage of all of the different features
available on various Pentium CPUs. The second and perhaps equally as large
issue is all of the QNX6 supplied libraries. It doesn’t really do you much
good to have a monster compiler if all of the libraries are compiled to run
on a 386 and are all optimized for a small footprint at the expense of
execution speed. I liked the Watcom compiler, and I’ve never used the
MicroSchlock compiler (although I’ve heard it’s one of the few things
they’ve done an outstanding job with) because I do all my development under
QNX :slight_smile: I’ve got vi engrained on my nervous system so deeply that it would
probably kill me to have to wrestle with an IDE. But if the MS compiler
could be made to produce objects and executables that would work with RtP, I
would at least take a serious look at it (no matter how much I want to nuke
all MS software from our entire office).

Metrowerks was supposed to be that compiler. Bang on their door…

- igor

Sounds to me like we need somebody to drive a truck through their door :wink:

-Warren

Warren Peece wrote:

What G4 desktop/server class systems will run QNX6, Igor? I’d love to try
one but I thought they weren’t supported yet.

RTP for PPC is not available, which does not preclude its existence :wink:

One can buy a commercial Neutrino 2.x OS runtime for the PowerPC platform.
The development environment would be either Metrowerks (which supports both
x86 and PPC targets) or QNX4. There is one for Solaris (still in testing)
too. Nothing prevents a brave one from building a self-hosted compiler
either (I did that for x86, as well as a cross compiler for Solaris, with
generous help from QSSL). The gdb can prove to be tough to port, but then
there is remote debugging…

For a server you can look at Motorola boards. Could be either the MCP750
(CompactPCI) or MTX600 (ATX) series. Those are not particularly fast
(233MHz), but there are G4 versions about to be released soon. The cPCI
G4 board (MCP635) should be available Q1/2001; a non-system slot version
is available already. I’m running Neutrino on an MCP750 and I think it is
a very nice piece of hardware (not because I work for Motorola; they’re
made by a different sector anyway). The OEM Neutrino even comes with
startup code and a boot template preconfigured for it (as well as for
several others), so it was really easy to get it up and running.

Could you point me towards
where these G4 optimizations are mentioned or is it one of those
self-extracted bits of information?

I believe it was posted at some time on QNX web site. Basically, they
make good use of the Altivec engine of G4, to optimize message passing.
Claimed to have 2Gb/sec message passing bandwidth including all
overhead.

- Igor

Mario Charest wrote:

[ clip … ]
I agree that it may be unfair, but it’s a fact; the end result is there.
The compiler on QRTP generates slower code as the price of supporting
multiple architectures.

I wonder whether the performance issues are really
related to the generated code or to the maturity
and performance of the QNX6 system libraries.

Are the QNX6 libs mature and optimized already?
Optimized in which sense?

GCC is awesome at doing lots of things, but not at producing the fastest code.

Has someone analysed where most of the CPU time is
consumed … at user level? At system level?

Armin
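One rough way to approach the user-level versus system-level question is to ask the OS how the process's CPU time splits up. The sketch below assumes a POSIX-style `getrusage()` is available on the target (that is an assumption about the platform, not something confirmed in this thread); `CpuSplit` and `cpu_split_for_self` are hypothetical names chosen for illustration.

```cpp
#include <sys/resource.h>
#include <sys/time.h>

// Hypothetical holder for the two CPU-time figures, in seconds.
struct CpuSplit {
    double user_s;    // time spent executing the process's own code
    double system_s;  // time spent in the kernel on the process's behalf
};

// Fills `out` with the calling process's CPU time so far.
// Returns false if getrusage() reports an error.
bool cpu_split_for_self(CpuSplit& out) {
    struct rusage usage{};
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return false;
    }
    out.user_s = static_cast<double>(usage.ru_utime.tv_sec) +
                 static_cast<double>(usage.ru_utime.tv_usec) / 1e6;
    out.system_s = static_cast<double>(usage.ru_stime.tv_sec) +
                   static_cast<double>(usage.ru_stime.tv_usec) / 1e6;
    return true;
}
```

Sampling this before and after the number-crunching loop and subtracting gives the split for that loop alone. If nearly all of the time is user time, the compiler's generated code (or user-level library code) is the more likely suspect than kernel overhead.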

“Armin Steinhoff” <A-Steinhoff@web_.de> wrote in message
news:3A12734A.34B5D851@web_.de…

Mario Charest wrote:

[ clip … ]
I agree that it may be unfair, but it’s a fact; the end result is there.
The compiler on QRTP generates slower code as the price of supporting
multiple architectures.

I wonder whether the performance issues are really
related to the generated code or to the maturity
and performance of the QNX6 system libraries.

What I can tell you is that the code I’ve seen didn’t
even use library calls; it was all data crunching.
GCC was 10-20% behind VC++ on the same machine.

I would be extremely surprised (but I guess it’s possible)
if the 10-20% came from operating system
overhead.

Are the QNX6 libs mature and optimized already?
Optimized in which sense?

GCC is awesome at doing lots of things, but not at producing the fastest code.

Has someone analysed where most of the CPU time is
consumed … at user level? At system level?

In

Armin