Watcom C 10.6 versus Intel Compiler 9

I ported an application from QNX4 to Windows XP. The application accepts
incoming data via TCP/IP, performs data analysis and then spits the result
back via TCP/IP.

I was very curious to see how the Intel C compiler would compare with
Watcom. I turned every optimisation on and tried a bunch of different
combinations of optimisation flags. I looked at the assembly code and the
Intel compiler threw in a bunch of SSE instructions (I had high hopes this
would make a big difference).

The best the Intel compiler could do was match Watcom, which I thought was
worth a post. Also, the Intel executable is 33% bigger :wink:

That being said, the code was run on AMD64; maybe it could run faster on a P4.
The source code had gone through many years of hand-crafted optimisation,
which I guess helps Watcom.

The code was hacked so that it would run on AMD64 without having to generate
separate code paths for P4 “Genuine Intel” and non “Genuine Intel” parts;
maybe that negatively affected the performance on AMD64.

- Mario

By increasing the size of the data by 2, the Intel compiler got a 5% edge.

- Mario

Mario Charest postmaster@127.0.0.1 wrote:

A couple things:

Watcom had a pretty good reputation for high-quality code-gen.

Testing an Intel compiler generated binary on an AMD chipset may not be
fair, as the compiler may make choices that are correct for an Intel chipset,
rather than an AMD one.

-David

David Gibbs
QNX Training Services
dagibbs@qnx.com

“David Gibbs” <dagibbs@qnx.com> wrote in message
news:dl32ee$lj1$1@inn.qnx.com

Mario Charest postmaster@127.0.0.1 wrote:

A couple things:

Watcom had a pretty good reputation for high-quality code-gen.

Testing an Intel compiler generated binary on an AMD chipset may not be
fair, as the compiler may make choices that are correct for an Intel chipset,
rather than an AMD one.

I agree; we should receive a P4 machine this week. However, for our
application it seems the AMD is running circles around the P4.

I examined the assembly code generated by the Intel compiler and was very
impressed. I was very puzzled as to why it wouldn’t run faster than Watcom.
Today I may have found the answer. We have a function that rounds a float
value to an integer (the C language truncates it).
That function is written in assembly and takes 2 instructions. However, in
the Intel compiler version this function was replaced by a macro because I
didn’t know how to write inline assembly code for that compiler. With
that macro it results in 8 instructions, quite a big difference, plus
that code has a compare and a jump, which isn’t good.

I know that under Watcom using a macro instead of an inline function made
the program 25% slower, so maybe if I can manage to rewrite that functino
with inline assembly code the Intel compiler will give better performance, as
I expect it to do.
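
As a rough illustration of the difference described here (not the code from this application, which was never posted), the C sketch below contrasts a rounding macro that needs a compare and a branch with a branch-free alternative. The names ROUND_BRANCHY, round_branch_free and check_rounding are made up for this example. The branch-free version assumes IEEE 754 single-precision arithmetic, the default round-to-nearest mode, no -ffast-math style options, and inputs with a magnitude below 2^22.

```c
#include <assert.h>

/* Hypothetical macro of the kind described: round half away from zero.
 * The sign test typically compiles to a compare plus a jump or select. */
#define ROUND_BRANCHY(x) ((int)((x) >= 0.0f ? (x) + 0.5f : (x) - 0.5f))

/* Branch-free rounding via the "magic constant" trick. This is an
 * assumption chosen for illustration, not the 2-instruction assembly
 * routine mentioned in the post.
 * Adding 1.5 * 2^23 moves the value into a range where adjacent floats
 * are exactly 1.0 apart, so the addition itself rounds to an integer.
 * Valid only for |x| < 2^22, round-to-nearest mode, no -ffast-math. */
static int round_branch_free(float x)
{
    const float magic = 12582912.0f;  /* 1.5 * 2^23 */
    float r = x + magic;              /* rounding to an integer happens here */
    r -= magic;                       /* exact: recovers the rounded value */
    return (int)r;                    /* r is already integral */
}

/* Sanity check on non-tie inputs, where both versions must agree. */
static void check_rounding(void)
{
    assert(ROUND_BRANCHY(2.3f) == 2 && round_branch_free(2.3f) == 2);
    assert(ROUND_BRANCHY(2.7f) == 3 && round_branch_free(2.7f) == 3);
    assert(ROUND_BRANCHY(-2.7f) == -3 && round_branch_free(-2.7f) == -3);
}
```

The two versions differ on exact ties: the macro rounds 2.5f to 3, while the branch-free version rounds it to 2 because ties go to the even neighbour. Where C99 is available, lrintf() from <math.h> is the standard way to round using the current rounding mode and is usually a better choice than either version above.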

Mario Charest postmaster@127.0.0.1 wrote:
[…]

the program 25% slower, so maybe if I can manage to rewrite that functino

I know “functino” was a typo, but it’s a cool new word :wink: Sort of like
“neutrino”. A functino would be a small very fast function.

regards,
rick

Athlon64 is pretty close to P4 in terms of features. It supports SSE2 and
SSE3. So, unless Intel has stuffed some nasty tricks into the compiler to
punish AMD, differences in performance are more likely to be caused by
pipeline length and cache design. If the code has good cache locality then P4
would benefit from its trace cache and higher clock rate. Otherwise A64 would
benefit from low latency memory access due to its in-core DRAM controller.

“Igor Kovalenko” <kovalenko@comcast.net> wrote in message
news:dlk4fb$c46$1@inn.qnx.com

Athlon64 is pretty close to P4 in terms of features. It supports SSE2 and
SSE3. So, unless Intel has stuffed some nasty tricks into the compiler to
punish AMD,

They did.

Unless you do some hacking, the code will refuse to start on AMD. The other
option is to allow the code to run on generic x86, but this means there will
be two code paths in the executable: one for genuine Intel parts and one for
non-genuine Intel parts…
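
To make the "two code paths" idea concrete, here is a minimal C sketch of vendor-based dispatch. It only illustrates the general technique and makes no claim about how the Intel compiler's generated startup code actually works. The names read_cpu_vendor, select_code_path and the code_path enum are hypothetical. It assumes GCC or Clang on x86 for the <cpuid.h> helper __get_cpuid; on anything else the vendor falls back to "unknown".

```c
#include <assert.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>   /* GCC/Clang helper header providing __get_cpuid() */
#define HAVE_CPUID 1
#endif

/* Hypothetical labels for the two code paths discussed in the thread. */
enum code_path { PATH_GENERIC = 0, PATH_INTEL_TUNED = 1 };

/* Store the 12-character CPUID leaf 0 vendor string in vendor[],
 * or "unknown" when CPUID is not available on this target. */
static void read_cpu_vendor(char vendor[13])
{
    strcpy(vendor, "unknown");
#ifdef HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        /* The vendor string is laid out in EBX, EDX, ECX order. */
        memcpy(vendor + 0, &ebx, 4);
        memcpy(vendor + 4, &edx, 4);
        memcpy(vendor + 8, &ecx, 4);
        vendor[12] = '\0';
    }
#endif
}

/* Pick a path from the vendor string alone. Dispatching on the vendor
 * rather than on feature bits (SSE2, SSE3, ...) is what sends an AMD
 * part down the generic path even when it supports the same features. */
static enum code_path select_code_path(const char *vendor)
{
    assert(vendor != NULL);
    return strcmp(vendor, "GenuineIntel") == 0 ? PATH_INTEL_TUNED
                                               : PATH_GENERIC;
}
```

With this scheme an Athlon64, which reports "AuthenticAMD", takes the generic path regardless of its SSE support. Dispatching on the CPUID feature flags instead would treat both vendors alike.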


differences in performance are more likely to be caused by pipeline
length and cache design. If the code has good cache locality then P4 would
benefit from its trace cache and higher clock rate. Otherwise A64 would
benefit from low latency memory access due to its in-core DRAM controller.

Mario Charest wrote:

“Igor Kovalenko” <kovalenko@comcast.net> wrote in message
news:dlk4fb$c46$1@inn.qnx.com…

Athlon64 is pretty close to P4 in terms of features. It supports SSE2 and
SSE3. So, unless Intel has stuffed some nasty tricks into the compiler to
punish AMD,

They did.

Unless you do some hacking, the code will refuse to start on AMD. The other
option is to allow the code to run on generic x86, but this means there will
be two code paths in the executable: one for genuine Intel parts and one for
non-genuine Intel parts…

Well … AMD is right now taking Intel to court for this behaviour as well as unfair industry pressuring.


Evan

“Evan Hillas” <evanh@clear.net.nz> wrote in message
news:dllslq$l6s$1@inn.qnx.com

Well … AMD is right now taking Intel to court for this behaviour as well
as unfair industry pressuring.

As far as the compiler goes, my view is it’s their compiler and I feel they
are entitled to decide if it will run on a competitor’s processor. Let AMD
bring their own :wink:

What I have a problem with is telling a board maker that they will be on
allocation or will not get as many CPUs as they would like if they do a
design that is AMD based. I know a guy who worked for a company that makes
CPU boards of all kinds and they had (that was 5 years ago) to ask permission
from Intel to make a design that wasn’t Intel based (be it AMD or Cyrix,
etc). That I think is unfair. If you can’t compete and have to resort to
pressure and scare tactics, that I feel is unfair. But then who says business
has to be fair.

I don’t think the deer getting a bullet in the heart would say humans are
fair. For crying out loud, fair would be hand to hand combat …

Mario Charest wrote:

“Evan Hillas” <evanh@clear.net.nz> wrote in message

Well … AMD is right now taking Intel to court for this behaviour as well
as unfair industry pressuring.

As far as the compiler goes, my view is it’s their compiler and I feel they
are entitled to decide if it will run on a competitor’s processor. Let AMD
bring their own :wink:

I think you’ll find that the tweaks aren’t good for the Pentiums either, just that they are particularly bad for the Athlons.


What I have a problem with is telling a board maker that they will be on
allocation or will not get as many CPUs as they would like if they do a
design that is AMD based. I know a guy who worked for a company that makes
CPU boards of all kinds and they had (that was 5 years ago) to ask permission
from Intel to make a design that wasn’t Intel based (be it AMD or Cyrix,

There’s the crux: the OEMs and the public lose their choices. Free markets naturally form monopolies without regulation.


Evan