MMX and Geode GX1

Hi,

I’m experimenting with an MMX routine that computes the scalar product of
two vectors using PMADDWD instructions. The routine was originally published
on the Intel web site as an application note. I have ported it to QNX RTP
6.1.
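
For readers who do not have the application note at hand, here is a minimal
scalar sketch in C of what a PMADDWD-based dot product computes. It is not
the Intel routine: the names pmaddwd_model and dot16 are made up for
illustration, the vector length is assumed to be a multiple of 4, and
overflow is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one PMADDWD: four signed 16-bit products,
   added pairwise into two signed 32-bit lanes.
   (Hypothetical helper; the all -32768 wraparound case is not modelled.) */
static void pmaddwd_model(const int16_t a[4], const int16_t b[4],
                          int32_t out[2])
{
    out[0] = (int32_t)a[0] * b[0] + (int32_t)a[1] * b[1];
    out[1] = (int32_t)a[2] * b[2] + (int32_t)a[3] * b[3];
}

/* Dot product of two 16-bit vectors of n elements.
   Assumes n is a multiple of 4 and that the sums fit in 32 bits. */
static int32_t dot16(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc[2] = {0, 0};

    for (size_t i = 0; i < n; i += 4) {
        int32_t pair[2];
        pmaddwd_model(a + i, b + i, pair);
        acc[0] += pair[0];   /* this accumulation would be a PADDD in MMX */
        acc[1] += pair[1];
    }
    /* Final horizontal add of the two 32-bit lanes. */
    return acc[0] + acc[1];
}
```

With 256 elements this loop body runs 64 times, so a scalar build of it can
serve as a baseline when timing the MMX version on both machines.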

What puzzles me is the following: the routine runs in 1600 CPU cycles on a
PII machine and in 5000 CPU cycles on the Geode. Is this normal?

I’m not sure whether the problem lies in differences in cache memory (the
data is two 16-bit vectors of 256 elements, so it shouldn’t be a problem),
or whether each MMX instruction simply needs more cycles on the Geode
because of its different processor microarchitecture.
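
The numbers in the post can be put in per-element terms with a little
arithmetic. The sketch below is only that arithmetic; the helper names are
hypothetical, and the remark about cache size assumes the GX1 has a 16 KB
L1 cache, which should be checked against the datasheet.

```c
#include <stddef.h>

/* Bytes touched by the routine: `vectors` arrays of `elements` items,
   each `bytes_per_element` wide. (Hypothetical helper.) */
static size_t working_set_bytes(size_t elements, size_t bytes_per_element,
                                size_t vectors)
{
    return elements * bytes_per_element * vectors;
}

/* Average cost per vector element for a measured cycle count. */
static double cycles_per_element(double cycles, size_t elements)
{
    return cycles / (double)elements;
}
```

For the figures above, working_set_bytes(256, 2, 2) is 1024 bytes, which is
small compared with an assumed 16 KB L1 cache, so once the data is warm,
capacity misses look unlikely. cycles_per_element gives 6.25 for the PII
(1600 cycles) and about 19.5 for the Geode (5000 cycles), a ratio of a
little over 3.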

Has anybody benchmarked MMX on the Geode?

Thanks a lot for any help,

Andrea.

I was pretty disappointed in the performance of the Geode. In comparing a
200 MHz Geode with a 133 MHz AMD Elan SC520, for the same workload, the
Geode was 79% utilized, while the Elan was 88% utilized. I would have
expected roughly 55% utilization on the Geode, just comparing clock rates.
I never did figure out where those extra CPU cycles were going.
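
The clock-rate estimate can be written out as a small sketch. It assumes
that utilization scales inversely with clock frequency and nothing else
(same work per cycle on both chips), which is exactly the assumption the
measurement contradicts. The function name is hypothetical.

```c
/* Utilization expected on a target CPU if the only difference from the
   reference CPU were clock rate. Inputs and result are in percent and MHz.
   (Hypothetical helper; assumes identical work per clock cycle.) */
static double expected_utilization(double util_ref_percent,
                                   double mhz_ref, double mhz_target)
{
    return util_ref_percent * mhz_ref / mhz_target;
}
```

expected_utilization(88.0, 133.0, 200.0) comes out at about 58.5%, in the
same range as the 55% quoted above and well below the 79% measured.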

Marty Doane
Siemens Dematic
