Shared memory access time

Mike_Gorchak1 · September 6, 2005, 4:43am

Hello, All!

Why is access to shared memory several times slower than access to ordinary
heap memory? The shared memory was created without the NOCACHE flag. What am
I missing?

With best regards, Mike Gorchak. E-mail: mike@malva.com.ua

Sunil_Kittur1 · September 6, 2005, 10:52am

Is this on an ARM processor?
If so, shared memory mappings are implicitly performed with
the PROT_NOCACHE flag. The MMU uses a virtually-indexed,
virtually-tagged cache so cacheable shared memory mappings
will create incoherent cache aliases.
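
To make the term concrete: a cache alias arises when one physical page is
reachable through two different virtual addresses. The following is only a
minimal sketch using plain POSIX calls, not QNX kernel code; the function name
alias_roundtrip and the object name /alias_demo are made up for illustration.
It maps one shared memory object twice in the same process, writes through one
mapping and reads through the other. With a physically-tagged cache both
addresses resolve to the same cache line; with a virtually-indexed,
virtually-tagged cache they could land in different lines, which is why the
mapping has to be uncached there.

```c
#define _POSIX_C_SOURCE 200809L  /* expose the POSIX declarations used below */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps one shared memory object at two virtual addresses, writes `value`
 * through the first mapping and returns the byte read back through the
 * second. Returns -1 if any setup step fails.
 * The object name is a hypothetical placeholder. */
int alias_roundtrip(unsigned char value)
{
    const char *name = "/alias_demo";
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    int fd = shm_open(name, O_RDWR | O_CREAT, 0600);
    if (fd == -1)
        return -1;
    shm_unlink(name);  /* the object lives on until the mappings are gone */

    if (ftruncate(fd, (off_t)page) == -1) {
        close(fd);
        return -1;
    }

    void *pa = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *pb = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (pa == MAP_FAILED || pb == MAP_FAILED) {
        if (pa != MAP_FAILED) munmap(pa, (size_t)page);
        if (pb != MAP_FAILED) munmap(pb, (size_t)page);
        return -1;
    }

    /* Two distinct virtual addresses now refer to the same physical page. */
    assert(pa != pb);

    volatile unsigned char *a = pa;
    volatile unsigned char *b = pb;
    a[0] = value;       /* write through the first alias */
    int seen = b[0];    /* read through the second alias */

    munmap(pa, (size_t)page);
    munmap(pb, (size_t)page);
    return seen;
}
```

Calling alias_roundtrip(0x5A) from a small main() should give back 0x5A on a
machine whose caches keep aliases coherent, such as x86.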

Sunil

Mike_Gorchak1 · September 6, 2005, 11:48am

Hello, Sunil!

SK> Is this on an ARM processor?
SK> If so, shared memory mappings are implicitly performed with
SK> the PROT_NOCACHE flag. The MMU uses a virtually-indexed,
SK> virtually-tagged cache so cacheable shared memory mappings
SK> will create incoherent cache aliases.

No, it’s on the one true processor, x86. It looks like PROT_NOCACHE affects
x86 CPUs too.

With best regards, Mike Gorchak. E-mail: mike@malva.com.ua

Evan_Hillas1 · September 6, 2005, 1:29pm

Mike Gorchak wrote:

> No, it’s on one true processor - x86. Looks like PROT_NOCACHE affects x86
> cpus too.

Cough!

Sunil_Kittur1 · September 6, 2005, 3:02pm

How exactly are you creating and mapping the shared memory?

x86 uses a physically-tagged cache, so if your shared memory is backed by
system RAM, the mappings should be cacheable.
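
For reference, the usual POSIX way to create and map shared memory looks
roughly like the sketch below. This is an illustration, not Mike's code: the
helper name map_shared and its parameters are hypothetical. PROT_NOCACHE is
the flag discussed in this thread; the sketch only applies it when the
platform headers define it and the caller asks for it, so a plain call gives
an ordinary cacheable mapping.

```c
#define _POSIX_C_SOURCE 200809L  /* expose the POSIX declarations used below */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Creates (or opens) the named shared memory object, sizes it, and maps it
 * read/write. Returns the mapping, or NULL on failure.
 * If `uncached` is nonzero and the platform defines PROT_NOCACHE,
 * the mapping is requested uncached; otherwise the flag is ignored. */
void *map_shared(const char *name, size_t size, int uncached)
{
    assert(name != NULL && size > 0);

    int prot = PROT_READ | PROT_WRITE;
#ifdef PROT_NOCACHE
    if (uncached)
        prot |= PROT_NOCACHE;
#else
    (void)uncached;  /* no such flag on this platform */
#endif

    int fd = shm_open(name, O_RDWR | O_CREAT, 0600);
    if (fd == -1)
        return NULL;

    if (ftruncate(fd, (off_t)size) == -1) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, size, prot, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would release the region with munmap() and remove the name with
shm_unlink() when done. If the slow case was mapped with anything other than
PROT_READ | PROT_WRITE and MAP_SHARED, that difference is the first thing to
check.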

Sunil.

David_Gibbs1 · September 6, 2005, 4:23pm

Mike Gorchak <mike@malva.com.ua> wrote:

> Hello, All!
>
> Why is access to shared memory several times slower than access to ordinary
> heap memory? The shared memory was created without the NOCACHE flag. What am
> I missing?

I’ve seen some of the other responses – on non-ARM, access to shared
memory shouldn’t be any slower than access to heap. Both are mapped
into the process’ address space essentially the same way – through a
mmap() call.

How did you determine that this is the case? Can you post your test
code so we can try to reproduce it?

My first guess on things like this is usually cache effects of one already
being in cache, and the other not. The other thing that can cause this
sort of problem or behaviour is aligned vs un-aligned access. But without
seeing actual code, it is hard to know whether it is an actual memory
problem, or an artifact of how the test program was written.
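
As a starting point, a test along these lines would rule out the two
artifacts mentioned above. It is only a sketch under stated assumptions: the
function names ns_per_word and compare_heap_vs_shared and the object name
/shm_bench_demo are invented here. Both buffers are written once before
timing, so neither starts out warmer than the other, and both are read as
aligned 32-bit words.

```c
#define _POSIX_C_SOURCE 200809L  /* expose the POSIX declarations used below */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Returns the average time in nanoseconds to read one aligned 32-bit word.
 * An untimed write pass first faults every page in and warms the cache. */
static double ns_per_word(volatile uint32_t *buf, size_t words, int passes)
{
    struct timespec t0, t1;
    uint32_t sum = 0;

    for (size_t i = 0; i < words; i++)   /* warm-up pass, not timed */
        buf[i] = (uint32_t)i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < words; i++)
            sum += buf[i];               /* volatile read, cannot be elided */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    buf[0] = sum;                        /* keep the sum observable */
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / ((double)words * (double)passes);
}

/* Times the same read loop over a malloc() buffer and a shared memory
 * mapping of equal size. Returns 0 on success, -1 on setup failure. */
int compare_heap_vs_shared(size_t bytes, int passes,
                           double *heap_ns, double *shm_ns)
{
    const char *name = "/shm_bench_demo";   /* hypothetical object name */
    assert(bytes >= sizeof(uint32_t) && passes > 0);
    size_t words = bytes / sizeof(uint32_t);

    uint32_t *heap = malloc(bytes);
    if (heap == NULL)
        return -1;

    int fd = shm_open(name, O_RDWR | O_CREAT, 0600);
    if (fd == -1) {
        free(heap);
        return -1;
    }
    shm_unlink(name);
    if (ftruncate(fd, (off_t)bytes) == -1) {
        close(fd);
        free(heap);
        return -1;
    }
    void *shm = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (shm == MAP_FAILED) {
        free(heap);
        return -1;
    }

    *heap_ns = ns_per_word(heap, words, passes);
    *shm_ns  = ns_per_word(shm, words, passes);

    munmap(shm, bytes);
    free(heap);
    return 0;
}
```

If the two figures come out close on x86, the original slowdown was most
likely an artifact of the test program. A large gap that persists with this
kind of loop would point at how the mapping was made.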

-David

David Gibbs
QNX Training Services
dagibbs@qnx.com

Ken_Schumm1 · September 29, 2005, 10:25pm

How much slower is shared memory on an ARM?

I’m beginning a new design that uses an Intel PXA270 XScale and was
planning on using shared memory fairly extensively. If that will cause
a big performance hit I’ll take another approach.

Sunil_Kittur1 · September 30, 2005, 11:24am

Because of the CPU’s virtual cache, shared memory mappings
have to be forced to PROT_NOCACHE to prevent cache aliases.

The pxa2xx CPUs have rather poor uncached memory access
performance, so it is significantly slower than cached access.
What other approaches were you considering?

Sunil.

David_Gibbs1 · September 30, 2005, 4:48pm

Sunil Kittur <skittur@qnx.com> wrote:

> Because of the CPU’s virtual cache, shared memory mappings
> have to be forced to PROT_NOCACHE to prevent cache aliases.

Are the “special” mappings that sit in Proc’s space also
done PROT_NOCACHE, or could they be used for better efficiency?

-David

--
David Gibbs
QNX Training Services
dagibbs@qnx.com

Ken_Schumm1 · September 30, 2005, 11:19pm

The design hasn’t even started, but the requirements more or less mirror
those of another QNX4 instrument that we built on an x86 PC/104 platform,
which used shared memory. That worked well, so I was going to do the same
thing. I could build it on message passing, but that would come at the
expense of more context switches.

Maybe I’ll have to benchmark both methods and see.
