cache coherency

Hi,

Is there any service or function that lets an application find out whether
the cache of a system is safe or coherent? Is QNX always cache safe?

Thx !
Marc Labbé

Marc Labbe <marc.labbe@mindready.com> wrote:

> Hi,
>
> Is there any service/function to find out from an application if the cache
> of a system is safe or coherent? Is QNX always cache safe?

On some systems (x86) it’s always coherent, on others it’s not. The
information about the cache is stored in the syspage. Here’s a
function that determines if there’s a non-snooped data cache in the
system:

#include <sys/syspage.h>

/*
 * Returns 1 if every data cache in the list is snooped (kept coherent
 * by hardware), 0 if at least one is not.
 */
static int MemSmartCache(void)
{
    struct cacheattr_entry *cache_base;
    struct cacheattr_entry *cache;
    int cache_idx;

    /* Walk the linked list of data cache entries in the syspage. */
    cache_base = SYSPAGE_ENTRY(cacheattr);
    for (cache_idx = SYSPAGE_ENTRY(cpuinfo)->data_cache;
         cache_idx != CACHE_LIST_END;
         cache_idx = cache->next) {

        cache = &cache_base[cache_idx];
        if (!(cache->flags & CACHE_FLAG_SNOOPED)) {
            /* Found a cache that isn't smart */
            return 0;
        }
    }

    return 1;
}

Cheers,
Dave
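
One possible way to act on the result of a check like MemSmartCache() is to decide whether a buffer shared with a device needs PROT_NOCACHE. This is only a minimal sketch: `choose_dma_prot` is a hypothetical helper, not a QNX API, and the fallback `#define` is there purely so the fragment compiles on systems whose <sys/mman.h> has no PROT_NOCACHE.

```c
#include <sys/mman.h>

/* PROT_NOCACHE is a QNX extension. Assumption for illustration only:
 * define it to 0 elsewhere so the sketch still compiles. */
#ifndef PROT_NOCACHE
#define PROT_NOCACHE 0
#endif

/*
 * Hypothetical helper: pick mmap() protection bits for a buffer shared
 * with a device. cache_is_snooped is the result of a check such as
 * MemSmartCache().
 */
static int choose_dma_prot(int cache_is_snooped)
{
    int prot = PROT_READ | PROT_WRITE;

    if (!cache_is_snooped) {
        /* No hardware coherency: map the buffer uncached unless the
         * driver does its own explicit cache maintenance. */
        prot |= PROT_NOCACHE;
    }
    return prot;
}
```

The returned value would then be passed as the protection argument when mapping the shared buffer.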

Hi Dave,

I’m working on an SH4 (a SystemH-Amanda system running QNX 6.2.1 patch B with
the latest SystemH-Amanda BSP). I posted here because I thought the question
was more OS related than SH4 specific. I knew that the system had a
non-coherent cache, but this function will help configure the software to
use the cache (or not), which leads me to my next question…

We are using shared memory, mapped without the PROT_NOCACHE flag, to
store data accessed by a PCI card and its driver. With PROT_NOCACHE,
everything seems to work fine, but we think that we may get performance
gains by using the cache.

When we don’t use the NOCACHE flag, there are many failures that probably
come from a lack of synchronisation between the device and the software. From
the doc, we understood msync to be the function to use for such
synchronisation. Is that right, or is there another way? When we use it, it
works on many occasions, but in some cases it still fails. Is it just that
calling msync introduces a delay that gives the OS some time to sync the
data?

Thanks again !
ml

Hi Marc,

What you’ll need to do is add some code to do explicit cache
synchronisation; then you should be able to use cacheable buffers.

We’re working on a platform-independent mechanism to do this, which
should be ready for the next release (6.3?). In the meantime,
you can implement this in an SH4-specific way, by using the routines
in /usr/include/sh/inline.h, specifically dcache_invalidate and
dcache_flush.

Before allowing a device to read from a cacheable area of memory,
you need to flush the data from the cache with dcache_flush.
Similarly, before the CPU reads memory that has been modified by
a device, that memory needs to be invalidated from the cache
with dcache_invalidate, to avoid getting stale data.
Typically the invalidate operation is performed before the
buffer is handed off to the device for a device write
transaction.
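
To make the ordering concrete, here is a toy model of a single write-back cache line. It is a sketch only: `toy_flush` and `toy_invalidate` are hypothetical stand-ins for the effect of dcache_flush and dcache_invalidate, not the real SH4 routines (whose signatures are not shown in this thread), and device DMA is modelled as a direct access to `ram`.

```c
/* One cached word: "ram" is what the device sees, "cache" is the CPU's
 * copy while "valid" is set. */
struct toy_line {
    int ram;
    int cache;
    int valid;   /* cache holds a copy of the word      */
    int dirty;   /* cached copy is newer than ram       */
};

static void cpu_store(struct toy_line *l, int v)
{
    l->cache = v;            /* write-back cache: ram is not updated yet */
    l->valid = 1;
    l->dirty = 1;
}

static int cpu_load(struct toy_line *l)
{
    if (!l->valid) {         /* miss: fill the line from ram */
        l->cache = l->ram;
        l->valid = 1;
        l->dirty = 0;
    }
    return l->cache;         /* hit: ram is not consulted */
}

/* Stand-in for a flush: write dirty data back to ram. */
static void toy_flush(struct toy_line *l)
{
    if (l->valid && l->dirty) {
        l->ram = l->cache;
        l->dirty = 0;
    }
}

/* Stand-in for an invalidate: discard the line without writing it back. */
static void toy_invalidate(struct toy_line *l)
{
    l->valid = 0;
    l->dirty = 0;
}

/* CPU fills the buffer, then the device reads it. Returns what the
 * device sees. */
static int device_reads_after_cpu_store(int do_flush)
{
    struct toy_line l = { 0, 0, 0, 0 };

    cpu_store(&l, 42);
    if (do_flush)
        toy_flush(&l);
    return l.ram;            /* DMA read bypasses the cache */
}

/* Device fills the buffer, then the CPU reads it. Returns what the
 * CPU sees. */
static int cpu_reads_after_device_write(int do_invalidate)
{
    struct toy_line l = { 0, 0, 0, 0 };

    (void)cpu_load(&l);      /* CPU touched the buffer earlier: line is cached */
    if (do_invalidate)
        toy_invalidate(&l);
    l.ram = 7;               /* DMA write bypasses the cache */
    return cpu_load(&l);
}
```

In this model the device only sees 42 when the flush is done first, and the CPU only sees 7 when the line was invalidated before the device wrote.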

Be careful with dcache_invalidate though! You need to make
sure you pad the start and end of the buffers to a cache line
boundary, to avoid corrupting data that could reside in the
same cacheline as the buffer being used with the PCI device.
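
The padding rule above comes down to plain arithmetic. The following is a sketch under an assumed 32-byte line size; `CACHE_LINE` and the helper names are hypothetical, and a real driver should take the line size from the system's cache attributes rather than hard-code it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size, for illustration only. */
#define CACHE_LINE 32u

/* Round a length up to a whole number of cache lines. */
static size_t round_up_to_line(size_t len, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);   /* power of two */
    return (len + line - 1) & ~(line - 1);
}

/* Non-zero if the address sits on a cache line boundary. */
static int is_line_aligned(const void *p, size_t line)
{
    return ((uintptr_t)p & (line - 1)) == 0;
}

/*
 * Non-zero if [p, p + len) starts and ends on line boundaries, so that
 * invalidating it cannot discard unrelated data sharing a line with it.
 */
static int owns_whole_lines(const void *p, size_t len, size_t line)
{
    return is_line_aligned(p, line) && (len & (line - 1)) == 0;
}
```

A buffer would be allocated with `round_up_to_line(size, CACHE_LINE)` bytes at a line-aligned address, and `owns_whole_lines` could serve as a sanity check before any invalidate.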

Good luck :-)

Dave

Does the device also write into this shared memory? If so, msync()
won’t help, since the device could commit to memory, and then the cache
could do a write-back on top of the updated information. If the device only
reads the memory, then you need to find a way of holding off
the hardware just before updating the memory and then allowing the hardware
to continue once the cached data has been committed to memory (when
msync() returns).
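
The write-back hazard described here can be shown with a toy model of one cached word. This is a sketch, not driver code: the struct and function names are hypothetical, device DMA is modelled as a direct write to `ram`, and the "later write-back" stands for any eviction or flush that happens after the device has written.

```c
/* One cached word: "ram" is what the device writes, "cache" is the
 * CPU's copy, "dirty" means the cached copy has not been written back. */
struct toy_word {
    int ram;
    int cache;
    int dirty;
};

/*
 * Returns the value left in ram after this sequence:
 *   1. the CPU writes 1 (it stays in the cache, dirty),
 *   2. optionally the line is discarded before the DMA,
 *   3. the device writes 2 straight to ram,
 *   4. a later write-back pushes any dirty line out.
 */
static int final_ram_value(int invalidate_before_dma)
{
    struct toy_word w = { 0, 0, 0 };

    w.cache = 1;                 /* step 1: CPU store, not yet in ram */
    w.dirty = 1;

    if (invalidate_before_dma)
        w.dirty = 0;             /* step 2: line dropped, nothing to write back */

    w.ram = 2;                   /* step 3: device DMA write */

    if (w.dirty)
        w.ram = w.cache;         /* step 4: stale line overwrites device data */

    return w.ram;
}
```

Without the invalidate, the device's value 2 is overwritten by the stale 1, and no synchronisation performed after the fact can bring it back.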

Also note that calling msync() isn’t a lightweight call (it’s about 2
kernel calls (message pass + cache flush call), plus the cost of flushing
the cache), not to mention the collateral damage you do to other tasks,
since they all now have a cold cache.

-Adam

Hi Adam,

> Does the device also write into this shared memory storage - if so,
> msync() won’t help, since the device could commit to memory, and then
> the cache could do a write-back on top of updated information.

The device does write back to the memory storage. Isn’t that what
MS_INVALIDATE is used for with msync?

> Also note that calling msync() isn’t a lightweight call (it’s about 2
> kernel calls (message pass + cache flush call), plus the cost of flushing
> the cache), not to mention the collateral damage you do to other tasks,
> since they all now have a cold cache.

We expected that, but which is more costly: this, or using PROT_NOCACHE
when mapping the memory? We should use the SH4-specific macros specified by
Dave in the previous answer anyway (or wait for a platform-independent
version of those macros), but still, it has to be worth the effort!

Marc

msync() is your last resort :-) Which approach costs more also depends on
the data access pattern. How large is the data block? How often do you need
to access it before passing it to the hardware?

If you don’t access the data once it is in place, using PROT_NOCACHE doesn’t
hurt that much. If the data block isn’t that big, you also have the option of
keeping the data in cached memory and, instead of calling msync(), copying it
into non-cached memory before passing it to the hardware.

-xtang
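
The copy-based alternative suggested above might look roughly like this. It is a sketch under stated assumptions: `staging` and `stage_for_device` are hypothetical names, and the plain static array only stands in for memory that a real driver would map uncached (for example with PROT_NOCACHE) and make visible to the device.

```c
#include <stddef.h>
#include <string.h>

#define STAGING_SIZE 256

/* Stand-in for an uncached, device-visible staging area. */
static unsigned char staging[STAGING_SIZE];

/*
 * Copy a block that was prepared in ordinary cached memory into the
 * staging area just before handing it to the hardware.
 * Returns the number of bytes staged, or 0 if the block does not fit.
 */
static size_t stage_for_device(const void *src, size_t len)
{
    if (src == NULL || len > STAGING_SIZE)
        return 0;

    memcpy(staging, src, len);
    return len;
}
```

The CPU works on the data at cached speed and pays for a single copy per hand-off, which is why this tends to suit small blocks that are touched many times before being sent.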