Turning off caching is required to be able to “see” the DMA results - right?
So, it seems the solution is to have a second cached buffer used
to do the calculations.
/Kirk
acellarius@yahoo.com wrote:
Standard support is taking a bit long to answer this one,
so I thought I’d try here so long:
Application is allocating memory for DMA access as follows:
This memory is filled via DMA from a video grabber,
and the customer used the memory directly for calculations.
However, accessing this memory is 10 times slower than accessing
malloc’ed memory.
Copying the DMA’ed data to a malloc’ed block
and then using that for the calculation reduces the
calculation time by 90%!
Why is it so much slower?
Is MAP_PHYS needed for DMA?
Thanks
PROT_NOCACHE turns off caching.
Turning off caching is required to be able to “see” the DMA
results - right?
Depends on the underlying hardware/processor/MMU/etc. If you have a
“smart cache” or a “bus snooping” cache then you don’t necessarily need
to. You can examine the SYSPAGE(cacheattr) for the CACHE_FLAG_SNOOPED
attribute to determine this …
So, it seems the solution is to have a second cached buffer used
to do the calculations.
This is certainly the safest thing to do (although not always necessary).
Turning off caching is required to be able to “see” the DMA results - right?
So, it seems the solution is to have a second cached buffer used
to do the calculations.
Thanks guys!
Sean, can you please confirm?
Also, is it necessary to have MAP_PHYS?
Will it have any effect on the access times?
Thanks guys!
Sean, can you please confirm?
As jgarvey said, platforms with a “snooped” cache don’t
necessarily need PROT_NOCACHE. Even on platforms without
snooping, there may be instructions to invalidate and/or
prefetch the cache after the DMA operation.
Kirk Bailey <kirk.a.bailey@delphi.com> wrote:
Turning off caching is required to be able to “see” the DMA results -
right?
So, it seems the solution is to have a second cached buffer used
to do the calculations.
Can’t you just change the attributes of the memory block after the DMA
transfer and before the calculations, so that the memory can be cached,
using mprotect()?
Paolo
Can’t you just change the attributes of the memory block after the DMA
transfer and before the calculations, so that the memory can be cached,
using mprotect()?
The call would be msync(), but on a microkernel the overheads would
probably be too high to continuously flush this at each operation.
Depending on the population-to-calculation ratio, either use PROT_NOCACHE
or the bounce-buffer approach (perhaps in conjunction with the
snooping-cache detection) … FYI, this is what the filesystem does:
the buffer cache is cacheable on the grounds that you’ll likely refer
to a cached disk block multiple times in a well-tuned system, and the
emphasis is on the disk driver to bounce via a temporary NOCACHE buffer
if it detects that the MMU is not snooping/invalidating cache lines
after DMA transfers …