“Igor Kovalenko” <kovalenko@attbi.com> wrote in message
news:3DA39888.8000807@attbi.com…
> Got me curious. Don’t you have also to flush cached pages before and
> after DMA transfers,
RAM used for DMA transfers and dual-port RAM are usually not cached. mmap’s
PROT_NOCACHE option tells the kernel to avoid caching such memory regions.
Also, the kernel is not aware when a DMA transfer occurs; such a transfer is
a device-to-memory interaction that happens without the CPU.
By saying “usually not cached” I meant architectures where a device
initiates a DMA transfer on its own and notifies the CPU with an interrupt
on completion. An Ethernet chip is the best example. Sure, there is an
exception that proves the rule. Let’s imagine a “best-effort” image
recognition system that we are building with a PCI board that can
DMA-transfer a huge chunk of data on request. Our CPU, a Broadcom BCM1250
for the sake of conversation, is fast enough to crunch this data in
acceptable time only if the data is cached. In this case we would (1) keep
the DMA region cached, (2) explicitly initiate DMA transfers by requesting
them from the PCI chip, and (3) invalidate/flush the cache when needed.
Hmm, what’s my point? My point is that the kernel has no awareness of our
devices or our cache strategies.
> plus when several processes have a page mapped into
> addresses that don’t fall into the same cache line? And things like
> mprotect()?
Very interesting question. On x86 and PPC it might not be a problem. But for
Broadcom’s BCM1250 it could be a “problem” due to the virtual indexes/tags
that Brian told us about. It is crystal clear that the instruction cache has
to be flushed on every context switch. It is clear as mud to me whether the
data cache has to be flushed too. Brian, could you please make the BCM1250
data cache story “crystal clear”?
> I might be confused here, since I only know this stuff from books, not
> from experience …
> Brian Stecher wrote:
> > Rennie Allen <rallen@csical.com> wrote:
> > > So (simple curiosity at work), since you have now implemented this, what
> > > is the damage to the context switches (vis-a-vis the uniprocessor with
> > > physically tagged cache)?
> > I haven’t done any benchmarking to see the actual numbers but, empirically,
> > it doesn’t seem to hurt too much. It’s a fast chip.