How to increase the speed of copying memory?

I am trying to copy one memory block to another using memcpy(&a, &b, size). The copy takes more than 3 ms, but I need it to take less than 1.5 ms. How can I speed it up? By the way, I read the data over the PCI bus: first I use mmap() to map a memory block, then the hardware raises an interrupt, and in the thread that handles the interrupt I do the memory copy. The shortest interval between interrupts is about 2.5 ms, so sometimes data is lost.
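A useful first step is to time the copy by itself. The following is a minimal sketch, assuming a POSIX system with clock_gettime() and CLOCK_MONOTONIC (QNX provides both); the helper names time_memcpy_us() and elapsed_us() are hypothetical. Timing the same size once from ordinary RAM and once from the mmap()ed PCI window separates the cost of memcpy() itself from the cost of the bus reads.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Microseconds between two CLOCK_MONOTONIC samples. */
static double elapsed_us(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1e6 +
           (double)(end->tv_nsec - start->tv_nsec) / 1e3;
}

/* Copies len bytes from src to dst and returns how long memcpy() took, in us. */
static double time_memcpy_us(void *dst, const void *src, size_t len)
{
    struct timespec t0, t1;
    assert(dst != NULL && src != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_us(&t0, &t1);
}
```

If a RAM-to-RAM copy of the same size is far below 1.5 ms but the copy from the PCI window is not, memcpy() is not the real problem.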

If you are getting data from the PCI bus, is that data being passed asynchronously over the bus, or is it properly set up for DMA? There’s a HUGE difference in PCI bus performance if you haven’t set up DMA. If timing is critical when copying across the PCI bus, you should be looking at the actual PCI interface chipset. Also consider what else is happening on the PCI bus at that instant. Your post implies that it doesn’t always fail; sometimes it is fast enough. That leads me to infer that another thread of execution is using the PCI bus.


How do I set up DMA? Thanks.

As bjchip said, you need to make sure there is no other bus activity that can interfere. A video card (which is a bus master) trying to update the screen at the same time you are trying to service an ISR can easily cause unexpected latency. This is the primary reason why a device with hard real-time requirements should not be running a GUI.

Take a look at the flags you can use with mmap() (in the Helpviewer) for ways to control how memory is mapped to facilitate DMA access.
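One way to get a DMA buffer on QNX Neutrino is to ask mmap() for physically contiguous anonymous memory and then ask mem_offset() for its physical address, which is what the card needs. This is a minimal sketch, assuming the QNX flags MAP_PHYS, MAP_ANON, PROT_NOCACHE and NOFD from <sys/mman.h>; struct dma_buf, dma_buf_alloc() and dma_buf_free() are hypothetical names. The non-QNX branch exists only so the sketch compiles elsewhere and is not usable for real DMA. Depending on the QNX version, MAP_PHYS may require extra privileges, so check the mmap() entry in the Helpviewer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/mman.h>

/* Hypothetical descriptor: the CPU uses vaddr, the PCI card is given paddr. */
struct dma_buf {
    void    *vaddr;
    uint64_t paddr;
    size_t   len;
};

/* Allocates a buffer for bus-master DMA. Returns 0 on success, -1 on failure. */
static int dma_buf_alloc(struct dma_buf *b, size_t len)
{
    assert(b != NULL && len > 0);
    b->len = len;
#ifdef __QNX__
    /* QNX: physically contiguous, uncached RAM. */
    b->vaddr = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_NOCACHE,
                    MAP_PHYS | MAP_ANON, NOFD, 0);
    if (b->vaddr == MAP_FAILED) {
        b->vaddr = NULL;
        return -1;
    }
    off_t phys;
    size_t contig;
    /* Physical address of the buffer, plus how much of it is contiguous. */
    if (mem_offset(b->vaddr, NOFD, len, &phys, &contig) == -1 || contig < len) {
        munmap(b->vaddr, len);
        b->vaddr = NULL;
        return -1;
    }
    b->paddr = (uint64_t)phys;
#else
    /* Fallback so the sketch compiles elsewhere: NOT usable for real DMA. */
    b->vaddr = malloc(len);
    if (b->vaddr == NULL)
        return -1;
    b->paddr = 0;
#endif
    return 0;
}

static void dma_buf_free(struct dma_buf *b)
{
#ifdef __QNX__
    munmap(b->vaddr, b->len);
#else
    free(b->vaddr);
#endif
    b->vaddr = NULL;
}
```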

Yes, a video card shares the same interrupt as my DSP card. Should I disable interrupts while I service an ISR? Please tell me how to solve this.

If you have a hard realtime requirement, get rid of the video card - or any other card which is a bus master. Once the hardware controls the bus (as the video card will do), there is nothing the software can do with the bus until it is released by the hardware.

I have to show something to the customer, so I need a GUI program.

My 2 cents,

Make sure that a and b are quad-aligned. Doing memcpy on unaligned data can be costly, and it is worse over the PCI bus, since that memory isn’t cached and the bus is 32 bits wide.
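The point about alignment and bus width can be made concrete: read the PCI window only as whole, aligned 32-bit words, then work on the copy in cached RAM. This is a minimal sketch, assuming "quad-aligned" means 4-byte alignment for a 32-bit bus; is_aligned() and copy_from_pci32() are hypothetical names, and src would point into the region returned by mmap().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p is aligned to `align` bytes (align must be a power of two). */
static int is_aligned(const volatile void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Copies nwords 32-bit words from an uncached PCI window into normal RAM.
 * volatile forces exactly one full 32-bit read per word, so the bus never
 * sees byte-sized or misaligned accesses. Both pointers must be 4-byte aligned. */
static void copy_from_pci32(uint32_t *dst, const volatile uint32_t *src, size_t nwords)
{
    assert(is_aligned(dst, 4) && is_aligned(src, 4));
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];
}
```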

Another possibility is to write a version of memcpy that uses MMX instructions. If memory serves me right, assembler sources are available on the Intel and AMD sites, but they aren’t written for GNU as.
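The suggestion above names MMX; this sketch swaps in SSE2 intrinsics instead, because they can be written in plain C with GCC rather than hand-ported assembler. It copies 16 bytes per load/store and falls back to memcpy() for the tail and on targets without SSE2. copy_sse2() is a hypothetical name. Whether wider loads actually speed up reads from an uncached PCI window depends on the chipset and on how the region is mapped, so time it against plain memcpy() before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copies len bytes using 16-byte SSE2 loads/stores where available,
 * then finishes the remaining tail (or everything, without SSE2) with memcpy(). */
static void copy_sse2(void *dst, const void *src, size_t len)
{
    assert(len == 0 || (dst != NULL && src != NULL));
    unsigned char *d = dst;
    const unsigned char *s = src;
#if defined(__SSE2__)
    while (len >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, v);
        s += 16;
        d += 16;
        len -= 16;
    }
#endif
    memcpy(d, s, len);
}
```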

How do you set up DMA?

This is generally done at the driver level for the specific card that is on the PCI bus.

Your video card doesn’t HAVE to use an interrupt to handle video, AFAIK. Its absence mostly affects the card’s power saving. On most x86 boards this can usually be set up in the BIOS.

At the driver level you have to choose a block size and map CONTIGUOUS physical memory to it. I am new to QNX myself, so I am not sure how to do this in detail on this system, but I expect to know in another week or so :-). You transmit the address of that memory to the card on the PCI bus and tell it how you want the data sent. That’s a “push” process. There is also a “pull” method, in which you control the PCI card from the driver.

Generally, all PCI chipsets have a set of registers, a FIFO buffer, or some other mechanism for handling the low-level memory transfer. You can get an interrupt from the PCI card when it has transferred data into the QNX driver’s memory (at that point the data is across the bus and internal memcpy speeds apply) and process it then.

This means you have to get down and dirty with the PCI chipset registers for your specific card and with the driver for that card. It isn’t pretty if you have to do it for the first time.

That’s about the limit of what I can provide. The QNX masters have to fill in the details :slight_smile:


BJ - how are you calling mmap()? Can you post a code snippet of what you are doing to setup the memory? I suspect you are using PROT_NOCACHE on the mapping.

I haven’t called mmap yet. I got my copy of qnx a week ago :slight_smile:

I guess cdm mixed up you and xuyong (the original poster of the thread) :slight_smile:

Yep - my bad. :slight_smile: that was for the original poster.

Yes, I use PROT_NOCACHE. Should I disable PROT_NOCACHE?

I tried not using PROT_NOCACHE in mmap() and writing the cache line size into the PCI configuration register (originally it was 0), but the copy took the same time as before.
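One likely reason the register change made no difference: the Cache Line Size register (offset 0x0C in the standard PCI configuration header) is used mainly by the card when it is bus master, for burst commands such as Memory Write and Invalidate, so it should not speed up the CPU reading the card’s memory. Its value is also in units of 32-bit words, not bytes. A small sketch of that conversion; pci_cache_line_reg() is a hypothetical name, and the actual register write would go through whatever PCI configuration API your QNX version provides.

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCI configuration-space offsets (type 0 header). */
#define PCI_CFG_CACHE_LINE_SIZE 0x0C  /* units of 32-bit words */
#define PCI_CFG_LATENCY_TIMER   0x0D  /* units of PCI clock cycles */

/* Converts a CPU cache line size in bytes into the Cache Line Size register value. */
static uint8_t pci_cache_line_reg(unsigned line_bytes)
{
    assert(line_bytes % 4 == 0 && line_bytes / 4 <= 0xFF);
    return (uint8_t)(line_bytes / 4);
}
```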

What are you basing your expected times on: theoretical PCI bus speed or known device speed?

Somebody told me that transfers on the PCI bus are very fast, so 3 ms for 9600 bytes of data is unacceptable and something must be wrong in my program.

Theoretically, an unloaded PCI bus should give him about 160 µs for a block that size, but that ONLY applies to block DMA access (assuming about 60 MB/s, not the 80 quoted in the spec). For asynchronous transfers, which is what happens by default, roughly 3.2 ms is what you would expect. It sounds VERY much like someone has to do DMA transfers.
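The arithmetic behind these numbers is easy to check. A minimal sketch, taking 1 MB as 10^6 bytes; transfer_time_us() and throughput_mb_s() are hypothetical names.

```c
#include <assert.h>

/* Time in microseconds to move `bytes` at `mbytes_per_s` (1 MB = 10^6 bytes).
 * bytes / (MB/s * 10^6) gives seconds; times 10^6 gives us, so the factors cancel. */
static double transfer_time_us(double bytes, double mbytes_per_s)
{
    assert(mbytes_per_s > 0.0);
    return bytes / mbytes_per_s;
}

/* Effective throughput in MB/s for `bytes` moved in `us` microseconds. */
static double throughput_mb_s(double bytes, double us)
{
    assert(us > 0.0);
    return bytes / us;
}
```

9600 bytes at 60 MB/s gives 160 µs, while 9600 bytes in 3.2 ms is only 3 MB/s, about 1.3 µs per 32-bit word, which is consistent with each word being a separate read across the bus rather than part of a burst.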


That shows that when you ask for help, you should give as much information as possible about what you are doing or trying to do. Everybody here would have been able to tell you that 3 ms and 9600 bytes don’t make sense ;-)

I want to use DMA transfers to improve this, but how do I use DMA, and what is the principle under QNX? Sorry,
I know very little about it; please tell me the steps.