PCI Memory Read is really slow!


The idea behind this program is simply to access the RAM and dump the data from it to a txt file.
Later I’ll convert the txt file to JPEG, and hopefully it will be readable.
However, when I try to read from the RAM using NEW[], it takes far too long to actually copy all the values into the file.
Isn’t it supposed to be really fast? I mean, I save pictures every day and it doesn’t even take a second.
Is there some other method I can use to dump memory to a file?

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>	/* uint32_t, uintptr_t */
#include <string.h>	/* memset */
#include <hw/pci.h>
#include <hw/inout.h>
#include <sys/mman.h>

	FILE *fp;
	fp = fopen ("test.txt","w+d");
	int NumberOfPciCards = 3;
	struct pci_dev_info info[NumberOfPciCards];
	void *PciDeviceHandler1,*PciDeviceHandler2,*PciDeviceHandler3;
	uint32_t *Buffer;
	int *BusNumb;	//int Buffer;
	uint32_t counter =0;
	int i;
	int r;
	int y;
	volatile uint32_t *NEW,*NEW2;
	uintptr_t iobase;
	volatile uint32_t *regbase;

	NEW = (uint32_t *)malloc(sizeof(uint32_t));
	NEW2 = (uint32_t *)malloc(sizeof(uint32_t));
	Buffer = (uint32_t *)malloc(sizeof(uint32_t));
	BusNumb = (int*)malloc(sizeof(int));
	printf ("\n 1");

	for (r=0;r<NumberOfPciCards;r++)
		memset(&info[r], 0, sizeof(info[r]));

	printf ("\n 2");

        //Here the attach takes place.
	for (r=0;r<NumberOfPciCards;r++)
		(pci_attach(r) < 0) ? FuncPrint(1,r) : FuncPrint(0,r);

	printf ("\n 3");

	info[0].VendorId = 0x8086;  //Wont be using this one
	info[0].DeviceId = 0x3582;   //Or this one
	info[1].VendorId = 0x10B5;   //WIll only be using this one PLX 9054 chip
	info[1].DeviceId = 0x9054;   //Also PLX 9054
	info[2].VendorId = 0x8086;   //Not used
	info[2].DeviceId = 0x24cb;    //Not used

	printf ("\n 4");
        //I attached the device and give it a handler and set some setting.
	if ((PciDeviceHandler1 = pci_attach_device(0,PCI_SHARE|PCI_INIT_ALL, 0, &info[1])) == 0)
					perror("pci_attach_device fail");

			for (i = 0; i < 6; i++)
			//This just prints out some details of the card.	
				if (info[1].BaseAddressSize[i] > 0)
					printf("Aperture %d: "
					"Base 0x%llx Length %d bytes Type %s\n", i,
					PCI_IS_MEM(info[1].CpuBaseAddress[i]) ? PCI_MEM_ADDR(info[1].CpuBaseAddress[i]) :	PCI_IO_ADDR(info[1].CpuBaseAddress[i]),
							info[1].BaseAddressSize[i],PCI_IS_MEM(info[1].CpuBaseAddress[i]) ? "MEM" : "IO");
			printf("\nEnd of Device random info dump---\n");

	printf("\nNEWs Address : %d\n",*(int*)NEW);
        //Not sure if this is a legitimate way of memory allocation but I cant see to read the ram any other way.
	NEW = mmap_device_memory(NULL, info[1].BaseAddressSize[3],PROT_READ|PROT_WRITE|PROT_NOCACHE, 0,info[1].CpuBaseAddress[3]);
                 //Here is where things are starting to get messy and REALLY long to just run through all the ram and dump it.
                 //Is there some other way I can dump the data in the ram into a file?
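		/* BaseAddressSize is in bytes, but NEW is indexed in 32-bit words */
		while (counter < info[1].BaseAddressSize[3] / sizeof(uint32_t))
		{
			fprintf(fp, "%x",NEW[counter]);
			counter++;
		}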




The reason it’s taking so long is that you are copying the data to the file in a loop. So you are literally writing to the file probably hundreds or thousands of times. That’s going to be VERY slow since the file system has to make sure each fprintf is successfully written to disk. Think of it as 100 people walking into a house. The way you are doing it, each person opens the front door, enters the house, then closes the door behind them. Then the next person does the same. All that opening and closing of the door takes time.

I suggest you write it all at once using fwrite(); That would be the equivalent of opening the front door once and having 100 people walk in and then closing the door.


P.S. When you talk about saving pictures every day, if you mean on a cell phone or camera, those may be using faster physical media (compact flash/SSD) than a hard drive (assuming you are using a hard drive and not a compact flash/SSD card).

Hi Tim

Thanks for the quick response.
Your little explanation there was awesome, by the way!
OK, I have tried fwrite(), which sounds like the better idea in all cases… but now I’m really not getting the data that I wanted…
I’m getting trash data out of the RAM… I’ll post a little sample in CODE brackets.

[code]
[... unreadable binary data ...]
./OpenPCI _=./OpenPCI SSH_CONNECTION= 56295 22 PATH=/sbin:/usr/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/opt/bin:/opt/sbin:/usr/pkg/bin:/usr/local/bin:/usr/local/sbin:/usr/sbin:/opt/bin SHELL=/bin/sh LD_LIBRARY_PATH=:/usr/local/lib:/opt/lib HOSTNAME=DCAMDEV USER=root TMPDIR=/truvelo/tmp MAIL=/var/spool/mail/root PROCESSOR=x86 HOME=/root SSH_CLIENT= 56295 22 QNX_HOST=/ QNX_TARGET=/ TERM=xterm QNX_CONFIGURATION=/etc/qnx SSH_TTY=/dev/ttyp0 SYSNAME=nto LOGNAME=root /develop/RandomToetse/src/./OpenPCI
[... unreadable binary data ...]
ELF
[... unreadable binary data ...]
/usr/lib/ldqnx.so.2 QNX
[... unreadable binary data ...]
libc.so.3 pci_read_config_bus printf _init_array errno _preinit_array perror puts __cxa_finalize malloc mmap_device_io pci_attach_device mmap_device_memory pci_find_device pci_attach _init_libc fopen memset main fclose _fini_array atexit fwrite _btext _edata __bss_start _end __deregister_frame_info _Jv_RegisterClasses __register_frame_info
[... unreadable binary data ...]
w+d test.txt
4 pci_attach_device fail MEM IO Aperture %d: Base 0x%llx Length %d bytes Type %s
End of Device random info dump---
0x%x : 0x%x–
NEWs Address : %d
7 0x%x pci_attach_device fail
pci_attach_device(%i) Success
Beginning of Device random info dump for ' 0x%x '---
[... unreadable binary data ...]
[/code]

SOME of this is even my own code that I’ve just executed. WHAT is going on?

Well, first I have to disagree with Tim. If you did an fopen() and an fclose() between each fprintf(), I would agree with the open/close door analogy.
When you are writing to a file using fprintf() you are using buffered writes. The data accumulates in a buffer in your program’s address space before it even gets sent to the file system. When it hits the file system, unless you have changed the default settings, it goes into a memory cache where it can age a bit. So even there you can accumulate multiple dirty sectors that need to be flushed to disk. Finally they are all flushed to disk in block writes.
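The buffering described above can be observed directly. Here is a minimal sketch, assuming a POSIX system and a regular file that stdio fully buffers by default. The helper names file_size and buffered_demo are made up for illustration. It issues 100 small fprintf() calls and checks the size the file system reports before and after fflush().

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: size of a file as the file system sees it,
 * in bytes, or -1 on error. */
static long file_size(const char *path)
{
    struct stat st;
    return (stat(path, &st) == 0) ? (long)st.st_size : -1L;
}

/* Issues 100 one-character fprintf() calls and reports the file size
 * before and after fflush(). Returns 0 on success, -1 on error. */
int buffered_demo(const char *path, long *before, long *after)
{
    FILE *fp = fopen(path, "w");
    int i;

    if (fp == NULL)
        return -1;

    for (i = 0; i < 100; i++)
        fprintf(fp, "%c", 'x');

    /* The 100 bytes are typically still in the stdio buffer here. */
    *before = file_size(path);

    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }

    /* Only now has the data been handed to the file system. */
    *after = file_size(path);

    fclose(fp);
    return 0;
}
```

On a typical system the size before the flush is 0 and the size after it is 100, so the 100 fprintf() calls reached the file system as a single write.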

So why is it slow? I don’t know. How slow, and how much memory are you moving?

As to your weird output, I don’t see the new code. One possibility is that info[1].CpuBaseAddress[3] is not the physical address of your device, but some other place in memory. Another possibility is that your code is looking in the wrong place.

Reading over the PCI bus is a very slow process. I don’t recall the exact speed, but I think it’s in the LOW megahertz range, around 8 Meg or something like that, FAR slower than normal RAM. PCI is efficient when operations are DMA driven. Plus PCI memory, unlike normal memory, is always mapped non-cached, which again has a drastic impact.

You might get better performance if you use memcpy into a buffer in RAM instead of reading each value one by one. Then write that buffer out in one single call. If the memory you are trying to read is really big, just break the memcpy down into smaller chunks.
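A rough sketch of that chunked approach, under a few assumptions: the function name dump_region and the chunk size are made up, and the source pointer stands for whatever mmap_device_memory() returned. Instead of memcpy it uses a plain word-by-word loop into the RAM buffer, because memcpy on a volatile region needs a cast and does not guarantee 32-bit accesses. Each chunk then goes to the file with one fwrite().

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_WORDS 4096u   /* hypothetical chunk size: 16 KB per pass */

/* Copies `words` 32-bit values from a (possibly uncached) source region
 * into an ordinary RAM buffer, one chunk at a time, and writes each chunk
 * with a single fwrite(). Returns 0 on success, -1 on error. */
int dump_region(const volatile uint32_t *src, size_t words, FILE *fp)
{
    uint32_t *chunk = malloc(CHUNK_WORDS * sizeof *chunk);
    size_t done = 0;

    if (chunk == NULL)
        return -1;

    while (done < words) {
        size_t n = words - done;
        size_t i;

        if (n > CHUNK_WORDS)
            n = CHUNK_WORDS;

        /* Explicit loop: every access to the source is one 32-bit read. */
        for (i = 0; i < n; i++)
            chunk[i] = src[done + i];

        /* One binary write per chunk instead of one fprintf() per word. */
        if (fwrite(chunk, sizeof *chunk, n, fp) != n) {
            free(chunk);
            return -1;
        }
        done += n;
    }

    free(chunk);
    return 0;
}
```

With the code from the first post, the call would look something like dump_region(NEW, info[1].BaseAddressSize[3] / sizeof(uint32_t), fp), with the file opened in binary mode ("wb"). Note that this writes raw bytes, not hex text, so no later txt conversion is needed.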


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	FILE *fp;
	char *ptr;
	int i;

	fp = fopen("tmp", "w");
	ptr = (char*) malloc(1000000);
	if (fp == NULL || ptr == NULL)
		return 1;
	memset(ptr, 0, 1000000);

	/* one formatted write per byte */
	for (i=0; i<1000000; i++)
		fprintf(fp, "%c", ptr[i]);

	fclose(fp);
	free(ptr);
	return 0;
}

time fprintf:
0.31s real 0.20s user 0.00 system


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	FILE *fp;
	char *ptr;

	fp = fopen("tmp1", "w");
	ptr = (char*) malloc(1000000);
	if (fp == NULL || ptr == NULL)
		return 1;
	memset(ptr, 0, 1000000);

	/* one write for the whole buffer */
	fwrite(ptr, sizeof(char), 1000000, fp);

	fclose(fp);
	free(ptr);
	return 0;
}

time fwrite:
0.05s real 0.01s user 0.00 system

I’d say there is a door opening/closing with fprintf :-) Although in reality it’s imperceptibly small and not the cause of his slowness, since my example wrote 1 Meg in under 1 second in either case. I also agree with your detailed explanation of how things work, but whether it’s the fprintf library code or something else, fprintf is definitely vastly slower than fwrite.


P.S. From the docs on mmap_device_memory: You need I/O privileges to use the result of the mmap_device_memory() function. The calling thread may call ThreadCtl() with the _NTO_TCTL_IO command to establish these privileges.

I don’t see a call to ThreadCtl() in your code. You need this even if you are running as root.
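For reference, a minimal sketch of what that call could look like. The wrapper name request_io_privileges is made up, and the #ifdef guard is an assumption added only so the snippet also compiles on non-QNX systems, where it does nothing.

```c
#include <stdio.h>

#ifdef __QNXNTO__
#include <sys/neutrino.h>   /* ThreadCtl(), _NTO_TCTL_IO */
#endif

/* Requests I/O privileges for the calling thread.
 * Returns 0 on success, -1 on failure. On non-QNX builds this is a
 * no-op stub (an assumption, so the sketch compiles elsewhere). */
int request_io_privileges(void)
{
#ifdef __QNXNTO__
    if (ThreadCtl(_NTO_TCTL_IO, 0) == -1) {
        perror("ThreadCtl(_NTO_TCTL_IO)");
        return -1;
    }
#endif
    return 0;
}
```

Calling this once near the start of the program, before touching the region returned by mmap_device_memory(), should be enough for a single-threaded program like the one in the first post.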

I guess it depends on how you look at it. In your example you are measuring just the process time. There’s no accounting for the time the file system spends putting the data on disk. That can easily all happen after the program has exited.

I do agree that the cpu time to do all those printf’s is much more (apparently 20 times more) than one fwrite(). It’s just that that’s not what takes the most time.
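One way to include the file system’s share in the measurement is to time the write together with an explicit flush to disk. The following is only a sketch, assuming a POSIX system with clock_gettime() and fsync(); the function name timed_write_to_disk is made up.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Writes `len` bytes with one fwrite(), then pushes the data out of the
 * stdio buffer (fflush) and out of the file system cache (fsync).
 * Returns the elapsed wall-clock time in seconds, or -1.0 on error. */
double timed_write_to_disk(const char *path, const void *buf, size_t len)
{
    struct timespec t0, t1;
    FILE *fp = fopen(path, "wb");

    if (fp == NULL)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);

    if (fwrite(buf, 1, len, fp) != len ||
        fflush(fp) != 0 ||
        fsync(fileno(fp)) != 0) {
        fclose(fp);
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    fclose(fp);

    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Because the clock stops only after fsync() returns, the result covers the time until the file system reports the data as written, which the time command on a program that simply exits does not.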


Out of curiosity, try fputc(0, fp). What numbers are you getting?

time fputc:
0.14s real 0.08s user 0.00 system

Much faster than fprintf but not quite as fast as fwrite.

Most interesting to me is that repeated runs (say 10 in a row) give VERY smooth times for fwrite: the numbers vary from 0.04 to 0.06 real. But they vary wildly for fprintf (0.25-0.36, which I averaged to 0.31) and fputc (0.10-0.21, which I averaged to 0.14).

Not quite sure why that is.


I would guess that fprintf is much slower because of the parsing of the format string “%c” and then the conversion from binary to ASCII. fputc has no parsing and no conversion, so the difference in time should be strictly caused by the extra overhead of calling fputc 1000000 times versus 1 fwrite. I think that the fwrite will be split into 1000000/buffer-size operations. I think the buffer size is 4k or 8k, not sure. Making the buffer bigger with setvbuf would help. Using open/write/close would be much faster yet.
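To illustrate the setvbuf suggestion, here is a small sketch. The function name write_with_big_buffer and the idea of passing the buffer size as a parameter are made up for the example. It gives the stream a caller-chosen buffer before the first write, so fwrite has fewer, larger pieces to hand to the file system.

```c
#include <stdio.h>
#include <stdlib.h>

/* Writes `len` bytes to `path` through a stdio stream whose buffer has
 * been enlarged to `bufsize` bytes. Returns 0 on success, -1 on error. */
int write_with_big_buffer(const char *path, const void *data,
                          size_t len, size_t bufsize)
{
    FILE *fp = fopen(path, "wb");
    char *iobuf;
    int rc = 0;

    if (fp == NULL)
        return -1;

    iobuf = malloc(bufsize);

    /* setvbuf() must be called before any other operation on the stream. */
    if (iobuf == NULL || setvbuf(fp, iobuf, _IOFBF, bufsize) != 0)
        rc = -1;
    else if (fwrite(data, 1, len, fp) != len)
        rc = -1;

    /* Close (and flush) the stream before freeing the buffer it uses. */
    if (fclose(fp) != 0)
        rc = -1;
    free(iobuf);

    return rc;
}
```

Something like write_with_big_buffer("tmp1", ptr, 1000000, 65536) would be the equivalent of the fwrite test above with a 64k buffer. Whether it is measurably faster on a given system is something to time rather than assume.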

We are digressing from the OP ;-)

Can’t explain the variation in time either. If I had to guess, I would have said they should have similar behavior, as the buffering should ensure a similar number of messages and context switches. The System Profiler would probably help explain it ;-)