DMA bounce buffer on x86 system?

yicheng · January 26, 2008, 12:21am

Hi Folks,

Does QNX use a DMA bounce buffer on x86 systems? Or can a PCI DMA transfer target high memory (> 1GB) without a bounce buffer being involved?

Thanks!

rgallen · January 26, 2008, 5:39pm

High memory above 1GB? I am not sure what you are talking about. QNX imposes no restrictions on where you can DMA, other than the fact it is a 32bit O/S with extended physical addressing. Thus any address supported by the hardware within the limits of the extended physical addressing of the processor in 32 bit mode can be accessed by a driver via a 32bit virtual address.

On x86, as long as the device supports it, you can DMA to/from any target/source address up to 36 bits wide (i.e. 64GB).

Many devices only have 32bit DMA address registers and cannot DMA above the 4GB physical address boundary, but if a device can access a physical address above 4GB, drivers in QNX can certainly supply such addresses, and drivers can access them.
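As a rough illustration of the point above (a 36-bit physical address space covers 2^36 bytes = 64GB, while a 32bit DMA address register stops at 4GB), here is a minimal sketch of the check a driver might make before handing a buffer to a device. The function name `dma_reachable` and its parameters are hypothetical and not part of any QNX API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: returns true if every byte of
 * the buffer [paddr, paddr + len) can be addressed by a device whose DMA
 * address register is addr_bits wide. */
bool dma_reachable(uint64_t paddr, uint64_t len, unsigned addr_bits)
{
    if (len == 0 || addr_bits == 0 || addr_bits > 64)
        return false;

    /* Highest physical address the device can generate. */
    uint64_t limit = (addr_bits == 64) ? UINT64_MAX
                                       : ((UINT64_C(1) << addr_bits) - 1);
    if (paddr > limit)
        return false;

    /* Compare the last byte against the limit without overflowing paddr + len. */
    return (len - 1) <= (limit - paddr);
}
```

Under these assumptions, a page at physical address 0x100000000 (just above 4GB) is rejected for a 32bit device but accepted for a device with 36 address bits.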

I am completely unfamiliar with what a “DMA bounce” buffer is, but it sounds rather unpleasant (since the whole purpose of DMA is to eliminate copies, and bouncing sounds suspiciously like copying).

yicheng · January 27, 2008, 8:02pm

I read that on Linux i386 systems the default mapping scheme limits kernel-mode addressability to the first gigabyte of physical memory (low memory). To use DMA to directly access memory above 1GB (high memory), you have to either use a bounce buffer or modify the Linux kernel configuration to enable high memory I/O support.
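For readers who have not met the term, the idea of a bounce buffer can be sketched in user-space C as follows. This is only a simulation under stated assumptions: `fake_device_dma_write`, `dma_receive`, `bounce_buf` and `BOUNCE_SIZE` are hypothetical names, and the "device" is just a function that writes a byte pattern. It is not how Linux or QNX implement anything.

```c
#include <stddef.h>
#include <string.h>

#define BOUNCE_SIZE 4096u

/* Stand-in for a buffer located in memory the device is able to address. */
static unsigned char bounce_buf[BOUNCE_SIZE];

/* Hypothetical device model: "DMA" is simulated by writing a byte pattern. */
void fake_device_dma_write(unsigned char *dst, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = (unsigned char)(i & 0xFFu);
}

/* Receive len bytes into dst.
 * Returns the number of extra CPU copies performed (0 or 1),
 * or -1 if the transfer does not fit in the bounce buffer. */
int dma_receive(unsigned char *dst, size_t len, int device_can_reach_dst)
{
    if (device_can_reach_dst) {
        /* Direct DMA: the device writes straight into the final buffer. */
        fake_device_dma_write(dst, len);
        return 0;
    }
    if (len > BOUNCE_SIZE)
        return -1;

    /* Bounce: the device writes to reachable memory, then the CPU copies. */
    fake_device_dma_write(bounce_buf, len);
    memcpy(dst, bounce_buf, len);
    return 1;
}
```

The second path delivers the same data as the first but costs one CPU copy per transfer, which is the copying overhead objected to earlier in the thread.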

So in QNX, I think as long as a physical memory region can be allocated by an “mmap” call, it is DMAable from QNX's point of view.

My other question is: is an x86 64bit version of QNX available?

rgallen · January 27, 2008, 9:42pm

Oh, that’s right, drivers reside in kernel space in Linux (yet another inherent performance disadvantage of monolithic kernels).

Yes… just remember to use a paddr64_t for the physical address

No, there is no 64bit version of QNX (hence the point about remembering to use a paddr64_t for the physical address).
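To see why the 64-bit physical address type matters on a 32bit O/S, consider what happens when a physical address above 4GB is stored in a 32-bit variable. The sketch below uses stand-in typedefs (`example_paddr64_t`, `example_paddr32_t`) so that it compiles anywhere; they are assumptions for illustration, and on QNX the real `paddr64_t` comes from the system headers.

```c
#include <stdint.h>

/* Stand-in types for illustration only; not the QNX definitions. */
typedef uint64_t example_paddr64_t;
typedef uint32_t example_paddr32_t;

/* Returns nonzero if storing paddr in a 32-bit variable would lose bits. */
int truncates_in_32_bits(example_paddr64_t paddr)
{
    /* The conversion silently drops the upper 32 bits. */
    example_paddr32_t narrow = (example_paddr32_t)paddr;
    return (example_paddr64_t)narrow != paddr;
}
```

A physical address such as 0x240000000 (9GB) would come back as 0x40000000 after a round trip through the 32-bit type, so a device programmed with it would DMA to the wrong page.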

yicheng · January 29, 2008, 5:41pm

So QNX running on an x86-64 platform has no performance advantage over running on an x86-32 platform?

rgallen · January 30, 2008, 1:57am

I wouldn’t say that. I would say that your process address space will remain restricted to 4GB.

mario · January 30, 2008, 3:00am

First of all, you cannot compare an x86-32 platform to an x86-64 one because there are no processors that you can compare; I mean there is no P4 with and without 64 bit support. They are all from different generations of processors.

That being said, QNX doesn’t make use of the 64 bit mode. Even if it did, it wouldn’t make that big of a difference in terms of performance. The key would be in providing good compiler support to efficiently use the extra set of registers x64 provides. I’m not sure gcc is there yet.

rgallen · January 30, 2008, 5:09pm

Here, I’d have to disagree. All modern hardware is 64bit capable, and turning on 64bit mode would increase data transfer capacity, and reduce code execution speed (due to the fact that the larger code will effectively reduce the effectiveness of the I-cache).

Good points.

I think we agree that there are no simple answers to the question of performance advantages of 32bit vs. 64bit, and that it depends on the application.

I also believe that the biggest benefit of 64bit is the increase in process address space (there are applications that are getting close to the limits of a 32bit address space), and not performance benefits (which tend to be relatively modest for most apps).

mario · January 30, 2008, 7:05pm

What do you mean by “increase data transfer capacity”? An increase in address space size?

I’m not sure 64 bit instructions are significantly bigger; pointers are, though. From my observation (reading on the web), most applications see a benefit of around 10% when running as 64 bit applications.
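The remark about pointers being bigger can be made concrete with a small sketch. Fixed-width integers stand in for 4-byte and 8-byte pointers so the comparison does not depend on the machine compiling it; the struct names, the helper `nodes_per_line`, and the 64-byte cache line are all assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* A doubly linked list node as a 32-bit build would lay it out
 * (uint32_t stands in for a 4-byte pointer). */
struct node32 {
    uint32_t next;
    uint32_t prev;
    int32_t  value;
};

/* The same node with 8-byte links, as in a 64-bit build. */
struct node64 {
    uint64_t next;
    uint64_t prev;
    int32_t  value;
};

/* How many whole nodes fit in one cache line of line_size bytes. */
size_t nodes_per_line(size_t node_size, size_t line_size)
{
    return line_size / node_size;
}
```

On an x86-64 host the two structs are typically 12 and 24 bytes, so fewer of the 64-bit nodes fit in each cache line. This affects the data cache for pointer-heavy structures; it says nothing by itself about instruction size.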

maschoen · January 30, 2008, 11:05pm

About that 4Gig limit? I thought I heard that QNX 6.3.2 already supported more than 4Gig, not that many motherboards do. Am I wrong? This might seem like a contradiction if you think of x86 as being a pure 32 bit architecture, but it is not. I don’t think there is any reason why you couldn’t have DS pointing to a 4Gig memory block, and ES pointing to another. That’s not to say that the compiler supports this either.

rgallen · January 31, 2008, 4:12am

QNX has supported more than 4GB of physical memory for several versions, but I was referring to process address space. As long as QNX is 32bit, you will only have 4GB of process address space.

When in vm386 mode there is no ES…

rgallen · January 31, 2008, 4:51am

I am referring to the fact that registers and transfers are 64bits, when you are able to compile your app for 64bit.

Well, if pointers are bigger then code is going to be bigger; since compilers either have to:

a) use larger instructions (i.e. those with integral addressing).
or
b) generate more instructions (i.e. multiple instructions without integral addressing).

In either case the code will be bigger, and given a cache of equal size (I see no reason why the 32bit mode of the processor wouldn’t use all the cache memory) the 32bit code will have better cache locality (and thus run faster).

I think it actually works out that the code is about 20% larger on x86, which (intuitively) would lead me to believe that for the same cache size it would experience about a 20% lower cache hit rate than 32bit code.

Uh oh, I’ve been wikipedia’d

That actually sounds reasonable. Since most business applications are data intensive, the wins on data copies outweigh the losses in cache locality, and you get a net gain. I can certainly believe, though, that there are applications (hard real-time apps come to mind) that don’t move a lot of data, but do run a lot of instructions, that would be slower on 64bit.

Slightly off-topic: Last night, less than 45 seconds after CNN called John McCain the winner in Florida, I checked the wikipedia entry for John McCain and there was an entry that said:

“On Jan29th, 2008 John McCain won the Florida GOP primary”

What they lack in editing, they certainly do make up for in currency…

yicheng · January 31, 2008, 5:29am

Does the x86 64bit instruction set contribute to the speed increase? Or does the QNX kernel not use the 64bit instruction set at all?

mario · January 31, 2008, 7:03am

I was talking about 64 bit Windows or 64 bit Linux. Under QNX, whether the processor (x86 family) has 64 bit support or not makes no difference, since it is not used, nor can it be.

maschoen · January 31, 2008, 4:31pm

Hmm, very confusing. vm386 mode? I recall there was a mode in 386’s and above that created an isolated 8086 processor environment, a place where Windows placed its “Dos Box”. I don’t think QNX runs in there.

I think you are right about 4GB process address space, in that QNX uses a memory model that requires all segment registers to point to the same blocks of memory to work correctly. Shared libraries are beamed in using paged blocks. But that does not mean that the ES register goes away. Pointing it somewhere else, even if you could, might break other things. Overlaying CS and DS does have some advantages, and the likelihood of a single process needing more than 4Gig is remote enough to fully justify it, but there’s no technical reason why the model couldn’t be enhanced to allow access to a 2nd data segment register. Some coordination with the compiler might be necessary, and you might not want the method of access to be purely portable. This was true under QNX 2 (about a million years ago if I recall properly) where data and code spaces were separate and each limited to 64K, but an enhancement to the compiler used -} instead of -> to allow pointers to offset against the ES register, allowing unlimited additional 64K segments. This was how it was done until the CI large model compiler came along.

rgallen · January 31, 2008, 9:38pm

Mitch, vm386 is the flat memory model where there are no segments (i.e. all segment registers point to the same place). It first showed up on the 386, but QNX didn’t use it for quite a while…

maschoen · February 1, 2008, 3:09am

Your comments do help, so thank you. Clearly the ES register does not go away. A user process might have no other valid place to point it, but that doesn’t prevent the OS from giving it one. That QNX does not support this, I accept.