Message Passing & DMA

“Daryl Low” <dlow@student.math.uwaterloo.ca> wrote in message
news:3A1F3B27.A707EA10@student.math.uwaterloo.ca

I was chatting with Paul Bell about DMA transfers. We were wondering
if, after doing a MsgReceive on a message, we can obtain a physical
page list of the sender’s reply buffer.

This way, an io-net or io-blk type server application could DMA
streaming data directly from hardware into a client’s address space
(zero copy). In the case of a remote request, we’d end up DMAing into
QNet’s buffers, which in turn get DMA’d directly to the network adapter
(single copy relay).
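
For reference, this is roughly the copying path a server follows today with the existing MsgReceive()/MsgReply() calls; the zero-copy idea above would let the server skip the intermediate buffer when the client's reply buffer is suitable. A minimal sketch (the message layout, buffer size, and serve() loop are illustrative assumptions, not any existing driver):

/* Sketch of today's copying path in a hypothetical read-serving loop.
 * The message layout and sizes are illustrative assumptions only.      */
#include <sys/neutrino.h>

#define XFER_SIZE (64 * 1024)            /* illustrative transfer size   */

static char bounce_buf[XFER_SIZE];       /* driver-local DMA buffer      */

void serve(int chid)
{
    struct _msg_info info;
    char             hdr[256];           /* client's request header      */

    for (;;) {
        int rcvid = MsgReceive(chid, hdr, sizeof(hdr), &info);
        if (rcvid <= 0)
            continue;                    /* error or pulse               */

        /* 1. Program the hardware to DMA into bounce_buf (not shown).   */
        /* 2. MsgReply copies the data a second time, into the client's  */
        /*    reply buffer; this is the copy the proposal would remove.  */
        MsgReply(rcvid, XFER_SIZE, bounce_buf, XFER_SIZE);
    }
}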

My two cents,

For reads I don’t think that’s possible, because you first have to look at the
packet content to decide which client it belongs to :wink:

For writes, it’s most likely that the data the client wants to send is neither
in contiguous memory nor in DMA-able memory.

If the reply buffer is not nicely page-aligned, we can reply to the
request the way we would normally do it (i.e. DMA data into local buffers
and MsgReply the data to the client).

Not only must it be page-aligned, it must also be contiguous memory in
DMA-able memory space.

Daryl Low
University of Waterloo

Mario Charest wrote:

For reads I don’t think that’s possible, because you first have to look
at the packet content to decide which client it belongs to :wink:

When you say “packet” do you mean network packet? What about disk I/O
where big DMA transfers are common?

For writes, it’s most likely that the data the client wants to send is
neither in contiguous memory nor in DMA-able memory.

True. Although the main goal is large unbuffered reads, even writes,
depending on the application, could be page-aligned. Some examples:

dumping large frame-buffers (graphics, audio, video)
high-speed data generation (like a deja-view type kernel logger)

If this feature was actually available, developers might decide to take
advantage of it.

If the reply buffer is not nicely page-aligned, we can reply to the
request the way we would normally do it (i.e. DMA data into local
buffers and MsgReply the data to the client).

Not only must it be page-aligned, it must also be contiguous memory in
DMA-able memory space.

I suppose that might be true for ISA devices, but what about PCI? I’m
not a HW expert, but I thought that most SCSI, IDE, Firewire, USB, Gbit
Ethernet devices supported scatter/gather DMA. Isn’t this how BSD does
high-speed unbuffered I/O?
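
As an illustration of what such controllers consume, a scatter/gather list is just an array of (physical address, length) descriptors built from a physical page list. A rough sketch follows; sg_entry and sg_build are made-up names, not any particular controller's format:

/* Illustrative scatter/gather list: an array of (physical address,
 * length) descriptors built from a physical page list, merging pages
 * that happen to be physically adjacent. Names are illustrative only. */
#include <stdint.h>
#include <stddef.h>

typedef struct sg_entry {
    uint64_t paddr;    /* physical address of a fragment */
    uint32_t len;      /* fragment length in bytes       */
} sg_entry_t;

size_t sg_build(const uint64_t *pages, size_t npages, size_t page_size,
                sg_entry_t *sg, size_t max_sg)
{
    size_t n = 0;

    for (size_t i = 0; i < npages; i++) {
        if (n > 0 && sg[n - 1].paddr + sg[n - 1].len == pages[i]) {
            sg[n - 1].len += page_size;      /* adjacent page: extend  */
        } else {
            if (n == max_sg)
                return 0;                    /* descriptor list full   */
            sg[n].paddr = pages[i];
            sg[n].len   = page_size;
            n++;
        }
    }
    return n;
}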

Daryl Low
University of Waterloo

The whole point of this is, in instances where it CAN be used, if
implemented, would the increased performance be worth something to QSSL
and the developers?

PKodon


Paul E. Bell | Email and AIM: wd0gcp@millcomm.com | ifMUD: Helios
IRC: PKodon, DrWho4, and Helios | webpage: members.nbci.com/wd0gcp/
Member: W.A.R.N., Skywarn, ARES, Phoenix Developer Consortium, …

“Paul E. Bell” <wd0gcp@millcomm.com> wrote in message
news:3A20307B.BFDED62@millcomm.com
| The whole point of this is, in instances where it CAN be used, if
| implemented, would the increased performance be worth something to QSSL
| and the developers?
|
| PKodon


It depends on a few things. Increased performance is great, but at what
expense? What constraints define when it can be used, and is there a downside
for where it can’t be used? At what level are you proposing that this be
integrated into the O.S.? What exactly do you mean by “worth something to QSSL
and the developers”? Who are “the developers”? It sounds like you want to
sell me an add-on package that does DMA transfers to network and storage
devices. Any generic solution is probably going to have to involve the QNX
folks, and the odds of this fitting into their schedules any time soon aren’t
very good, especially considering their emphasis on the embedded marketplace.
I have no interest in a third party I/O subsystem whatsoever. I have no idea
where you’re headed with this, so…

In response to the original question, there is no way that I’ve ever heard of
for you to get the “gather list” from the kernel (which does all of the message
passing). You would have to send the physical addresses in the message and
then deal with them in your driver.

-Warren

The idea came up while we were working on a project and ran into some DMA
questions. We don’t want to add this as a third-party thing; we can’t.
It has to be done in the kernel, as a service that an I/O manager (such as
io-net or io-blk) could take advantage of, so that embedded apps could
benefit from it through the I/O manager.

PKodon


Warren Peece wrote:

It depends on a few things. Increased performance is great, but at
what expense? What constraints define when it can be used, and is
there a downside for where it can’t be used?

Constraint: The client’s buffer must be page-aligned and its size has
to be a multiple of the page size.

Downside: If the buffer is not aligned, then you do it the old-fashioned
way – you DMA into local buffers and MsgReply the stuff back. So,
performance in the “bad” case should be the same as it is right now.

Upside: If everything is properly aligned, then you cut most of the CPU
overhead and save a large, wasted data copy.

For large non-caching transfers like DVD, it would be worth the effort
to mmap a big page-aligned buffer and have the filesystem DMA directly
into your buffers.
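
To make the constraint and the fallback concrete, here is a rough driver-side sketch. MsgPhysPageList() is the hypothetical kernel call discussed in this thread (its signature here is only a guess), and start_dma_to_pages()/dma_into_local_buf() stand in for whatever the hardware layer would provide:

/* Hypothetical driver-side logic: try the zero-copy path, fall back
 * to the current copying path. MsgPhysPageList() does not exist; its
 * signature below is an assumption made for illustration only.        */
#include <sys/neutrino.h>
#include <sys/types.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PAGES 256

typedef uint64_t phys_page_t;

/* Hypothetical kernel call proposed in this thread (not a real API).  */
extern int MsgPhysPageList(int rcvid, off_t offset, size_t nbytes,
                           phys_page_t *pages, int max_pages);

/* Placeholders for the hardware layer and the driver's bounce buffer. */
extern int  start_dma_to_pages(const phys_page_t *pages, int npages,
                               size_t nbytes);
extern int  dma_into_local_buf(size_t nbytes);
extern char local_buf[];

int handle_read(int rcvid, size_t nbytes)
{
    phys_page_t pages[MAX_PAGES];
    int         npages;

    /* Resolve the client's reply buffer into a trusted physical page
     * list; this fails if the buffer is not page-aligned or its size
     * is not a multiple of the page size.                              */
    npages = MsgPhysPageList(rcvid, 0, nbytes, pages, MAX_PAGES);
    if (npages > 0) {
        start_dma_to_pages(pages, npages, nbytes);     /* zero copy     */
        return MsgReply(rcvid, nbytes, NULL, 0);       /* just unblock  */
    }

    /* Old-fashioned way: DMA into a local buffer, copy on MsgReply.    */
    dma_into_local_buf(nbytes);
    return MsgReply(rcvid, nbytes, local_buf, nbytes);
}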

At what level are you proposing that this be integrated into the
O.S.? What exactly do you mean by “worth something to QSSL and the
developers”? Who are “the developers”?

We’re trying to put this idea into the open, get feedback and let
QSSL know about it. If it withstands scrutiny, then consider it a
feature suggestion to QSSL. Of course, the more relevant the feature
(“worth something”) is to QSSL’s business and to the developers who
support QSSL, the better chance it will be accepted.

It sounds like you want to sell me an add-on package that does DMA
transfers to network and storage devices.

No, this is not something a 3rd party can do, nor do we want to
“sell” (as in 3rd party product) you anything. We simply wish to
voice an idea we have and get it to the right people.

Any generic solution is probably going to have to involve the QNX
folks, and the odds of this fitting into their schedules any time
soon isn’t very likely, especially considering their emphasis on
the embedded marketplace.

Yes, you’re absolutely right. If it sounds like a sales pitch, it’s
because in a way, it is! If there’s any hope in hell we’re gonna get
taken seriously, we’ve gotta show that we did our homework and can
withstand peer review. And again, we have to show that it is relevant
to QSSL’s current business (the embedded marketplace). Why else
would they write a DVD player and port RealPlayer (both of which
would benefit from direct DMA transfers) if it wasn’t relevant to
their business?

I have no interest in a third party I/O subsystem whatsoever. I
have no idea where you’re headed with this, so…

As far as applicability goes, it can be used in io-blk and possibly
io-net. So, if you use “large” disk or network I/O in whatever you do,
this would interest you. I hope that you have a better idea where
we’re trying to head with this now.

In response to the original question, there is no way that I’ve ever
heard of for you to get the “gather list” from the kernel (which does
all of the message passing). You would have to send the physical
addresses in the message and then deal with them in your driver.

Right. And trusting the client to give the driver a proper “gather
list” isn’t a good way to do things. There’s no way for the driver to
verify that the physical addresses don’t refer to another address
space. This is why it must be a part of the message passing API,
where the driver can trust the validity of the gather list.

Daryl Low
University of Waterloo

“Daryl Low” <dlow@student.math.uwaterloo.ca> wrote in message
news:3A20251A.EDA7F062@student.math.uwaterloo.ca

Mario Charest wrote:

For reads I don’t think that’s possible, because you first have to look
at the packet content to decide which client it belongs to :wink:

When you say “packet” do you mean network packet? What about disk I/O
where big DMA transfers are common?

For writes, it’s most likely that the data the client wants to send is
neither in contiguous memory nor in DMA-able memory.

True. Although the main goal is large unbuffered reads, even writes,
depending on the application, could be page-aligned. Some examples:

dumping large frame-buffers (graphics, audio, video)
high-speed data generation (like a deja-view type kernel logger)

This would probably require a specific memory allocation cover function
and a non-standard I/O API. I would also be worried that the driver becomes
too fragile. What if the memory provided by the client isn’t as big as the
driver thinks it is? Would the driver SIGSEGV? Not nice… Maybe there are
solutions to this; I hope there are, because this sounds very interesting.
But for now I’m playing devil’s advocate here :wink:

Not only must it be page-aligned, it must also be contiguous memory in
DMA-able memory space.

I suppose that might be true for ISA devices, but what about PCI? I’m
not a HW expert, but I thought that most SCSI, IDE, Firewire, USB, Gbit
Ethernet devices supported scatter/gather DMA. Isn’t this how BSD does
high-speed unbuffered I/O?

I’m not sure DMA can cover the whole memory space.

Daryl Low
University of Waterloo

Yep, that makes it clearer. It would be at the very least an extremely
interesting project to work on, I admit. The pessimist in me says that it’ll be
a long time coming, if at all, though… Imagine the groan when QNX tells their
staff they have to re-write all of the disk and network drivers. Heh. I wish
you guys luck…

-Warren



Mario Charest wrote:

“Daryl Low” <dlow@student.math.uwaterloo.ca> wrote in message
news:3A20251A.EDA7F062@student.math.uwaterloo.ca…
Mario Charest wrote:

For reads I don’t think that’s possible, because you first have to look
at the packet content to decide which client it belongs to :wink:

When you say “packet” do you mean network packet? What about disk I/O
where big DMA transfers are common?

For writes, it’s most likely that the data the client wants to send is
neither in contiguous memory nor in DMA-able memory.

True. Although the main goal is large unbuffered reads, even writes,
depending on the application, could be page-aligned. Some examples:

dumping large frame-buffers (graphics, audio, video)
high-speed data generation (like a deja-view type kernel logger)


This would probably require a specific memory allocation cover function
and a non-standard I/O API. I would also be worried that the driver becomes
too fragile. What if the memory provided by the client isn’t as big as the
driver thinks it is? Would the driver SIGSEGV? Not nice… Maybe there are
solutions to this; I hope there are, because this sounds very interesting.
But for now I’m playing devil’s advocate here :wink:

I believe that we are assuming that the kernel knows enough about the
memory belonging to the client to know whether it would work or not (is
it contiguous, is it completely in memory allocated to the client, is it
large enough, etc.).

By the way, if it’s possible for the client’s buffer to be increased by
the memory manager, could it not simply request the right kind of
memory, of the right size, in the first place?

Daryl is the “expert” of the two of us, I’m just the prodder for
improvements, but, I would like to hear all the arguments, regardless.

Not only must it be page-aligned, it must also be contiguous memory in
DMA-able memory space.

I suppose that might be true for ISA devices, but what about PCI? I’m
not a HW expert, but I thought that most SCSI, IDE, Firewire, USB, Gbit
Ethernet devices supported scatter/gather DMA. Isn’t this how BSD does
high-speed unbuffered I/O?

I’m not sure DMA can cover the whole memory space.

PKodon


Mario Charest wrote:

dumping large frame-buffers (graphics, audio, video)
high-speed data generation (like a deja-view type kernel logger)

This would probably require a specific memory allocation cover
function and a non-standard I/O API. I would also be worried that the
driver becomes too fragile.

What’s wrong with:

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

void *buf;
int fd;

/* 4 MB page-aligned, uncached, physically contiguous buffer */
buf = mmap(NULL, 1024 * 4096, PROT_READ | PROT_WRITE | PROT_NOCACHE,
           MAP_ANON | MAP_PHYS, NOFD, 0);
fd = open("myfile.dat", O_RDONLY | O_NOCACHE);
read(fd, buf, 1024 * 4096);

When io-blk gets the request, it notices the O_NOCACHE flag and calls
our hypothetical MsgPhysPageList kernel call, gets the physical page
list and DMAs the data into the buffer (calling MsgReply to unblock
the caller)?

If the buffer is not page-aligned (say the user used malloc), then
MsgPhysPageList would return an error. In that case, io-blk fills
its own buffers (using DMA) and then uses MsgReply to return the
data.

There’s no need for special cover functions.
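
As a side note, a client that wants the fast path could check its own buffer before issuing the read. A minimal sketch, assuming only the page-alignment and page-multiple constraints discussed above (dma_friendly is a made-up helper name):

/* Sketch: does a buffer satisfy the proposed fast-path constraints
 * (page-aligned start, length a multiple of the page size)?          */
#include <unistd.h>
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

bool dma_friendly(const void *buf, size_t len)
{
    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);

    return ((uintptr_t)buf % pagesize == 0) && (len % pagesize == 0);
}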

What if the memory provided by the client isn’t as big as the driver
thinks it is? Would the driver SIGSEGV? Not nice… Maybe there are
solutions to this; I hope there are, because this sounds very
interesting. But for now I’m playing devil’s advocate here :wink:

If MsgPhysPageList returns the DMA list of pages, it had better not
return more pages than there are in the reply buffer. :slight_smile:

I suppose that might be true for ISA devices, but what about PCI?
I’m not a HW expert, but I thought that most SCSI, IDE, Firewire,
USB, Gbit Ethernet devices supported scatter/gather DMA. Isn’t
this how BSD does high-speed unbuffered I/O?

I’m not sure DMA can cover the whole memory space.

In qdn.public.qnxrtp.os, subject: Re: Getting physical addresses,
David Gibbs <dagibbs@qnx.com> suggests the following code for
allocating a PCI DMA-safe buffer and obtaining the physical address:

ptr = mmap( 0, size, PROT_READ|PROT_WRITE|PROT_NOCACHE,
            MAP_ANON|MAP_PHYS|MAP_NOX64K, NOFD, 0 );
posix_mem_offset( ptr, NOFD, 1, &phys_addr, NULL );

Nowhere in the mmap call is there an address restriction on where the
buffer can be in RAM. Thus, I can only assume that PCI can address
all system RAM (which makes a lot of sense).

In fact, MsgPhysPageList does the same thing as posix_mem_offset,
except that it performs it on a different address space.
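
For what it's worth, a process can already perform that walk on its own address space. A hedged sketch using QNX's mem_offset() from <sys/mman.h> (the page-by-page loop and the phys_pages() name are illustrative assumptions; the hypothetical MsgPhysPageList would do the equivalent walk over the client's address space instead):

/* Sketch: collect the physical address of each page of a local,
 * page-aligned buffer using mem_offset(). A MsgPhysPageList-style
 * call would do a similar walk, but on the client's address space.   */
#include <sys/mman.h>
#include <sys/types.h>
#include <stdint.h>
#include <unistd.h>
#include <stddef.h>

int phys_pages(const char *buf, size_t size, uint64_t *pages, int max)
{
    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    int    n = 0;

    for (size_t done = 0; done < size; done += pagesize, n++) {
        off_t paddr;

        if (n == max)
            return -1;                              /* list too small */
        if (mem_offset(buf + done, NOFD, pagesize, &paddr, NULL) == -1)
            return -1;                              /* not resolvable */

        pages[n] = (uint64_t)paddr;
    }
    return n;
}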

Daryl Low
University of Waterloo

You might want to look into the Mach3 VM design. It tries to transparently optimize message passing by playing with page tables instead of copying, when possible.

I spoke briefly with QNX folks about that, and their feeling was that the complexity of the implementation would be prohibitive, considering that ‘when possible’ happens rather rarely.

They are considering a new VM anyway, but that won’t be soon. Meanwhile, we can look at Mac OS X, which is based partly on the Mach3 kernel, and see how well it works :wink:

  • igor
