Which is faster? DEVCTL or MsgSend

Which one can get data to and from another process faster? I’m hoping you
guys can save me the time of running an experiment.

Chris Rose <chris.rose@viasat.com> wrote:

Which one can get data to and from another process faster? I’m hoping you
guys can save me the time of running an experiment.

devctl() uses MsgSend().

-xtang

Chris Rose <chris.rose@viasat.com> wrote:

Which one can get data to and from another process faster? I’m hoping you
guys can save me the time of running an experiment.

Well, devctl() is ultimately a MsgSend() (just like almost everything else
in the QNX C library). So a raw MsgSend() is going to be faster, but not by
much. Generally, devctl() is the easiest way to do things “out of band”
to a resource manager.

chris


Chris McKillop <cdm@qnx.com>
Software Engineer, QSSL
http://qnx.wox.org/
“The faster I go, the behinder I get.” – Lewis Carroll

Chris Rose wrote:

Which one can get data to and from another process faster? I’m hoping you
guys can save me the time of running an experiment.

MsgSend will be only microscopically faster than devctl. It definitely
isn’t worth worrying about (or perhaps more correctly, if this is worth
worrying about then you must have a lot of time on your hands :-)

Here’s the source to devctl. As you can see, it adds maybe a dozen or
so opcodes over a naked MsgSend.


int devctl(int fd, int dcmd, void *data_ptr, size_t nbytes, int *info_ptr) {
    io_devctl_t msg;
    iov_t       iov[4];
    int         status;

    // Stuff the message.
    msg.i.type = _IO_DEVCTL;
    msg.i.combine_len = sizeof msg.i;
    msg.i.dcmd = dcmd;
    msg.i.nbytes = nbytes;
    msg.i.zero = 0;

    // Set up data to the device.
    SETIOV(iov + 0, &msg.i, sizeof msg.i);
    SETIOV(iov + 1, data_ptr, (dcmd & DEVDIR_TO) ? nbytes : 0);

    // Set up data from the device.
    SETIOV(iov + 2, &msg.o, sizeof msg.o);
    SETIOV(iov + 3, data_ptr, (dcmd & DEVDIR_FROM) ? nbytes : 0);

    if((status = MsgSendv_r(fd, iov + 0, GETIOVLEN(&iov[1]) ? 2 : 1,
                            iov + 2, GETIOVLEN(&iov[3]) ? 2 : 1)) != EOK) {
        return status == -ENOSYS ? ENOTTY : -status;
    }

    if(info_ptr) {
        *info_ptr = msg.o.ret_val;
    }

    return EOK;
}


“Rennie Allen” <rallen@csical.com> wrote in message
news:3CA27282.6070207@csical.com

MsgSend will be only microscopically faster than devctl. […]

Here’s the source to devctl […]

[cut code]

That’s only half of it. The resource manager framework is involved
on the receiving end.

There was a post a few months ago about someone comparing MsgSend and a
read(). From memory, MsgSend was three times faster than read().

  • Mario

Mario Charest <goto@nothingness.com> wrote:

[…]

There was a post a few months ago about someone comparing MsgSend and a
read(). From memory, MsgSend was three times faster than read().

As well, devctl() has the known limitation that the send and receive
buffers must be the same size – this is not the case with MsgSend().

Cheers,
-RK


Robert Krten, PARSE Software Devices +1 613 599 8316.
Realtime Systems Architecture, Books, Video-based and Instructor-led
Training and Consulting at www.parse.com.
Email my initials at parse dot com.

Mario Charest wrote:


The resource manager framework is involved
on the receiving end.

I don’t see any reference in the original post to the use of the
resource manager framework. All that was being asked was what call gets
data to/from a process faster (nothing about how much time is spent
processing the data once it has made it to the target process).

The answer to Chris’s question as phrased is indeed MsgSend, but the
real question that should have been asked is “does the overhead of the
devctl cover function add significantly to the overall cost of data
transfer vis-à-vis a naked MsgSend?”, and the answer to that question is
no (assuming, of course, that some reasonable amount of data is being
written with each call).


There was a post a few months ago about someone comparing MsgSend and a
read(). From memory, MsgSend was three times faster than read().

Three times what? Three times 50 nanoseconds is 150 nanoseconds, which is
noise (and not worth worrying about), unless you are only sending/receiving
a few bytes of data per call (in which case your own processing overhead
will kill you, and devctl vs. MsgSend is again a moot point). Not to
mention that a read and a devctl (as implemented on the server side) are
different beasts and are not very comparable.

Rennie

“Rennie Allen” <rallen@csical.com> wrote in message
news:3CA29047.20809@csical.com

Mario Charest wrote:


The resource manager framework is involved
on the receiving end.


I don’t see any reference in the original post to the use of the
resource manager framework.

I think it’s fair to assume that if devctl is used, the resource manager
framework is involved. Granted, it may not be, as one could write one’s own.

All that was being asked was what call gets
data to/from a process faster (nothing about how much time is spent
processing the data once it has made it to the target process).

OK, I extrapolated the question to what I think is a more realistic
scenario.

I believe the time measured was from the read() call to the entry of the
read callback in the resmgr. What was done inside the callback was not
considered.


The answer to Chris’s question as phrased is indeed MsgSend, but the
real question that should have been asked is “does the overhead of the
devctl cover function add significantly to the overall cost of data
transfer vis-à-vis a naked MsgSend?”, and the answer to that question is
no (assuming, of course, that some reasonable amount of data is being
written with each call).

I prefer giving out numbers (although in this case they are far from
reliable) and letting the person judge whether the overhead is
significant or not.

Example: if MsgSend takes 10us and devctl takes 30us, that’s 20us extra
for every devctl. If you make 1000 calls a second, that means 20ms
every second. If the machine is performance-oriented (vision,
optimisation, etc.), that means it has 20ms less per second to
work on the data. Personally, I’ve worked on some systems where
20ms had an impact on the efficiency of the machine!

There was a post a few months ago about someone comparing MsgSend and a
read(). From memory, MsgSend was three times faster than read().

Three times what? Three times 50 nanoseconds is 150 nanoseconds, which is
noise (and not worth worrying about), unless you are only sending/receiving
a few bytes of data per call (in which case your own processing overhead
will kill you, and devctl vs. MsgSend is again a moot point).

In my view, when someone asks which of two methods is the fastest, that’s
because they are concerned with speed. The person didn’t ask how fast it
was, but which is the fastest.

I would have liked to give numbers, but I couldn’t find the original post.
From memory it was in the microsecond range, but I don’t remember the
CPU speed…

Not to mention that a read and a devctl (as implemented on the server
side) are different beasts and are not very comparable.

Rennie

I wrote my original resource managers in a way that would allow me to
recompile them as an object or library into the client if I found that the
IPC would take too long.
The client calls proxy functions that are defined in the client-side source
file for the resource manager. These proxy functions do the devctl() for me
behind the scenes to give the client code a more readable interface.
On the resource manager side, I receive the messages and then call the
appropriate stub function out of a switch case.
The stub functions are defined in a server-side source file that is specific
to the hardware being used. The stub function names are the same as the
proxy function names.
Therefore, at link time I can determine whether I want to link against the
proxy functions to run as an out-of-process server, or link in the
server-side implementation of the functions, in which case the calls are
all local.
When I linked it with local functions to the hardware, the results were
almost 10x better: 20us in-process vs. 150us out-of-process. This proved to
me that it is not the hardware itself that’s taking up all the time.
The question now is… have I imposed too much overhead in the resource
manager? I don’t see how I have. The resource manager just maps a devctl
message to function stubs that are defined in another source file, but they
are compiled into the same process.
Any ideas? I like the versatility of resource managers, since I can change
the hardware implementation without any of the clients knowing. The other
really BIG advantage is that the resource manager can act as arbiter for all
clients trying to use the hardware. That way it keeps two clients from
having write privileges to the same port on my hardware device. Of course,
all clients should be able to read any port.

“Chris Rose” <chris.rose@viasat.com> wrote in message
news:a7tovo$3hl$1@inn.qnx.com

Which one can get data to and from another process faster? I’m hoping you
guys can save me the time of running an experiment.

“Chris Rose” <chris.rose@viasat.com> wrote in message
news:a7v643$4os$1@inn.qnx.com

[…]
When I linked it with local functions to the hardware, the results were
almost 10x better. 20us in-process, vs. 150us out-of-process.

So you are comparing an “in8”-like instruction to devctl()? That ain’t fair ;-)

This proved to
me that it is not the hardware itself that’s taking up all the time.
The question now is… have I imposed too much overhead in the resource
manager?

You have introduced message passing and context switches.

The resource manager framework also adds some overhead, in order to be
POSIX-compliant for example.

I don’t see how I have. The resource manager just maps a DEVCTL
message to function stubs that are defined in another source file, but
they
are compiled into the same process.
Any ideas? I like the versatility of resource managers since I can change
hardware implementation, without any of the clients knowing.

Can’t get something for nothing ;-)

The other
really BIG advantage is that the resource manager can act as arbitor for
all
clients trying to use the hardware. That way it keeps two clients from
having write priveledges to the same port on my hardware device. Of course
all clients should be able to read any port.

I find the 130us rather big; what’s your CPU?

  • Mario

“Mario Charest” <goto@nothingness.com> wrote in message
news:a7vad6$7t2$1@inn.qnx.com

[…]
I find the 130us rather big; what’s your CPU?

  • Mario

I found the overhead of read() / write() to be about 25-30 us on a Celeron
566 with QNX 6.0.

Marty Doane
Siemens Dematic

The PC platform is a WinSystems 166MHz PC/104 Pentium board.
I had done an earlier experiment where I just did a MsgSend and MsgReceive
on a server and client. The client would set a hardware bit high before
sending the message. The server would receive the message, set the bit low,
and reply. The process ran in a loop forever. The pulse width was about
15us, so the complete send/receive round trip is approximately 30us.
So in my resource manager I was guessing that the messaging overhead was no
more than 30us to 40us, and the rest of the 150us was hardware servicing.
But when I made the hardware function calls inside the process, the time
dropped to 20us, which made it look like the messaging overhead was 130us.
So I don’t know why the difference: making the function calls locally
versus using devctl messages reduced the time from 150us to 20us, leading
me to believe that the devctl overhead is 130us, yet in the simple
experiment it seemed I could send and receive messages within 30us.
So what’s the difference?

“Chris Rose” <chris.rose@viasat.com> wrote in message
news:a7v643$4os$1@inn.qnx.com

[…]

“Chris Rose” <chris.rose@viasat.com> wrote in message
news:a7vmgj$gar$1@inn.qnx.com

[…]
So what’s the difference?

Resource manager overhead I would assume.

Out of curiosity, what happens if you link statically? (I’ve always wanted
to measure the overhead of using a .so versus static linking.)

Can you post the code of your devctl callback (resmgr)?

  • Mario

Chris Rose wrote:

[…]
So what’s the difference?

Where did you twiddle the bit: inside a resource manager framework
callback? (I am assuming that you set the bit immediately prior to the
call to devctl.)

If you did set the bit low from the devctl callback in the resource
manager framework, then you were measuring all the overhead (devctl cover
overhead, context-switch overhead, and resource manager framework
overhead), and a total (one direction) of 15us sounds feasible on your
hardware.

There’s something strange with your results. If you measured the
overhead at 30us, then that is what it is. I would suspect that in
your two tests the hardware is not being accessed in exactly the same
sequence. If you take out your custom hardware and have your code
access, say, the parallel port, then post it here, others could
examine the differences.

Rennie

Marty Doane wrote:


I found the overhead of read() / write() to be about 25-30 us on a Celeron
566 with QNX 6.0.

Is this a measure of round-trip read/write, or unidirectional? Did you
happen to try devctl? (It should definitely be lower.)

Rennie

I stripped out all the resource manager overhead. Now I simply wait for a
message receive. I do all the same hardware operations as before, and my
round-trip time dropped from 150us to 66us, almost a 3x reduction.
This is consistent with my earlier experiments, where I timed the one-way
MsgSend at 15us (so a round trip would be about 30us), and where linking
all the hardware functions locally showed that it took 20us to read the
hardware. Therefore, I would expect a MsgSend, hardware read, MsgReply
sequence to take about 50us or so. That’s also consistent with the post
mentioned by Mario, which found MsgSend/MsgReply to be about 3x faster
than devctl.

“Mario Charest” <goto@nothingness.com> wrote in message
news:a7tutn$7a6$1@inn.qnx.com

[…]

There was a post a few months ago about someone comparing MsgSend and a
read(). From memory, MsgSend was three times faster than read().

  • Mario

“Chris Rose” <rose_chris@excite.com> wrote in message
news:a83el9$5a6$1@inn.qnx.com

I stripped out all the resource manager overhead. […]

Can you measure the time for the call to the dispatch_handler() function
or the resmgr_handler() function, depending on which you are using (one
may be faster than the other)? I suspect that’s where most of the time is
spent.

You could also measure how long your devctl callback function takes, and
then you could figure out exactly the amount of overhead introduced by
the framework.

I’ve looked at the code of these functions and there is a fair amount
of stuff going on; to me that explains it all ;-)

  • Mario

Round trip. I compared the execution time of a foo() function with the
execution time of a read() or write() that invokes foo() within the resource
manager.

No, I haven’t experimented with devctl.


Marty Doane
Siemens Dematic

“Rennie Allen” <rallen@csical.com> wrote in message
news:3CA4A8D7.5040600@csical.com

[…]

Is this a measure of round-trip read/write, or unidirectional? Did you
happen to try devctl? (It should definitely be lower.)

Rennie