which way to go? Linux or QNX?

“Rennie Allen” <rallen@csical.com> wrote in message
news:3D90670E.2020404@csical.com

William A. Flowers wrote:

I’m still on Q4, so all that is ahead of me. I’m not smiling. When we port,
if our application slows down by even 2-3% I’ll be screaming! You try frame
grabbing full 640x480 color images and analyzing the image for defects in
18ms … every 18ms (3300 per minute).

Hopefully the hardware will be fast enough by the time you upgrade to
make up for the differences :-)

Smile when you say that! I’m counting on faster hardware in addition to not
losing any (or much) software performance. I want to do even more elaborate
analysis than today without losing speed.

Bill Flowers
Clearwater, FL

“Mario Charest” postmaster@127.0.0.1 wrote in message
news:amqigm$ivb$1@inn.qnx.com

Someone told me they get 3x the speed by using MMX instruction to perform
memcpy!!!

For certain cases, it wouldn’t surprise me.

Bill Flowers
Clearwater, FL

My opinion, constructive this time, is that there are certain things in the
OS that should be written in asm. I gained substantial speed improvements
just by writing my own asm memmove() function.

I’ll leave out all the other editorializing. Everyone knows how I feel
about the gnu compiler.

100% true. If you look at our memcpy it is done in asm. However, there are
huge improvements that can be made by fixing memmove() in C. For example,
calling memcpy() in the non-overlapping case and doing transfers of sizes
other than bytes. Right now the memmove() in libc always does byte-by-byte
copies, terribly slow. Doing the two mentioned changes gives improvements
of 10-15x (memcpy) and 5-6x (larger transfers).
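
A minimal sketch of the two changes described above, written as an illustration and not taken from the QNX libc source. The names sketch_memmove and word_t are hypothetical, and the may_alias attribute is a GCC/Clang extension assumed here so that word-sized accesses through a cast pointer are permitted:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Word type allowed to alias other types (GCC/Clang extension, assumed). */
typedef size_t __attribute__((may_alias)) word_t;

/* Hypothetical memmove() replacement, for illustration only. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uintptr_t da = (uintptr_t)d, sa = (uintptr_t)s;
    const size_t W = sizeof(word_t);

    if (n == 0 || da == sa)
        return dst;

    /* Change 1: the regions do not overlap, so memcpy() is legal. */
    if (da >= sa + n || sa >= da + n)
        return memcpy(dst, src, n);

    /* Word transfers are only used when both pointers can be aligned
     * at the same time; otherwise fall back to bytes. */
    int co_aligned = ((da ^ sa) % W) == 0;

    if (da < sa) {
        /* Overlap with dst below src: copy forward. */
        if (co_aligned) {
            while (n && ((uintptr_t)d % W)) { *d++ = *s++; n--; }
            /* Change 2: move a word at a time instead of a byte. */
            while (n >= W) {
                *(word_t *)d = *(const word_t *)s;
                d += W; s += W; n -= W;
            }
        }
        while (n--) *d++ = *s++;
    } else {
        /* Overlap with dst above src: copy backward from the end. */
        d += n; s += n;
        if (co_aligned) {
            while (n && ((uintptr_t)d % W)) { *--d = *--s; n--; }
            while (n >= W) {
                d -= W; s -= W; n -= W;
                *(word_t *)d = *(const word_t *)s;
            }
        }
        while (n--) *--d = *--s;
    }
    return dst;
}
```

How much either change buys depends on the compiler and the hardware; the 10-15x and 5-6x figures are the ones quoted above, not something this sketch has been measured against.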

chris


Chris McKillop <cdm@qnx.com> “The faster I go, the behinder I get.”
Software Engineer, QSSL – Lewis Carroll –
http://qnx.wox.org/

Yes, I know this, but what about the kernel version? The above is the
version of the rtp which is 6.2.1 in your case, but it is my
understanding that the actual Nto kernel version is something like
2.1.1. Would this be correct? Then the question is: how could I tell
the actual version of Nto kernel (or does this question make no sense)?

This is the kernel version. The kernel is the only thing I am running
from a 6.2.1 level system on this box actually. Neutrino adopted the 6.x
numbering and dropped the 2.x numbering nearly 2 years ago.

chris


Chris McKillop <cdm@qnx.com> “The faster I go, the behinder I get.”
Software Engineer, QSSL – Lewis Carroll –
http://qnx.wox.org/

“Bill Caroselli (Q-TPS)” <QTPS@EarthLink.net> wrote in message
news:amqlkf$ku5$1@inn.qnx.com

I got almost 6x speed improvements over 6.01a by just coding memmove() in
asm. No MMX needed. The gcc compiler just doesn’t produce good code. I
think the real problem was that after each byte moved, it kept checking to
see if it won yet or not. I took that part out. ;~}

This was with Visual C++, which I believe inlines memcpy.

“Mario Charest” postmaster@127.0.0.1 wrote in message
news:amqigm$ivb$1@inn.qnx.com

Someone told me they get 3x the speed by using MMX instruction to perform
memcpy!!!

I have seen arguments like that before. They don’t convince me. Why is it
that I am the one asking WHY in the hell open() is a VERY VERY expensive
operation while you guys just accept this fact like some inevitability? How
is it that you don’t have control over it? You own the bloody OS. If
something more fundamental is broken, it is about time to get it fixed, or
you’ll soon be laughed at.

Speaking about the filesystem issue, I believe I found the reason. The open()
et cetera are VERY VERY expensive because message passing performance
SUCKS beyond belief on small buffers. Everyone is welcome to benchmark ‘dd
if=/dev/zero of=/dev/null’ and come up with their own numbers. I don’t
suppose there’s a helluva lot of lookups involved. More adventurous souls can
benchmark raw message passing bandwidth. My bet is, you won’t get far beyond
100Mb/sec on 512 byte buffers (but you can get 2Gb/sec on larger buffers).
It is also interesting that setting different block size for dd has
practically NO effect on performance.
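
For anyone who prefers to run the dd experiment from code, here is a rough sketch of the same copy loop in C. It uses only POSIX open()/read()/write() and the standard clock(); the function name copy_zero_to_null is hypothetical. On QNX Neutrino each read() and write() is carried as a message to the resource manager, so the block size sets how many messages are needed, but the sketch measures only whatever system it runs on:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Copy `total` bytes from /dev/zero to /dev/null in `bs`-byte blocks.
 * Returns the CPU time used in seconds, or -1.0 on any error. */
double copy_zero_to_null(size_t bs, size_t total)
{
    int in = open("/dev/zero", O_RDONLY);
    int out = open("/dev/null", O_WRONLY);
    char *buf = malloc(bs);
    double secs = -1.0;

    if (in >= 0 && out >= 0 && buf != NULL) {
        size_t done = 0;
        clock_t c0 = clock();

        while (done < total) {
            ssize_t r = read(in, buf, bs);       /* one request per block */
            if (r <= 0 || write(out, buf, (size_t)r) != r)
                break;
            done += (size_t)r;
        }

        clock_t c1 = clock();
        if (done >= total)
            secs = (double)(c1 - c0) / CLOCKS_PER_SEC;
    }

    free(buf);
    if (in >= 0) close(in);
    if (out >= 0) close(out);
    return secs;
}
```

Calling it twice with the same total, once with bs = 512 and once with bs = 65536, gives the comparison suggested above; throughput is total divided by the returned time.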

That means 2 things:

  • you have to figure a way to do small message passing faster
  • you have to use bigger buffers in your I/O subsystems

I have said that several times, but have yet to see any reply from jgarvey
or anyone else.

– igor

“Xiaodan Tang” <xtang@qnx.com> wrote in message
news:amqijr$5ag$1@nntp.qnx.com

William A. Flowers <wflowers_NOSPAM@insightcontrol.com> wrote:

Yes, definitely faster. How much faster really depended on the file sizes.
Lots of little files was slow because of all the control messages, as you
point out above. A few big files and that was lost in the noise.

It always bothered me that I couldn’t make it faster in QNX4. Heaven knows
I tried. The overhead of VC creates, prefix lookups, more VC creates,
open/creat calls, etc. was something beyond my control, and fundamental to
the entire Q4 design.

Now you share my pain :-)

QNET (FLEET the same) is not oriented toward file systems. NFSv2 and v3 are
way faster. The fact is NFS “knows” it is dealing with “files”; QNET
doesn’t :-(

open() (prefix lookup) is a VERY VERY expensive operation, which is beyond
QNET’s control. Doing a lot of cross-network open() calls (like copying small
files) kills performance.

I have tried everything, including combining small packets into a big one,
but then we need a “multi-threaded” cp; otherwise, it won’t help much.

That being said, seeking performance improvements is still a major task for
all of us at QSS.

-xtang

Igor Kovalenko <kovalenko@attbi.com> wrote:

I have seen arguments like that before. They don’t convince me. Why is it
that I am the one asking WHY in the hell open() is a VERY VERY expensive
operation while you guys just accept this fact like some inevitability? How
is it that you don’t have control over it? You own the bloody OS. If
something more fundamental is broken, it is about time to get it fixed, or
you’ll soon be laughed at.

Speaking about the filesystem issue, I believe I found the reason. The open()
et cetera are VERY VERY expensive because message passing performance
SUCKS beyond belief on small buffers. Everyone is welcome to benchmark ‘dd
if=/dev/zero of=/dev/null’ and come up with their own numbers. I don’t
suppose there’s a helluva lot of lookups involved. More adventurous souls can
benchmark raw message passing bandwidth. My bet is, you won’t get far beyond
100Mb/sec on 512 byte buffers (but you can get 2Gb/sec on larger buffers).
It is also interesting that setting different block size for dd has
practically NO effect on performance.

That means 2 things:

  • you have to figure a way to do small message passing faster
  • you have to use bigger buffers in your I/O subsystems

I have said that several times, but have yet to see any reply from jgarvey
or anyone else.

Oh Igor.

Instead of shitting all over everything, why don't *you* come up with a better way? Can *you* come up with something that uses message passing for open() that won't suck? As you no doubt know, QNX is a message passing OS. Yes, that means that it passes messages. A message from the client to the server. And back. Yes, these are small messages. Yes, small messages incur an overhead. It's easy to say "I've said this sucks and no one has fixed it". But unless you come up with "I've said this sucks, and here's a really good way to fix it, why haven't you implemented this?" you're just pissing into the wind! This point too has been raised multiple times, but I have yet to see any reply from kovalenko@attbi.com that suggests anything better. Xiaodan has told you what the problems are, and you just quoted his reply and claimed, in a "revelation from god/Igor" kind of way, to have "found the reason" -- he *already* knows about it:

open() (prefix lookup) is a VERY VERY expensive operation, which is beyond
QNET’s control. Doing a lot of cross-network open() calls (like copying small
files) kills performance.
/vent

See? Venting is easy. Solving the problems is much harder.

If you're so "disillusioned" with QNX, then maybe you should use Windows? I'm like so totally *sure* it's an all round "better" OS. Or linux? Why are you wasting your time complaining about QNX, when there must *surely* be *much* better OS's around? After all, I'm *sure* all problems have been solved under *all* other OS's except QNX. I just *love* the idea of digging deep into the kernel to install device drivers -- yummy. Or using an OS that has a "flavour of the day" based on which 12 year old MS-DOS hacker just wrote a device driver for it using polling in the kernel -- yummy creamy double-plus good. Or have an "open source team" of rabid monkeys running around making 75 different (incompatible) kernel versions that are all trying to piss over each other as the "next standard" -- faaaaaabulous. (That was sarcasm for those who missed it).

:-/

I’m not going to apologize for defending the QNX OS – it’s a damn good OS.
Put up, or make constructive suggestions. Please!

-RK


Robert Krten, PARSE Software Devices +1 613 599 8316.
Realtime Systems Architecture, Books, Video-based and Instructor-led
Training and Consulting at www.parse.com.
Email my initials at parse dot com.

“Robert Krten” <nospam83@parse.com> wrote in message
news:amrdak$6ve$1@inn.qnx.com

[snip]

Oh Igor.

vent
Instead of shitting all over everything, why don’t you come up with
a better way? Can you come up with something that uses message passing
for open() that won’t suck? As you no doubt know, QNX is a message passing
OS. Yes, that means that it passes messages. A message from the client
to the server. And back. Yes, these are small messages. Yes, small
messages incur an overhead. It’s easy to say “I’ve said this sucks and
no one has fixed it”. But unless you come up with “I’ve said this sucks,
and here’s a really good way to fix it, why haven’t you implemented this?”
you’re just pissing into the wind! This point too has been raised multiple
times, but I have yet to see any reply from kovalenko@attbi.com that
suggests anything better. Xiaodan has told you what the problems are,

I could point you to a MUCH earlier thread when I was arguing over this with
jgarvey. So no, I did not learn this from xtang. If you want to make a point
make sure you have one. That was not a public group unfortunately.

and you just quoted his reply and claimed, in a “revelation from god/Igor”
kind of way, to have “found the reason” – he already knows about it:
open() (prefix lookup) is a VERY VERY expensive operation, which is beyond
QNET’s control. Doing a lot of cross-network open() calls (like copying small
files) kills performance.
/vent

See? Venting is easy. Solving the problems is much harder.

I don’t see where he’s saying that issue is with slowness of message passing
on small buffers.

more ventage
If you’re so “disillusioned” with QNX, then maybe you should use Windows?
I’m like so totally sure it’s an all round “better” OS. Or linux?
Why are you wasting your time complaining about QNX, when there must
surely be much better OS’s around? After all, I’m sure all
problems have been solved under all other OS’s except QNX. I just
love the idea of digging deep into the kernel to install device
drivers – yummy. Or using an OS that has a “flavour of the day”
based on which 12 year old MS-DOS hacker just wrote a device driver for
it using polling in the kernel – yummy creamy double-plus good. Or have an
“open source team” of rabid monkeys running around making 75 different
(incompatible) kernel versions that are all trying to piss over each other
as the “next standard” – faaaaaabulous.
(That was sarcasm for those who missed it).
/ventage

This is typical mumbo-jumbo QNX zealots like to say. Most of the time I hear
this from people who never bothered to read a single good book on Unix
kernel design and just blindly believe that QNX does everything ‘the right
way’ while everyone else is just too dumb to understand it. Unless you’re
prepared to defend every word you’re saying with substantiated arguments, it
only makes you look ridiculous.

:-/

I’m not going to apologize for defending the QNX OS – it’s a damn good OS.
Put up, or make constructive suggestions. Please!

I believe I did. Furthermore, they apparently were implemented (quietly,
without ever acknowledging the problem) for next QNX versions. From
unofficial sources, next version will use 64k blocks for I/O internally.
Current one does 128 x MsgWrite(512 bytes) and then does MsgReply(). Is that
so ‘damn good’?

Now to the message passing. I do have constructive suggestions. Wait until
tomorrow, you should see a white paper on QNXZone.

– igor

Igor Kovalenko <kovalenko@attbi.com> wrote:

“Robert Krten” <nospam83@parse.com> wrote in message
news:amrdak$6ve$1@inn.qnx.com

[snip]

Oh Igor.

vent
Instead of shitting all over everything, why don’t you come up with
a better way? Can you come up with something that uses message passing
for open() that won’t suck? As you no doubt know, QNX is a message passing
OS. Yes, that means that it passes messages. A message from the client
to the server. And back. Yes, these are small messages. Yes, small
messages incur an overhead. It’s easy to say “I’ve said this sucks and
no one has fixed it”. But unless you come up with “I’ve said this sucks,
and here’s a really good way to fix it, why haven’t you implemented this?”
you’re just pissing into the wind! This point too has been raised multiple
times, but I have yet to see any reply from kovalenko@attbi.com that
suggests anything better. Xiaodan has told you what the problems are,

I could point you to a MUCH earlier thread when I was arguing over this with
jgarvey. So no, I did not learn this from xtang. If you want to make a point
make sure you have one. That was not a public group unfortunately.

I made my points based on what is available to me.

and you just quoted his reply and claimed, in a “revelation from god/Igor”
kind of way, to have “found the reason” – he already knows about it:
open() (prefix lookup) is a VERY VERY expensive operation, which is beyond
QNET’s control. Doing a lot of cross-network open() calls (like copying small
files) kills performance.
/vent

See? Venting is easy. Solving the problems is much harder.

I don’t see where he’s saying that issue is with slowness of message passing
on small buffers.

As I understood it, the thread was talking about copying lots of small
files across a network. This involves name resolution, which involves
lots of open()s with small messages.

more ventage
If you’re so “disillusioned” with QNX, then maybe you should use Windows?
I’m like so totally sure it’s an all round “better” OS. Or linux?
Why are you wasting your time complaining about QNX, when there must
surely be much better OS’s around? After all, I’m sure all
problems have been solved under all other OS’s except QNX. I just
love the idea of digging deep into the kernel to install device
drivers – yummy. Or using an OS that has a “flavour of the day”
based on which 12 year old MS-DOS hacker just wrote a device driver for
it using polling in the kernel – yummy creamy double-plus good. Or have an
“open source team” of rabid monkeys running around making 75 different
(incompatible) kernel versions that are all trying to piss over each other
as the “next standard” – faaaaaabulous.
(That was sarcasm for those who missed it).
/ventage

This is typical mumbo-jumbo QNX zealots like to say. Most of the time I hear
this from people who never bothered to read a single good book on Unix
kernel design and just blindly believe that QNX does everything ‘the right
way’ while everyone else is just too dumb to understand it. Unless you’re
prepared to defend every word you’re saying with substantiated arguments, it
only makes you look ridiculous.

Let’s not get into personal attacks. I’ve worked on iRMX86, iRMX286, VAX/VMS,
Unix, Xenix, and Linux, to name a few. Let me “defend” my points:

o rebuilding the kernel and digging deep into the kernel was the method used
to install device drivers in my experience.
o ALSA is a classic example; the devctl() passes a pointer to something in
the client’s address space, causing the driver to root around in the client’s
address space at kernel level.
o several projects I’ve been peripherally involved with needed to get support
for various linux packages, which depended on different (incompatible) versions
of the kernel.
o I’ve seen drivers where they were ported from single-user MS-DOS mode with
polling, in the kernel.

Your experience may be different; those are my experiences.

:-/

I’m not going to apologize for defending the QNX OS – it’s a damn good OS.
Put up, or make constructive suggestions. Please!

I believe I did. Furthermore, they apparently were implemented (quietly,
without ever acknowledging the problem) for next QNX versions. From
unofficial sources, next version will use 64k blocks for I/O internally.
Current one does 128 x MsgWrite(512 bytes) and then does MsgReply(). Is that
so ‘damn good’?

Sounds like the problem is fixed. Let’s move on.

Now to the message passing. I do have constructive suggestions. Wait until
tomorrow, you should see a white paper on QNXZone.

Where is it? Please post when it appears.

Cheers,
-RK


Robert Krten, PARSE Software Devices +1 613 599 8316.
Realtime Systems Architecture, Books, Video-based and Instructor-led
Training and Consulting at www.parse.com.
Email my initials at parse dot com.

Put up, or make constructive suggestions. Please!

This is a very dangerous request, believe me. This is one of the best ways to
kill any critique. Why the hell does he have to tell QSSL how to implement
their own code?! Is he a leading designer of QNX? No, but he is the customer,
and IMHO one of the purposes of these newsgroups was that people can not only
say how beautiful QNX is but also express their opinion if there is something
wrong.
IMHO QNX has to pay people like IgorK because sometimes naming a problem is
50% of solving it.

Cheers,
Igor

“Igor Levko” <spama@nihrena.net> wrote in message
news:amseo4$v6$1@inn.qnx.com

Put up, or make constructive suggestions. Please!

This is a very dangerous request, believe me. This is one of the best ways to
kill any critique. Why the hell does he have to tell QSSL how to implement
their own code?! Is he a leading designer of QNX? No, but he is the customer,
and IMHO one of the purposes of these newsgroups was that people can not only
say how beautiful QNX is but also express their opinion if there is something
wrong.
IMHO QNX has to pay people like IgorK because sometimes naming a problem is
50% of solving it.

Cheers,
Igor

Interesting, does Igor share your IMHO? :-)

// wbr

Chris McKillop wrote:

My opinion, constructive this time, is that there are certain things in the
OS that should be written in asm. I gained substantial speed improvements
just by writing my own asm memmove() function.

I’ll leave out all the other editorializing. Everyone knows how I feel
about the gnu compiler.



100% true. If you look at our memcpy it is done in asm. However, there are
huge improvements that can be made by fixing memmove() in C. For example,
calling memcpy() in the non-overlapping case and doing transfers of sizes
other than bytes. Right now the memmove() in libc always does byte-by-byte
copies, terribly slow. Doing the two mentioned changes gives improvements
of 10-15x (memcpy) and 5-6x (larger transfers).

One day while searching the net for my name (long story, not the point),
I came across something called Duff’s Device
(http://www.lysator.liu.se/c/duffs-device.html).

It is both interesting and applicable to this discussion. What’s even
better is it can be applied in ANSI C, therefore avoiding dealing with
the low level asm routines.
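
For readers who have not seen it, here is a sketch of Duff’s Device adapted to a memory-to-memory byte copy. The original device wrote every value to a single output register, so incrementing the destination pointer is a variation, and the function name duff_copy and the zero-length guard are additions for this illustration:

```c
#include <stddef.h>

/* Copy `count` bytes with an 8-way unrolled loop. The switch jumps into
 * the middle of the do-while body to handle the count % 8 leftover bytes
 * first; every later pass then copies a full 8 bytes. */
void duff_copy(unsigned char *to, const unsigned char *from, size_t count)
{
    if (count == 0)
        return;                      /* the classic form assumes count > 0 */

    size_t passes = (count + 7) / 8; /* number of trips through the loop */

    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--passes > 0);
    }
}
```

The unrolling only cuts down on loop-counter tests; it still moves one byte per assignment, so it complements wider transfers and does not replace them.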

Rick…



Rick Duff Internet: rick@astranetwork.com
Astra Network QUICS: rgduff
QNX Consulting and Custom Programming URL: http://www.astranetwork.com
+1 (204) 987-7475 Fax: +1 (204) 987-7479

Rick Duff <rick@astranetwork.com> wrote:

One day while searching the net for my name (long story, not the point),
I came across something called Duff’s Device
(http://www.lysator.liu.se/c/duffs-device.html).

It is both interesting and applicable to this discussion. What’s even
better is it can be applied in ANSI C, therefore avoiding dealing with
the low level asm routines.

Actually, several people involved in the ISO C committee agree that the
C Standard doesn’t contain a clear guarantee that this kind of code must
work:

http://groups.google.com/groups?dq=&hl=en&lr=&selm=R6SWtPzJMej7Ewtd%40romana.davros.org

(OTOH, since it does work with most (all?) C compilers, it’s still much
more portable than assembly…)


Wojtek Lerch QNX Software Systems Ltd.

“Robert Krten” <nospam83@parse.com> wrote in message
news:amrdak$6ve$1@inn.qnx.com

I’m not going to apologize for defending the QNX OS – it’s a damn good OS.
Put up, or make constructive suggestions. Please!

-RK

It IS a fantastic OS. I’ve bitched too. You all know that. But I still
know that they’ve done most things right. And 6 IS better than 4.

I think that if QSSL could get a better compiler from ANYWHERE (and sorry,
no I don’t know where) it would improve many things many times over. Short
of that then maybe some profiling and hand generating certain key routines
in asm is the answer. I know this idea sucks because it means there is a
different source for each platform.

I agree. File system slowness is a LOCAL issue and has nothing to do with
networking.

I suspect that one reason is that there are several more layers in QNX IO than
there were in QNX4 to make it more generic. Sometimes that’s a good thing.
But I’m just not sure it is in QNX6 IO.

“Armin” <a-steinhoff@web.de> wrote in message
news:3D920F97.6090909@web.de

Robert Krten wrote:
Igor Kovalenko <kovalenko@attbi.com> wrote:

“Robert Krten” <nospam83@parse.com> wrote in message
news:amrdak$6ve$1@inn.qnx.com


[snip]



As I understood it, the thread was talking about copying lots of small
files across a network. This involves name resolution, which involves
lots of open()s with small messages.


The performance problems exist already for access of local files …
so no QNET sufficiencies are involved.

The block IO seems not to be the problem … but the upper-level routines/
modules.

(message passing? lots of small messages?, memcopy bytewise =:-/,
malloc? … )

Cheers

Armin

Hi Bill…

Bill Caroselli (Q-TPS) wrote:

I got almost 6x speed improvements over 6.01a by just coding memmove() in
asm. No MMX needed. The gcc compiler just doesn’t produce good code. I
think the real problem was that after each byte moved, it kept checking to
see if it won yet or not. I took that part out. ;~}

I wonder, is the same still true for 6.2.0, or was that part fixed with
something similar to your solution in the latest rtp version?


bests…

Miguel.

Hi Chris…

Chris McKillop wrote:

This is the kernel version. The kernel is the only thing I am running
from a 6.2.1 level system on this box actually. Neutrino adopted the 6.x
numbering and dropped the 2.x numbering nearly 2 years ago.

Ah! Ok, I was not aware of that; now I know. Thanks. :-)

regards…

Miguel.

chris

Hi Jun…

Jun wrote:

Hi Miguel,

FYI.

And also take a look at what these guy said at:

http://www.control.com/1008253795/index_html#1008356524

Interesting. I like the guy that said:

’ Saying “what’s the best embedded OS?” is like saying “what’s the best
vehicle?” If you’re hauling lumber, get a pick-up. If you want speed you
need a sports car. ’

This is true. It is also true that if you make a choice and stick to it,
you will get very far as long as you’re consistent.

Just for the heck of it, I’ll tell you my personal choices (just another
reference point of choice):

  1. for embedded ‘multimedia’ applications: any platform that supports
    QNX is good. To me one of the strongest points of QNX, aside from
    the memory protection, real-time context switch, etc., is the fact that
    you can develop drivers in user space. This I have not found anywhere
    else.

  2. for extensive i/o applications: MPC555 or MPC565. Phytec
    (http://www.phytec.com/HomeFrameset.html) has the phyCORE-MPC555
    platform which is just one of the best. With this platform you can use
    CodeWarrior + Motorola tools found at

http://e-www.motorola.com/webapp/sps/site/prod_summary.jsp?code=MPC555&nodeId=01M98648#tools

and you have just about everything that you need. (QNX would not run
on the MPC5x5 family of microcontrollers).

  3. then if you combine something like the following:

MPC823 + MPC555

you have the best of both worlds. with the MPC823 you can use QNX, and
with this you can take care of any high speed, higher level of
abstraction applications that you may have (i.e. interface with the user
over ethernet). In turn, with the MPC555 you can take care of any
necessities that you may have with the sensor+actuator world (think
robotics applications). The two (823 <–> 555) will communicate very
efficiently via QSPI. As far as robotics goes, this makes for a base
line system which is very difficult to beat.

(I think that IBM + the-big-three use a similar combination in some
versions of their cars, but others better informed will comment on this.)

Of course, there are a multitude of other choices for hardware platform
and software + RTOS choices, but you know all about that by now. :-)

Regards…

Miguel.


Jun XU

Robert Krten wrote:

Igor Kovalenko <kovalenko@attbi.com> wrote:

“Robert Krten” <nospam83@parse.com> wrote in message
news:amrdak$6ve$1@inn.qnx.com


[snip]



As I understood it, the thread was talking about copying lots of small
files across a network. This involves name resolution, which involves
lots of open()s with small messages.

The performance problems exist already for access of local files …
so no QNET sufficiencies are involved.

The block IO seems not to be the problem … but the upper-level routines/
modules.

(message passing? lots of small messages?, memcopy bytewise =:-/,
malloc? … )

Cheers

Armin

“Rick Duff” <rick@astranetwork.com> wrote in message
news:3D91D432.3050608@astranetwork.com

One day while searching the net for my name (long story, not the point),
I came across something called Duff’s Device
(http://www.lysator.liu.se/c/duffs-device.html).

It is both interesting and applicable to this discussion. What’s even
better is it can be applied in ANSI C, therefore avoiding dealing with
the low level asm routines.

It vaguely reminds me of the memcpy() optimization that danh came up with
back in QNX2 days. It would do 32-bit (or was it 16-bit back then)
assignments as much as possible, then the remainder if any. With MMX
instructions it could be 64-bits at a time now.
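
A sketch of that style of copy, reconstructed from the description only and not from the QNX2 source: 32-bit assignments for as long as possible, then the remaining bytes. The name copy_words_then_bytes is hypothetical, the may_alias attribute is a GCC/Clang extension, and the alignment check is an assumption added so the word accesses stay valid on any target:

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit type allowed to alias other types (GCC/Clang extension, assumed). */
typedef uint32_t __attribute__((may_alias)) u32_alias;

/* Forward copy for non-overlapping buffers, as with memcpy(). */
void *copy_words_then_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Use 32-bit moves only when both pointers are 4-byte aligned. */
    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0) {
        for (; n >= 4; n -= 4, d += 4, s += 4)
            *(u32_alias *)d = *(const u32_alias *)s;
    }

    /* The remainder, if any (or everything, when unaligned). */
    while (n--)
        *d++ = *s++;

    return dst;
}
```

Swapping uint32_t for uint64_t, and 4 for 8, gives the 64-bits-at-a-time variant in plain C; whether that matches what MMX moves would achieve is not something the sketch shows.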

Bill Flowers
Insight Control Systems
Safety Harbor, FL