I/O operations are very slow in comparison with Linux

Hello

We are developing applications for high-quality video streaming, and we need
high I/O performance for drivers and IPC. But our tests show that simple I/O
operations are very, very slow under QNX 6.2. Under Linux the same operations
with device drivers run 10 times faster or more.

What's wrong? Is our approach of using mixed device drivers wrong?
Should we place all of our code in one process and avoid I/O operations
(IPC) between device drivers and processes (filters)?

Can anyone explain this behaviour under QNX 6.2? Please have a look at the
following simple program:

Example:

/* hello.c */
#include <stdio.h>
#include <unistd.h>
#include <time.h>

int main (void)
{
    int i;
    time_t start;

    start = time (NULL);

    /* 10,000,000 writes of 13 bytes = 130,000,000 bytes total */
    for (i = 0; i < 10000000L; i++)
        write (STDOUT_FILENO, "Hello, world!", 13);

    fprintf (stderr, "Time difference: %ld sec.\n", (long) (time (NULL) - start));
    return 0;
}


Result under Linux:

Linux on Pentium-III 1.3 GHz:

uname -a

Linux mops-i 2.4.7-10 #1 Thu Sep 6 17:27:27 EDT 2001 i686 unknown

gcc -O2 -march=pentium hello.c -o hello

time ./hello >/dev/null

Time difference: 4 sec.

real 0m3.224s
user 0m1.320s
sys 0m1.800s


Result under QNX6.2:

QNX6.2 on Pentium-4 1.6 GHz:

uname -a

QNX neutrino1 6.2.0 2002/02/16-03:25:28est x86pc x86

gcc -O2 -march=pentium hello.c -o hello

time ./hello >/dev/null

Time difference: 47 sec.
47.15s real 4.34s user 5.17s system


Same test with a 40 KByte output buffer instead of "Hello, world!":

Result under Linux:

Time difference: 3 sec.

Result under QNX6.2:

Time difference: 56 sec.


Same test with output to filesystem:

time ./hello >/tmp/dummy

ls -l /tmp/dummy

-rw-r--r-- 1 root root 130000000 Jun 7 13:22 /tmp/dummy

Result under Linux:

Time difference: 8 sec.

Result under QNX6.2:

Time difference: 170 sec. (170.40s real 6.06s user 16.46s system)


Same test with output to the ring-buffer device driver "emlog"
(1 MByte capacity):

time ./hello >/dev/emlog

Result under Linux:

Time difference: 7 sec.

Result under QNX6.2:

Time difference: 107 sec.


Thanks for any hints to solve the problem.

Regards
Dr. G. Geigemüller

BitCtrl Systems GmbH
Weißenfelser Str. 67
04229 Leipzig, Germany

E-Mail: info@bitctrl.de
Internet: www.bitctrl.de, www.bitctrl.com
Tel: +49 341-490670
Fax: +49 341-4906715

"Dr. G. Geigemuller" <Gunter.Geigemueller@bitctrl.de> wrote:

We are developing applications for high quality video streaming. And we

for (i = 0; i < 10000000L; i++)
    write (STDOUT_FILENO, "Hello, world!", 13);

This test seems inconsistent/inappropriate for the intended usage … I
doubt you would stream video in 13-byte chunks … ?! In general you’d
want to minimise the context switching and message passing overheads by
exchanging larger units of data wherever possible (say 4k-16k).
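
For illustration, something along these lines would coalesce the small
payloads into larger write() calls (a rough, untested sketch; the 8 KB chunk
size and the buffered_write()/flush_chunk() helpers are just made up for this
example):

/* coalesce.c -- gather small payloads, then write them out in larger chunks */
#include <string.h>
#include <unistd.h>

#define CHUNK_SIZE 8192                  /* flush in ~8 KB units (example) */

static char   chunk[CHUNK_SIZE];
static size_t fill;                      /* bytes currently buffered */

/* Flush whatever is buffered with a single write() call. */
static void flush_chunk (int fd)
{
    if (fill > 0) {
        write (fd, chunk, fill);         /* one message pass instead of many */
        fill = 0;
    }
}

/* Append a small payload (len must be <= CHUNK_SIZE); flush first if full. */
static void buffered_write (int fd, const void *data, size_t len)
{
    if (fill + len > CHUNK_SIZE)
        flush_chunk (fd);
    memcpy (chunk + fill, data, len);
    fill += len;
}

int main (void)
{
    int i;

    for (i = 0; i < 10000000; i++)
        buffered_write (STDOUT_FILENO, "Hello, world!", 13);
    flush_chunk (STDOUT_FILENO);
    return 0;
}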

We are developing applications for high-quality video streaming, and we need
high I/O performance for drivers and IPC. But our tests show that simple I/O
operations are very, very slow under QNX 6.2. Under Linux the same operations
with device drivers run 10 times faster or more.

This is the wrong test - you are testing console or pterm speed, not real I/O
operations; use /dev/null to get real results. The Linux console (output) does
some caching of data: e.g. if you write 100 KB in one write call, only the last
kilobytes actually appear on the screen, because the Linux console driver has
some intelligence.

Hello John,

of course you are right - video streaming with "Hello, world!" strings is not
very useful. :-)
But we got the same results with 40 KByte chunks; please refer to my last
message.
I agree that very small chunks are not suitable for video streaming. Very large
chunks (> 250 KByte = ca. 500 msec delay = 12 frames in the case of 4 MBit/s
MPEG-2 PAL streaming) are also not very useful for realtime (and interactive)
video streaming. But that is not the question here.

I only want to understand the big performance difference for I/O operations
between QNX 6.2 and Linux!

If you have some spare time, please compile the example and check it out.

Thank you very much for your help.

Regards
Dr. G. Geigemüller
BitCtrl Systems
http://www.bitctrl.com http://www.bitctrl.de


Hello Mike,

we checked the example under different conditions, without console output:

time ./hello >/dev/null

time ./hello >/tmp/dummy

time ./hello >/dev/emlog # ring buffer device driver

And we got the same results.

We are not measuring the "console or pterm speed". Please refer to the source
of the example.



If you have some spare time, please compile the example and check it out. Or
better, give us an example to measure the "real" I/O performance under QNX 6.2
and Linux!

Regards
Dr. G. Geigemüller
BitCtrl Systems
http://www.bitctrl.com http://www.bitctrl.de

"Dr. Gunter Geigemuller" <Gunter@geigemueller.de> wrote:

But we got the same results with 40 kByte chunks, please refer to my last
message.

Running your test with 40 kB chunks took 9.3 seconds (on my PIII-400/UDMA2)
to a local fs-qnx4 disk (certainly far from the >100s you claim) … are
you sure your "/tmp" filesystem is on a local, properly-configured hard disk?

On Fri, 7 Jun 2002 16:33:44 +0200, “Dr. G. Geigemüller”
<Gunter.Geigemueller@bitctrl.de> wrote:

It's buffering (setbuffer()), caching, or write() itself. Did
you measure the cp or cat utilities on big files under QNX 6.2? They
should be much faster than the sample program, and the source code
might be available from the CVS repository.
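
For comparison, something like this (a rough, untested sketch; the 64 KB
buffer size is just an example value) would let stdio coalesce the small
payloads via setvbuf() before they ever reach write():

/* buffered.c -- same payload, but via stdio with a large output buffer */
#include <stdio.h>
#include <time.h>

int main (void)
{
    static char iobuf[64 * 1024];        /* example buffer size */
    int i;
    time_t start;

    /* Fully buffer stdout so each underlying write() carries ~64 KB. */
    setvbuf (stdout, iobuf, _IOFBF, sizeof iobuf);

    start = time (NULL);
    for (i = 0; i < 10000000; i++)
        fwrite ("Hello, world!", 1, 13, stdout);
    fflush (stdout);

    fprintf (stderr, "Time difference: %ld sec.\n", (long) (time (NULL) - start));
    return 0;
}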


On Sun, 09 Jun 2002 11:49:06 GMT, ako@box43.gnet.pl (Andrzej Kocon)
wrote:

[ slow io with write() ]

The writeblock() function seems to be faster than write() when
the number of blocks written at once is >> 1 (especially when writing
to /dev/null…). On the other hand, pregrowing the output file and/or
using setbuffer() does not seem to improve things significantly.

ako

Dr Geigemüller,

I ran your test (hello.c) on my PC and I obtained similar results. Here
are the details:

Result under QNX6.1.0 Patch A – Pentium-4 1.6 GHz

uname -a

QNX localhost 6.1.0 2001/08/23-19:38:50edt x86pc x86

gcc -O2 -march=pentium hello.c -o hello

time hello > /dev/null

Time difference: 48 sec.
47.37s real 2.75s user 4.96s system

time hello > /tmp/dummy

Time difference: 220 sec.
220.28s real 6.00s user 15.68s system


Conclusion: QNX 6.2 is just slightly better than 6.1 ;-)

I also ran these two tests with spin (similar to top on Linux) running
at the same time. Here is the CPU usage for these tests:

time hello > /tmp/dummy

devb-eide 80%
hello 9%
procnto 9%

hello > /dev/null

hello 20%
procnto 80%

procnto is the QNX micro-kernel. devb-eide is the block device driver
for eide disk.

I have no explanation to offer as to why it takes so long on QNX.

Bernard Leclerc

P.S. I would try to rewrite the test to avoid redirecting stdout. Maybe
QNX I/O redirection is not as good as on Linux.

Hello guys

Thank you for the hints. We have rewritten our test to get
real results for I/O operations. Please check our conclusions.

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <time.h>

char buf[10240];                         /* 10 KByte chunk */

int main (int argc, char *argv[])
{
    int i, fd = STDOUT_FILENO;
    time_t start;

    /* case A: argc > 1, means direct output to the named file */
    /* case B: argc == 1, means output to stdout */

    if (argc > 1)
        if ((fd = open (argv[1], O_WRONLY | O_CREAT, 0666)) < 0) {
            perror ("open()");
            exit (EXIT_FAILURE);
        }

    start = time (NULL);

    for (i = 0; i < 50000; i++)
        write (fd, buf, 10240);

    fprintf (stderr, "Time difference: %ld sec.\n", (long) (time (NULL) - start));
    return 0;
}

1. Results for the new example with file output

./hello /tmp/dummy # case A (without redirection)

./hello >/tmp/dummy # case B (with redirection)

Case A: Linux   - 88 sec
        QNX 6.2 - 18 sec

Case B: Linux   - 18 sec
        QNX 6.2 - 28 sec

Under QNX 6.2, case A is approx. 50% faster than case B.
Under Linux, case A is approx. 5.5 times slower than case B.


2. /dev/null

It seems that under Linux /dev/null has a special implementation, so our
test didn't show the real performance of I/O operations.


3. /dev/emlog

What is emlog?
emlog (the "EMbedded-system LOG-device") is a Linux kernel module which we
have ported to QNX 6.2 as a device driver; please see
http://www.circlemud.org/~jelson/software/emlog/

Using the example mentioned above with /dev/emlog instead of /tmp/dummy,
we got similar performance for QNX 6.2 and Linux.


Regards
Dr. G. Geigemüller

BitCtrl Systems GmbH
Weißenfelser Str. 67
04229 Leipzig, Germany

E-Mail: info@bitctrl.de
Internet: www.bitctrl.de, www.bitctrl.com
Tel: +49 341-490670
Fax: +49 341-4906715



"Dr. G. Geigemuller" <Gunter.Geigemueller@bitctrl.de> wrote:

Thank you for the hints. We have rewritten our test to get
real results for I/O operations. Please check our conclusions.

These results look better and more consistent :-)

./hello /tmp/dummy # case A (without redirection)

./hello >/tmp/dummy # case B (with redirection)

File redirection should not matter as such, it is just who does the
open(), the shell or the test program (there may be a slight but
insignificant dup() overhead with the former). Where the difference
probably comes in is from O_TRUNC (which the shell will use for “>”
redirection but you didn’t specify in your open()). So, after
running case A you have created a 500 MB file which is first deleted
and then incrementally grown again by case B. This is the difference.
You should be able to illustrate this by doing "rm /tmp/dummy;
./hello /tmp/dummy; ./hello /tmp/dummy", i.e. run case A twice with
and without the file existing/pregrown … the difference between
the first and second iteration is the cost of the deletion/growth.

This is why it is often a performance win to pregrow data files
beforehand so the overhead does not occur during the actual
collection/streaming/recording; in cases when you know in advance
the final size of the output file and intend to write it immediately,
you can use the (non-POSIX) devctl(DCMD_FSYS_PREGROW_FILE) to make
logical block allocation from the filesystem without zero-filling
(this is a somewhat grubby trick but this and others were intended
specifically for multimedia streaming); or if you know you will be
growing a large number of files in normal usage, a more aggressive
block allocation strategy can be enabled with “qnx4 overalloc”.
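
For example, something along these lines would pregrow the output file before
the streaming phase (a rough, untested sketch; it assumes that
DCMD_FSYS_PREGROW_FILE takes the desired size as a 64-bit byte count - check
<sys/dcmd_blk.h> on your system for the exact definition):

/* pregrow.c -- reserve the file's blocks up front, then stream into it */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <devctl.h>
#include <sys/dcmd_blk.h>

#define CHUNK   10240
#define NCHUNKS 50000                        /* ~500 MB, as in the test */

int main (int argc, char *argv[])
{
    static char buf[CHUNK];
    off64_t size = (off64_t) CHUNK * NCHUNKS; /* assumed devctl data type */
    int i, fd, rc;

    if (argc < 2) {
        fprintf (stderr, "usage: pregrow <file>\n");
        return 1;
    }

    /* O_TRUNC matches what the shell does for ">" redirection. */
    if ((fd = open (argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0666)) < 0) {
        perror ("open()");
        return 1;
    }

    /* Non-POSIX: ask the filesystem to allocate the blocks now, so the
     * allocation does not happen during the timed/streaming loop. */
    if ((rc = devctl (fd, DCMD_FSYS_PREGROW_FILE, &size, sizeof size, NULL)) != EOK)
        fprintf (stderr, "pregrow devctl: %s\n", strerror (rc));

    for (i = 0; i < NCHUNKS; i++)
        write (fd, buf, CHUNK);

    close (fd);
    return 0;
}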

Is your port of emlog freely available? Looks interesting.

Thanks, Pavol Kycina


On 12 Jun 2002 11:24:06 GMT, John Garvey <jgarvey@qnx.com> wrote:

"Dr. G. Geigemuller" <Gunter.Geigemueller@bitctrl.de> wrote:
Thank you for the hints. We have rewritten our test to get
real results for I/O operations. Please check our conclusions.

These results look better and more consistent :-)

./hello /tmp/dummy # case A (without redirection)

./hello >/tmp/dummy # case B (with redirection)

File redirection should not matter as such, it is just who does the
open(), the shell or the test program (there may be a slight but
insignificant dup() overhead with the former). Where the difference
probably comes in is from O_TRUNC (which the shell will use for “>”
redirection but you didn’t specify in your open()). So, after
running case A you have created a 500 MB file which is first deleted
and then incrementally grown again by case B. This is the difference.

That would imply that the deletion of a 500 (!) MB file takes
10 seconds, because except for the deletion, the situation is similar
to running case A for the first time (the file is created and then
grown incrementally). Or the disk fragmentation between case A and
case B (or caused by the deletion) changed so badly that the new
file, being quite large, is created in a zillion extents (that's another
reason for pregrowing files).

(However, the explanation could work for the Linux case: maybe
it does not truncate stdout until after closing it.)

The doc says that the files are block buffered by default, but
is it the case of stdout (isn’t it line buffered instead)? Also, I’ve
noticed that the write() is slow even if the number of bytes to write
is 0, and that writeblock(10, 1024) is faster than write(10240), as
expected (the values are example ones).

ako


On 13 Jun 2002 22:42:57 GMT, John Garvey <jgarvey@qnx.com> wrote:


and that writeblock(10, 1024) is faster than write(10240), as expected

Sorry, actually I meant that writeblock(10, 1024) is faster than
10*write(1024), which is trivial, but convenient if you can collect
several frames in memory to write out at once.

ako
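
For example, something along these lines would collect several frames and
push them out in one call (a rough, untested sketch using POSIX writev()
rather than writeblock(); the frame size and count are just example values):

/* gatherv.c -- hand several frames to the io-manager in a single call */
#include <stdio.h>
#include <sys/uio.h>
#include <unistd.h>

#define FRAME_SIZE 1024                  /* example frame size   */
#define NFRAMES    10                    /* frames per writev()  */

int main (void)
{
    static char frames[NFRAMES][FRAME_SIZE];
    struct iovec iov[NFRAMES];
    int i;

    /* ... fill frames[] from the capture/streaming source ... */

    /* Describe all the frames in an iovec array ... */
    for (i = 0; i < NFRAMES; i++) {
        iov[i].iov_base = frames[i];
        iov[i].iov_len  = FRAME_SIZE;
    }

    /* ... and write them with one call: one message pass carrying
     * 10240 bytes instead of ten passes of 1024 bytes each. */
    if (writev (STDOUT_FILENO, iov, NFRAMES) < 0)
        perror ("writev()");

    return 0;
}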

Andrzej Kocon <ako@box43.gnet.pl> wrote:

File redirection should not matter as such, it is just who does the
open(), the shell or the test program (there may be a slight but
insignificant dup() overhead with the former). Where the difference
probably comes in is from O_TRUNC (which the shell will use for “>”
redirection but you didn’t specify in your open()). So, after
running case A you have created a 500 MB file which is first deleted
and then incrementally grown again by case B. This is the difference.
That would imply that the deletion of a 500 (!) MB file takes
10 seconds, because except for the deletion, the situation is similar
to running the case A for the first time (the file is created and then
grown incrementally). Or the disk fragmentation between case A and

Deletion, especially of a fragmented file with many extents, could take
some time (each extent is erased from the bitmap). But certainly not 10 secs.
I was also assuming that in the testing/benchmarking phase a file had
been left around from a previous test (and hence didn't need to be
incrementally grown - seeks back to the bitmap - either). This is why I
suggested "rm"-ing the file first and doing two runs of each test, to
attempt to quantify this difference.

The doc says that the files are block buffered by default, but
is it the case of stdout (isn’t it line buffered instead)?

write() doesn't use stdout, it uses STDOUT_FILENO, so it will be unbuffered.

Also, I’ve noticed that the write() is slow even if the number of bytes
to write is 0

A write(0) will still message-pass/context-switch to the IO manager, in
case it wants to do something with that request (the filesystems will
do EBADF verification and some other work with the request; some other
IO manager may interpret this to mean something like a packet boundary);
so yes a write(0) incurs a lot of the overhead of a write(>0).
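
For instance, a quick way to see that per-call overhead (a rough, untested
sketch; run it with output redirected to /dev/null):

/* nullwrite.c -- time zero-byte writes: the message pass itself dominates */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main (void)
{
    int i;
    time_t start = time (NULL);

    /* Each call still does a MsgSend() to the resource manager,
     * even though no data is transferred. */
    for (i = 0; i < 1000000; i++)
        write (STDOUT_FILENO, "", 0);

    fprintf (stderr, "1,000,000 zero-byte writes: %ld sec.\n",
             (long) (time (NULL) - start));
    return 0;
}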

and that writeblock(10, 1024) is faster than write(10240), as expected

Actually this is not what I would expect. writeblock() is a combined seek and
write message and just multiplies the two numbers together to get nbytes
for the write component, so it is actually heavier than a write() (for
sequential/non-random access; it wins for random access by virtue of
combining the seek with the write); when sequential, the filesystem has to
process an unnecessary seek (although the processing time for this is
negligible). In both cases the IO manager will just see an _IO_WRITE of
10240 bytes. Both are a 2-part MsgSend() in the client. So I'd expect
identical performance (because of the near-identical messaging structure)
for sequential writing.

Andrzej Kocon <ako@box43.gnet.pl> wrote:

On 13 Jun 2002 22:42:57 GMT, John Garvey <jgarvey@qnx.com> wrote:



and that writeblock(10, 1024) is faster than write(10240), as expected

Sorry, actually I meant that writeblock(10, 1024) is faster than
10*write(1024), which is trivial, but convenient if you can collect
several frames in memory to write out at once.

It makes sense that writeblock(10, 1024) is faster than 10*write(1024).

You would expect that write(10240) would be faster than 10*write(1024),
and writeblock(10, 1024) is essentially write(10240).

-David

QNX Training Services
http://www.qnx.com/support/training/
Please follow up in this newsgroup if you have further questions.