DOS & NT disk access faster than QNX?!

A customer is claiming the following transfer rates for
QNX/DOS/NT:
QNX 3.3 MB/s
NT 6.2 MB/s
DOS 7.9 MB/s

The program used was as shown below.
I have the following suggestion from QNX support, but I’m interested
in any other inputs to get the near platter speed promised by the
marketers in http://www.qnx.com/products/os/qnxrtos.html

  • use read/write instead of the streaming layer (fwrite…)
  • you can tune Fsys to match the data being read/written (-P) … is data
    being written and then not being read back for a while? then you can
    decrease the LRU cache.
  • increase the cache size as much as possible

Has anyone tried to get really fast disk access?

Any tips, ideas & tricks would be appreciated.

=== DOS Code ========
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* memset */
//#include <dos.h>
#include <sys/time.h>
#include <time.h>       /* clock_gettime */
#include <fcntl.h>
#include <unistd.h>

int main()
{
    FILE *fp;
    int fd;
    char *pBuffer;
    struct timespec time_start, time_end;
    double milliseconds;
    int size = 32 * 1024;           /* bytes per block */
    int blocks = 128 * 1024 / 32;   /* 4096 blocks, 128 MB in total */
    int i;
    int iCount;
    char filename[256];
    char chTemp[1024];

    pBuffer = (char*)malloc(size);
    if (pBuffer == NULL)
        return 3;
    memset(pBuffer, 0, size);       /* avoid writing uninitialized memory */

    for (iCount = 0; iCount < 2; iCount++) {
        sprintf(filename, "test%d.dat", iCount);
        if ( (fp = fopen(filename, "wb")) == NULL)
            return 1;

        //chsize(fd,size*blocks);
        //seek(fd,0);

        /* Time the write of all blocks */
        clock_gettime(CLOCK_REALTIME, &time_start);
        for (i = 0; i < blocks; i++)
        {
            if (1 != fwrite(pBuffer, size, 1, fp))
            {
                return 2;
            }
        }
        clock_gettime(CLOCK_REALTIME, &time_end);

        milliseconds = (time_end.tv_sec - time_start.tv_sec) * 1000;
        milliseconds += (time_end.tv_nsec - time_start.tv_nsec) / 1000000;
        printf("Writing:\n");
        printf("Time: %d:%d\n", (int)(milliseconds/1000.0),
               ((int)milliseconds) % 1000);
        printf("Data Size: %7.3fMB\n", (size*blocks)/1024.0/1024.0);
        printf("Transfer Rate: %7.3f MB/s\n",
               ((size*blocks)/1024.0/1024.0) / (milliseconds/1000.0));
        memset(chTemp, 0, 1024);
        // sprintf(chTemp,"Transfer Rate: %7.3f MB/s\n",
        //         ((size*blocks)/1024.0/1024.0) / (milliseconds /1000.0));

        fclose(fp);
        if ( (fp = fopen(filename, "rb")) == NULL)
            return 1;

        /* Time the read of all blocks */
        clock_gettime(CLOCK_REALTIME, &time_start);
        for (i = 0; i < blocks; i++)
        {
            if (1 != fread(pBuffer, size, 1, fp))
            {
                return 2;
            }
        }
        clock_gettime(CLOCK_REALTIME, &time_end);
        fclose(fp);

        milliseconds = (time_end.tv_sec - time_start.tv_sec) * 1000;
        milliseconds += (time_end.tv_nsec - time_start.tv_nsec) / 1000000;
        printf("Reading:\n");
        printf("Time: %d:%d\n", (int)(milliseconds/1000.0),
               ((int)milliseconds) % 1000);
        printf("Data Size: %7.3fMB\n", (size*blocks)/1024.0/1024.0);
        printf("Transfer Rate: %7.3f MB/s\n",
               ((size*blocks)/1024.0/1024.0) / (milliseconds/1000.0));
    }

    free(pBuffer);
    return 0;
}


And for NT:
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* memset */
//#include <dos.h>
//#include <sys/time.h>
//#include <fcntl.h>
//#include <unistd.h>

int main()
{
    FILE *fp;
    int fd;
    char *pBuffer;
    // struct timespec time_start, time_end;
    double milliseconds;
    int size = 32 * 1024;           /* bytes per block */
    int blocks = 128 * 1024 / 32;   /* 4096 blocks, 128 MB in total */
    int i;
    int iCount;
    char filename[256];
    char chTemp[1024];

    pBuffer = (char*)malloc(size);
    if (pBuffer == NULL)
        return 3;
    memset(pBuffer, 0, size);       /* avoid writing uninitialized memory */

    for (iCount = 0; iCount < 2; iCount++) {
        sprintf(filename, "test%d.dat", iCount);
        if ( (fp = fopen(filename, "wb")) == NULL)
            return 1;

        //chsize(fd,size*blocks);
        //seek(fd,0);

        /* Time the write of all blocks */
        //clock_gettime(CLOCK_REALTIME,&time_start);
        milliseconds = GetTickCount();
        for (i = 0; i < blocks; i++)
        {
            if (1 != fwrite(pBuffer, size, 1, fp))
            {
                return 2;
            }
        }
        //clock_gettime(CLOCK_REALTIME,&time_end);
        milliseconds = GetTickCount() - milliseconds;

        //milliseconds = (time_end.tv_sec - time_start.tv_sec) * 1000;
        //milliseconds += (time_end.tv_nsec - time_start.tv_nsec) / 1000000;
        printf("Writing:\n");
        printf("Time: %d:%d\n", (int)(milliseconds/1000.0),
               ((int)milliseconds) % 1000);
        printf("Data Size: %7.3fMB\n", (size*blocks)/1024.0/1024.0);
        printf("Transfer Rate: %7.3f MB/s\n",
               ((size*blocks)/1024.0/1024.0) / (milliseconds/1000.0));
        memset(chTemp, 0, 1024);
        // sprintf(chTemp,"Transfer Rate: %7.3f MB/s\n",
        //         ((size*blocks)/1024.0/1024.0) / (milliseconds /1000.0));

        fclose(fp);
        if ( (fp = fopen(filename, "rb")) == NULL)
            return 1;

        /* Time the read of all blocks */
        milliseconds = GetTickCount();
        //clock_gettime(CLOCK_REALTIME,&time_start);
        for (i = 0; i < blocks; i++)
        {
            if (1 != fread(pBuffer, size, 1, fp))
            {
                return 2;
            }
        }
        //clock_gettime(CLOCK_REALTIME,&time_end);
        fclose(fp);

        milliseconds = GetTickCount() - milliseconds;

        //milliseconds = (time_end.tv_sec - time_start.tv_sec) * 1000;
        //milliseconds += (time_end.tv_nsec - time_start.tv_nsec) / 1000000;
        printf("Reading:\n");
        printf("Time: %d:%d\n", (int)(milliseconds/1000.0),
               ((int)milliseconds) % 1000);
        printf("Data Size: %7.3fMB\n", (size*blocks)/1024.0/1024.0);
        printf("Transfer Rate: %7.3f MB/s\n",
               ((size*blocks)/1024.0/1024.0) / (milliseconds/1000.0));
    }

    free(pBuffer);
    return 0;
}

“Alex Cellarius” <acellarius@systems104.co.za> wrote in message
news:3A071B63.81C4A5FA@systems104.co.za

A customer is claiming the following transfer rates for
QNX/DOS/NT:
QNX 3.3 MB/s
NT 6.2 MB/s
DOS 7.9 MB/s

That’s with IDE, I suppose. QNX4 doesn’t support DMA on IDE, so
it’s at a major disadvantage. Unless you use SCSI, there is no
way QNX4 can get close to any OS using DMA.

Odd that DOS is faster than NT.


Does RTP support DMA?

Will QNX4 support DMA on EIDE in the future?

And just for fun, does QNX4 aha7scsi support DMA?


Mario Charest <mcharest@zinformatic.com> wrote in message
news:8u7juq$s1s$1@inn.qnx.com

That’s with IDE, I suppose. QNX4 doesn’t support DMA on IDE, so
it’s at a major disadvantage. Unless you use SCSI, there is no
way QNX4 can get close to any OS using DMA.

“Bill at Sierra Design” <BC@SierraDesign.com> wrote in message
news:8u7lc5$t62$1@inn.qnx.com

Does RTP support DMA?

Will QNX4 support DMA on EIDE in the future?

And just for fun, does QNX4 aha7scsi support DMA?


I’m fairly certain that aha7scsi supports only Adaptec PCI cards, all of
which have bus mastering DMA controllers on them. So yes, aha7scsi will
support DMA, of the best kind (none of that wimpy 64k boundary stuff!).

We looked at disk speeds a while back on QNX4, and determined that IDE was
faster than aha7scsi (2940UW card) for small, scattered reads & writes.
aha7scsi however kicked IDE’s butt for large contiguous block transfers, at
only a fraction of the CPU usage. I didn’t realize that QNX4 didn’t use DMA
for IDE, but that would make sense with what we saw. It seems that there’s
a bit more overhead getting the SCSI commands set up, but once you turn the
thing loose it’s pretty fast.

I’ve noticed that heavy disk I/O on RtP really chews up the CPU cycles (on
BOTH CPUs according to the little graph thingies). Is this perhaps because
I don’t have DMA properly enabled on my IDE driver? I suppose I could read
up on it tomorrow at the office (what, me actually study something? Naw…)

-Warren

“Warren Peece” <Warren@nospam.com> wrote in message
news:8u822f$bir$1@inn.qnx.com

I’m fairly certain that aha7scsi supports only Adaptec PCI cards, all of
which have bus mastering DMA controllers on them. So yes, aha7scsi will
support DMA, of the best kind (none of that wimpy 64k boundary stuff!).

We looked at disk speeds a while back on QNX4, and determined that IDE was
faster than aha7scsi (2940UW card) for small, scattered reads & writes.
aha7scsi however kicked IDE’s butt for large contiguous block transfers, at
only a fraction of the CPU usage. I didn’t realize that QNX4 didn’t use DMA
for IDE, but that would make sense with what we saw. It seems that there’s
a bit more overhead getting the SCSI commands set up, but once you turn the
thing loose it’s pretty fast.

Depending on how long ago “a while back” was, IDE interfaces weren’t as fast
as they are today (pre-ATA33 era), so the CPU alone could keep up. With
today’s IDE drives it can’t.

I’ve noticed that heavy disk I/O on RtP really chews up the CPU cycles (on
BOTH CPUs according to the little graph thingies).

The graph is wrong. Have you noticed that both CPUs always show the same usage?


Is this perhaps because I don’t have DMA properly enabled on my IDE driver?

Could be. On the same machine, with QNX4 I get 2 MB/s at close to 100% CPU
usage, and with RTP I get 5 MB/s at 30% CPU usage.
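
To put those two measurements on one scale, a rough sketch of the CPU cost per megabyte moved. The function name cpu_seconds_per_mb is made up for illustration, and it assumes the quoted CPU usage was steady for the whole transfer.

```c
#include <assert.h>

/*
 * CPU-seconds spent per megabyte transferred:
 * (fraction of the CPU that was busy) / (megabytes per second).
 * Hypothetical helper, assuming a steady CPU load during the transfer.
 */
static double cpu_seconds_per_mb(double cpu_fraction, double mb_per_sec)
{
    assert(cpu_fraction >= 0.0 && mb_per_sec > 0.0);
    return cpu_fraction / mb_per_sec;
}
```

With the figures above, cpu_seconds_per_mb(1.0, 2.0) is 0.5 for QNX4 and cpu_seconds_per_mb(0.30, 5.0) is 0.06 for RTP, so each megabyte costs roughly eight times more CPU time on QNX4.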

I suppose I could read up on it tomorrow at the office (what, me actually
study something? Naw…)

-Warren

Previously, Mario Charest wrote in qdn.public.qnx4:

Odd that DOS is faster than NT.

Not really. DOS has usually scored fastest when compared
with other OS’s. The reason why is simple. There’s no
possibility of contention, eliminating a lot of overhead.
When you ask DOS to read a bunch of sectors, it sets up
the hardware to copy them directly into your buffer.

I don’t know about NT, but I suspect that there is at least
one additional copy, like QNX.

Mitchell Schoenbrun --------- maschoen@pobox.com

Mitchell Schoenbrun wrote:

Previously, Mario Charest wrote in qdn.public.qnx4:

Odd that DOS is faster than NT.

Not really. DOS has usually scored fastest when compared
with other OS’s. The reason why is simple. There’s no
possibility of contention, eliminating a lot of overhead.
When you ask DOS to read a bunch of sectors, it sets up
the hardware to copy them directly into your buffer.

I don’t know about NT, but I suspect that there is at least
one additional copy, like QNX.

This is what the customer found:
“Considering we got 6.2 MB/s under NT (writing to FAT16)
and 7+ MB/s under dos, I find it hard to believe that
we can only achieve 4MB/s under QNX (only 3.1 MB/s if
we write to FAT16 instead of the QNX file system).”

The crucial difference is DMA: the Fsys drivers do not use DMA.
For example, the QNX transfer speed doubled to about 8 MB/s with a 400 MHz
processor (a 233 MHz processor was used for the results quoted above),
whereas the other OSs showed only about a 10% increase
with the faster CPU.
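
As a back-of-the-envelope check on that reasoning, here is an Amdahl-style estimate of how much of the transfer time scales with the CPU clock. This is a simplification of my own, not a measurement: it assumes the run time splits cleanly into a part that scales with clock speed and a part that does not, and the function name is made up.

```c
#include <assert.h>

/*
 * If a fraction f of the run time scales with the CPU clock and the
 * rest does not, then
 *     1 / speedup = (1 - f) + f / clock_ratio
 * Solving for f gives the expression returned here.
 * Hypothetical helper; inputs are ratios (after / before).
 */
static double cpu_bound_fraction(double speedup, double clock_ratio)
{
    assert(speedup > 0.0 && clock_ratio > 1.0);
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / clock_ratio);
}
```

With a clock ratio of 400.0 / 233.0, a speedup of 1.10 (the other OSs) gives a fraction of roughly 0.2, while a speedup of 2.0 (QNX) gives a value above 1. A result above 1 cannot be a true fraction, so the figures are clearly approximate, but it is consistent with the QNX transfer being almost entirely CPU-bound.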