Performance drop of processes launched over Qnet

The following code (test.c) can be used to reproduce a strange behavior:
a process that accesses large data tables runs much more slowly on nodes
that don't have a local file system and use a remote one through Qnet.

We use Qnet between two nodes:

  • the first one (mpu) has a file system
  • the second one (ppu1) does not

When I run this program on ppu1, I get these results:

/home/user>time on -f ppu1 ./test 1000
dt=330569543 for n=1000 cycles (s=0)
dt=116352 for n=1000 cycles (s=0)
0.24s real 0.00s user 0.00s system

/home/user>time on -f ppu1 ./test 1000
dt=13773725819 for n=1000 cycles (s=0)
dt=107064 for n=1000 cycles (s=0)
8.66s real 0.01s user 0.00s system

/home/user>time on -f ppu1 ./test 1000
dt=13792264034 for n=1000 cycles (s=0)
dt=109726 for n=1000 cycles (s=0)
8.68s real 0.00s user 0.00s system

/home/user>pidin -n ppu1 info
CPU:X86 Release:6.3.2 FreeMem:244Mb/511Mb BootTime:Jun 16 13:09:47
Processor1: 686 Intel 686 F6M9S5 1603MHz FPU
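
The dt values above are ClockCycles() deltas. A minimal sketch for turning them into time, assuming the cycle counter ticks at the CPU clock that pidin info reports (1603 MHz on ppu1); the helpers cycles_to_seconds and ns_per_lookup are hypothetical, not part of test.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a ClockCycles() delta to seconds.
 * Assumes the counter runs at the CPU clock rate reported by pidin. */
static double cycles_to_seconds(uint64_t cycles, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return (double)cycles / (cpu_mhz * 1e6);
}

/* Hypothetical helper: average cost of one table lookup, in nanoseconds,
 * for a loop of n lookups that took `cycles` cycles. */
static double ns_per_lookup(uint64_t cycles, double cpu_mhz, unsigned n)
{
    assert(n > 0);
    return cycles_to_seconds(cycles, cpu_mhz) * 1e9 / n;
}
```

Applied to the second ppu1 run, 13773725819 cycles at 1603 MHz is about 8.59 s for 1000 lookups (roughly 8.6 ms per lookup), close to the 8.66 s reported by time, while the second loop's 107064 cycles is about 67 µs (roughly 67 ns per lookup).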


On a similar CPU but without Qnet (file system with the executables on a
local IDE hard disk), I get:

/home/user>time ./test 1000
dt=34749236 for n=1000 cycles (s=0)
dt=159102 for n=1000 cycles (s=0)
0.02s real 0.00s user 0.00s system

/home/user>time ./test 1000
dt=340074 for n=1000 cycles (s=0)
dt=15753 for n=1000 cycles (s=0)
0.00s real 0.00s user 0.00s system

/home/user>time ./test 1000
dt=257764 for n=1000 cycles (s=0)
dt=15087 for n=1000 cycles (s=0)
0.00s real 0.01s user 0.00s system

/home/user>pidin info
CPU:X86 Release:6.3.2 FreeMem:1469Mb/2047Mb BootTime:Jul 17 13:11:06
Processor1: 686 Intel 686 F6M13S8 1992MHz FPU


We have noticed these differences only between nodes with a local file
system and nodes that use a remote file system through Qnet!

Could you explain what is happening?

Thanks!
Davide


-- test.c ------------------------------------------------------

#include <stdio.h>
#include <stdlib.h>        /* atoi() */
#include <inttypes.h>      /* uint64_t, PRIu64 */
#include <sys/neutrino.h>  /* ClockCycles() */

static int Tab[1<<20] = { 10,11,12,3 };

int main(int argc, char** argv)
{
    uint64_t t1,t2;
    unsigned int s,n,i,j;

    if (argc!=2) {
        printf("missing number of cycles parameter\n");
        return 1;
    }

    /* number of lookups in Tab */
    n=atoi(argv[1]);

    /* First loop */
    /* pseudo random search in Tab - 1437041 is a prime number */
    s=j=0;
    t1 = ClockCycles();
    for (i=0;i<n;i++)
        s+=Tab[(j+=1437041)&((1<<20)-1)];
    t2 = ClockCycles();

    printf("dt=%" PRIu64 " for n=%u cycles (s=%x)\n",t2-t1,n,s);

    /* Second loop */
    /* pseudo random search in Tab - 1437041 is a prime number */
    s=j=0;
    t1 = ClockCycles();
    for (i=0;i<n;i++)
        s+=Tab[(j+=1437041)&((1<<20)-1)];
    t2 = ClockCycles();

    printf("dt=%" PRIu64 " for n=%u cycles (s=%x)\n",t2-t1,n,s);

    return 0;
}


/* Ancri Davide - */

I suspect it has to do with paging Tab in from disk. Since it is an initialized array, it will be in
the data segment.
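
To gauge how much paging the lookup loop can trigger, here is a minimal sketch that replays the index sequence from test.c and counts how many distinct pages of Tab the first n lookups land on. It assumes 4 KB pages and a 4-byte int (typical on x86); count_pages_touched is a hypothetical helper, not part of the original test.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TAB_ENTRIES      (1u << 20)                 /* size of Tab in test.c */
#define PAGE_BYTES       4096u                      /* assumed x86 page size */
#define ENTRIES_PER_PAGE (PAGE_BYTES / sizeof(int))
#define TAB_PAGES        (TAB_ENTRIES / ENTRIES_PER_PAGE)

/* Hypothetical helper: replay the index sequence used by test.c and
 * count how many distinct pages of Tab the first n lookups touch. */
static unsigned count_pages_touched(unsigned n)
{
    static unsigned char seen[TAB_PAGES];
    unsigned j = 0, pages = 0;

    memset(seen, 0, sizeof seen);
    for (unsigned i = 0; i < n; i++) {
        /* same index expression as the loops in test.c */
        unsigned idx = (j += 1437041u) & (TAB_ENTRIES - 1);
        unsigned page = idx / ENTRIES_PER_PAGE;
        assert(page < TAB_PAGES);
        if (!seen[page]) {
            seen[page] = 1;
            pages++;
        }
    }
    return pages;
}
```

With an odd stride modulo 2^20, consecutive lookups land far apart (the first three fall on pages 379, 758 and 114), so 1000 lookups can touch up to 1000 of the table's 1024 pages. If each first touch has to fetch the page from the executable over Qnet, that would be consistent with a slow first loop and a fast second loop.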

Try moving it to bss (declare it without an initializer and initialize it at run time), then repost the results…
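
A minimal sketch of that change, assuming the slowdown comes from the initialized data segment being paged in from the executable image: Tab is declared without an initializer, so it lands in bss, and it is filled at run time with the same contents the original initializer gave it. init_tab is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define TAB_ENTRIES (1u << 20)

/* No initializer: Tab is placed in .bss instead of the data segment,
 * so its contents are not stored in (or read from) the executable file. */
static int Tab[TAB_ENTRIES];

/* Hypothetical helper: reproduce the original initializer
 * { 10, 11, 12, 3 } at run time. Writing every element (including the
 * zeros) also touches each page of Tab before any timing starts. */
static void init_tab(void)
{
    static const int head[] = { 10, 11, 12, 3 };
    size_t nhead = sizeof head / sizeof head[0];

    assert(nhead <= TAB_ENTRIES);
    for (size_t i = 0; i < TAB_ENTRIES; i++)
        Tab[i] = (i < nhead) ? head[i] : 0;
}
```

In main(), call init_tab() before the first t1 = ClockCycles(); the two timed loops stay unchanged.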

Of course, there are other things going on as well, though…

cburgess@qnx.com