Neutrino TCP/IP Performance

I’ve been trying to evaluate the performance of RTP and the results I’m
seeing have me confused. One area I’m particularly interested in is its
networking performance. A simple test I’m using is ttcp to the local
loopback interface. I ran the identical test on the same physical hardware
with three OSs:

ttcp -r -s &
ttcp -t -s -n100000 localhost

NetBSD/i386 1.4.2
→ 60158 KB/sec

QNX 4.25
→ 12953 KB/sec

Neutrino RTP (with the 4.4 stack, not ttcpip)
→ 8004 KB/sec

These were not run on a particularly fast machine (450 MHz), so the relative
numbers are more interesting than their absolute magnitude.

On the same hardware, NetBSD ran this test ~7.5 times faster than Neutrino.

I’m looking for some input to sanity check my expectations:

  • Is it reasonable for RTP to be this much slower at TCP/IP networking than a
    BSD?
  • Is this an artifact of the beta state of this code?
  • Why is 4.25 faster? (point above perhaps?)
  • Is there a known issue that I’m not privy to? (QSSL?)
  • Is my test brain-dead?

An even simpler test:

int i, s;

for (i = 0; i < 100000; i++) {
    s = socket(AF_INET, SOCK_STREAM, 0);
    close(s);
}

This ran MUCH faster on NetBSD (I don’t want to quote numbers because these
were not run on the exact same machine).

My goal here is to understand these issues, not to cut up Neutrino. Neutrino
looks great as an RTOS (I’ve worked with many) on the feature front. However,
for me performance is the bottom line.

  • bill

Bill Roberson <bill.roberson@mindspring.com> wrote:
: I’ve been trying to evaluate the performance of RTP and the results I’m
: seeing have me confused. One area I’m particularly interested in is its
: networking performance. A simple test I’m using is ttcp to the local
: loopback interface. I ran the identical test on the same physical hardware
: with three OSs:

: ttcp -r -s &
: ttcp -t -s -n100000 localhost

: NetBSD/i386 1.4.2
: → 60158 KB/sec

: QNX 4.25
: → 12953 KB/sec

: Neutrino RTP (with the 4.4 stack, not ttcpip)
: → 8004 KB/sec

: These were not run on a particularly fast machine (450 MHz), so the relative
: numbers are more interesting than magnitude.

: On the same hardware, NetBSD ran this test ~7.5 times faster than Neutrino.

: I’m looking for some input to sanity check my expectations:
: - Is it reasonable for RTP to be this much slower at TCP/IP networking than a
: BSD?

To be accurate here, you’re testing loopback, which isn’t, strictly speaking,
networking. There are a lot of shortcuts which can be made over loopback.
A common one is to convert an AF_INET socket to AF_UNIX (AF_LOCAL), or some
hybrid thereof. This can mean a shortcut path through the stack and
no cksum. We don’t currently do this. I’m not saying definitively that
NetBSD is doing this, but at 7.5 times, I’d be surprised if you’re comparing
apples to apples.

Maybe try an AF_UNIX socket outright on NetBSD and see if there’s any
improvement over an AF_INET socket on that platform.

: - Is this an artifact of the beta state of this code?
: - Why is 4.25 faster? (point above perhaps?)

The stack and io-net are getting faster. I think we should
eventually be able to come close to, or surpass, QNX 4.

: - Is there a known issue that I’m not privy to? (QSSL?)
: - Is my test brain-dead?

: An even simpler test:

: for (i=0; i < 100000; i++) {
: s = socket(AF_INET, SOCK_STREAM, 0);
: close(s);
: }

I’ve seen this test recently :-). Here’s basically what I said originally:

socket(AF_INET, …) is basically open("/dev/socket/2", …) (AF_INET == 2).

I can speed this test case up by close to a factor of 2 without a new stack.
The latest stack will help as well but the big gain is openfd() which bypasses
all the pathname management stuff in open(). Try something like the following
on pathname entries managed by different managers to get a feel…

#include <sys/socket.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int
main(int argc, char **argv)
{
    int i, fd_seed, fd_tmp;
    int opt;

    if (argc == 1) {
        fprintf(stderr, "need an int\n");
        return 1;
    }

    opt = strtol(argv[1], NULL, 10);

    switch (opt) {
    case 0:
        printf("norm socket\n");
        for (i = 0; i < 1000000; ++i) {
            fd_tmp = socket(AF_INET, SOCK_DGRAM, 0);
            close(fd_tmp);
        }
        break;
    case 1:
        printf("fast socket\n");
        fd_seed = socket(AF_INET, SOCK_DGRAM, 0);
        for (i = 0; i < 1000000; ++i) {
            fd_tmp = openfd(fd_seed, O_CLOEXEC); /* O_CLOEXEC saves a kernel call */
            close(fd_tmp);
        }
        break;
    case 2:
        printf("/dev/null\n"); /* probably the best case for a normal open() */
        for (i = 0; i < 1000000; ++i) {
            fd_tmp = open("/dev/null", O_RDWR);
            close(fd_tmp);
        }
        break;
    default:
        fprintf(stderr, "Huh?\n");
        return 1;
    }

    return 0;
}

-seanb


: This ran MUCH faster on NetBSD (I don’t want to quote numbers because these
: were not run on the exact same machine).

: My goal here is to understand these issues, not to cut up Neutrino. Neutrino looks
: great as an RTOS (I’ve worked with many) on the feature front. However, for
: me performance is the bottom line.

: - bill