Question on porting LINUX apps to QNX 6.1

edrishekl · June 20, 2004, 3:39pm

Until very recently my employer has been focused on GENTOO LINUX because of some interesting 2.4.x kernel patches that have gotten us some important sleep-to-wake response times in calls to select().
In particular, the Ingo patch allows one to change the kernel timer frequency which is yielding 700 microsecond sleep-to-wake latencies. The stock REDHAT 7.2 kernel, and the stock SOLARIS kernel, both take 10 milliseconds to wake up, so the Ingo patch in the LINUX 2.4.x kernel is the best performer I’ve found so far.

Enter QNX. I heard that QNX can yield even smaller latencies for the select() call, but by default I’m seeing roughly 2 millisecond latencies. Here’s the program I used to test QNX, LINUX, SOLARIS, SUNOS, IRIX and AIX for the same sleep-to-wake issue:

////////////////////////////////////////////////////////////////////////////////////////////////
#include <sys/time.h>
#include <sys/timeb.h>
#include <unistd.h>
#include <sys/types.h>
#include <stdio.h>
#include <sys/times.h>
#include <signal.h>
#include <string.h>
unsigned long microseconds(void)
{
struct timeval tv;
struct timezone tz;
if( gettimeofday( &tv,&tz ) == -1 ) return( 0 );
return( (tv.tv_sec*1000*1000) + tv.tv_usec );
}
int main(int argc,char *argv[])
{
unsigned long Start,Stop;
int count,max=100;
const struct timeval orig={0,1};
struct timeval tv;
Start = microseconds();
for(count=0; count<max; ++count)
{
memcpy(&tv,&orig,sizeof(tv));
select(0,NULL,NULL,NULL,&tv);
}
Stop = microseconds();
printf("[%d] iterations have runtime of:\n",max);
printf(" microseconds : %lu\n",Stop-Start);
printf(" milliseconds : %lu\n",(Stop-Start)/1000);
}
////////////////////////////////////////////////////////////////////////////////////////////////

SOLARIS and LINUX both have settings to change the kernel timer frequency so that I can improve the sleep-to-wake latency, but with SOLARIS the best I can get is 2 milliseconds, and as I mentioned above, 700 microseconds is the best I can get from LINUX.

My question is this: Can QNX be tuned so that it wakes up faster from a blocking select() call?

Many thanks!

-e

rick · June 20, 2004, 4:23pm

Obviously it depends on the speed of your machine. I tried the code posted here on a 1.6 GHz P4 and got lousy times of about 200 ms. I then added a call to ClockPeriod() at the start and changed the realtime clock down to 10us (the default clock is 1ms). This improved my time to 1.8 ms.

So your problem appears to be a timing resolution problem. My code is as follows:

#include <sys/time.h>
#include <sys/timeb.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <stdio.h>
#include <sys/times.h>
#include <signal.h>
#include <sys/neutrino.h>
#include <string.h>

unsigned long microseconds(void)
{
  struct timeval tv;
  struct timezone tz;

  if( gettimeofday( &tv,&tz ) == -1 ) return( 0 );
  return( (tv.tv_sec*1000*1000) + tv.tv_usec );
}
int main(int argc,char *argv[])
{
  unsigned long Start,Stop;
  int count,max=100;
  const struct timeval orig={0,1};
  struct timeval tv;
  struct _clockperiod cp;

  cp.nsec = 10000; cp.fract = 0;
  if( ClockPeriod(CLOCK_REALTIME, &cp, NULL, 0) == -1)
    {
       printf("ClockPeriod error: %s\n", strerror(errno));
    }
  Start = microseconds();
  for(count=0; count<max; ++count)
    {
      memcpy(&tv,&orig,sizeof(tv));
      select(0,NULL,NULL,NULL,&tv);
    }
  Stop = microseconds();

  printf("[%d] iterations have runtime of:\n",max);
  printf(" microseconds : %lu\n",Stop-Start);
  printf(" milliseconds : %lu\n",(Stop-Start)/1000);
}

Hope this helps,
Rick…

cdm · June 21, 2004, 3:22pm

Unlike Solaris and Linux, select() is not a system call on QNX. Try calling SchedYield() if you want to see yield times to the kernel on QNX. This makes the comparison closer to apples-to-apples.

edrishekl · June 21, 2004, 5:03pm

I tried the changes using the ClockPeriod() call, but on the same hardware I found that LINUX was significantly beating QNX. Big surprise.

I saw cdm’s comment about SchedYield(). The problem is this: I DO NOT WANT to rewrite this system, and right now each software component depends on a POSIX select() to get into the finite-state-engine that makes up the component architecture. I see lots of interesting QNX function calls, but I simply will not be allowed to completely rewrite the system just to find out if it will perform better than it does on LINUX.

Right now, even though ClockPeriod() will let me set the minimum to 500 microseconds, I never get better than a 996 microsecond response when I run the test on a dual ATHLON MP 2800 box. Ditto results on a 3 GHz P4. I can get 660 microseconds in GENTOO LINUX with the Ingo kernel patch.

I’ve been running these tests on QNX 6.1. I’m currently downloading 6.3 to see if there is any improvement, but I suspect there won’t be.

-e

mario · June 21, 2004, 6:30pm

What you are measuring is not the latency of select(), but rather the precision and latency of the timeout value plus the timer resolution. To really compare latency, use a server/client pair and check how long it takes for an event to get from one to the other, which is what I think you really want to benchmark; this is going to be faster than 700us !!! Also use ClockCycles() to measure timing. It will give you a value independent of the ticksize.

To comply with POSIX you will always get at least 2 timer cycles. You are asking for 1us but the default tick is 1ms, hence you got a 2 ms delay. That’s how QNX is designed. If you lower the ticksize like rick suggested you should be able to go lower than 700us. However you will pay a price for that (true under QNX and Linux): the higher the precision, the more CPU cycles are required to deal with the timer code.

I think cdm assumed you were trying to measure kernel latency, which is why he gave you the tip on SchedYield().
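The arithmetic above can be written down as a small model. This is only a sketch: modeled_delay_ns() is a hypothetical helper, not a QNX API, and it assumes the request is rounded up to whole ticks and one extra tick is added so the timeout can never expire early.

```c
#include <stdint.h>

/* Hypothetical model of the delay described above (not a QNX call).
 * Assumption: round the request up to whole ticks, then add one tick
 * so that the timeout never fires before the requested time. */
uint64_t modeled_delay_ns(uint64_t request_ns, uint64_t tick_ns)
{
    uint64_t ticks = (request_ns + tick_ns - 1) / tick_ns;  /* round up */
    return (ticks + 1) * tick_ns;                           /* extra tick */
}
```

Under that assumption a 1us request gives 2 ms with a 1ms tick, 1000us with a 500us tick, and 20us with a 10us tick, which is in line with the roughly 2 ms, 996 microsecond and 20 microsecond figures reported elsewhere in this thread.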

rick · June 21, 2004, 7:50pm

Actually select() and the underlying code was completely rewritten for 6.3, so you should see something different.

As cdm and mario suggested, QNX implements system calls quite differently from the monolithic-kernel Unixes. What are you actually trying to test? Context switch times? Kernel call times? Perhaps we can suggest a better test.

Rick…

edrishekl · June 21, 2004, 9:51pm

Here’s what I am doing:

I have a system (in LINUX) that uses sockets to connect to the stock market, perform analysis, and execute trades. The first version of the system was multithreaded, each socket getting its own thread, using the Command pattern for message passing between threads and mutexes/semaphores to control concurrency issues. It worked fine, except that we had a worst-case delay in multiples of 10 milliseconds as one thread passed information on to another - up to a cumulative total of 50 msec, worst case. Then I learned about the Ingo kernel scheduler patch for the 2.4.x kernel in LINUX, applied it under the GENTOO distribution, and our worst-case time for thread wakeups was down to about 800 usec. That was great. But in the worst-case situation, where multiple threads were going from sleep-to-wake, we could still tally up to 5 msec.

So: I rewrote the system, was able to compress what used to be about 7 separate, multithreaded components down into 3 separate programs, each running in a single thread modeled around the select() function for handling the many sockets. That got our worst-case times down to 800 usec per component (3 programs = 3 pid’s = 2400 usec, worst case). BIG improvement.

The existing FSE-based system is good at what it does; certainly it’s the fastest version we’ve had. But right now we need to shave our worst-case time from 2400 usec down below 800 usec because of the new challenges and bigger demands the system is facing. My job is to find a way to make it happen before the end of June, or else me and everyone else can pretty much hit the road to find another job, or work for free, since we haven’t had any revenue since April and the company is broke.

The last rewrite took almost an entire year, so you can imagine that rewriting the system is not an option.

I was hoping to find an OS that supported select() in a much faster way. select() uses multiple file descriptors and when one of them has data, the select wakes up and processing can begin. If there was a rough equivalent to this in QNX - something that I could stuff multiple file descriptors into and select() which among them had newly arrived socket data, I’d be in luck.
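For reference, the pattern described above looks roughly like this in portable POSIX C. Nothing here is QNX-specific, and wait_for_ready() is a hypothetical helper name used only for illustration.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: block until one of the descriptors is readable.
 * Returns the first readable descriptor, or -1 on timeout or error. */
int wait_for_ready(const int *fds, size_t count, long timeout_us)
{
    fd_set rfds;
    struct timeval tv;
    int maxfd = -1;
    size_t i;

    /* select() modifies the set, so it is rebuilt on every call. */
    FD_ZERO(&rfds);
    for (i = 0; i < count; ++i) {
        FD_SET(fds[i], &rfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    tv.tv_sec = timeout_us / 1000000;
    tv.tv_usec = timeout_us % 1000000;

    if (select(maxfd + 1, &rfds, NULL, NULL, &tv) <= 0)
        return -1;

    for (i = 0; i < count; ++i)
        if (FD_ISSET(fds[i], &rfds))
            return fds[i];
    return -1;
}
```

A component would call this in its main loop and dispatch on the returned descriptor. Rebuilding the whole set on every call is the re-arming cost that a more specialized notification mechanism could avoid.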

That’s my problem, in a nutshell.

-e

mezek · June 22, 2004, 8:58am

I tried the code rick posted on a P3 850MHz w/ QNX 6.2.1 and a 10000ns ClockPeriod. I get a 20 microsecond reaction on select(), or as you call it, “sleep-to-wake”.

cdm · June 22, 2004, 3:49pm

edrishekl - select() on 6.3 is built on poll() and has been improved speed-wise vs pre-6.3 (not that you are actually testing that in your test case). Now, all of these things are built on ionotify(). You can actually build a custom version of select() using ionotify() such that you don’t have to re-arm any FDs besides the one you have just processed, which will be much faster. You can do this on pre-6.3 as well if you don’t want to download it.

edrishekl · June 25, 2004, 2:39am

I got 6.3 installed and ran the test. I was delighted to see 18 usec. I’m working on porting the entire system now. Having a devil of a time finding details on how mutexes are implemented in QNX so I will probably go with the POSIX interface.

cdm · June 25, 2004, 6:27pm

We use pthread mutexes, so you should just use the pthread_mutex*() API.
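A minimal sketch of that API, assuming nothing beyond standard POSIX threads. The names counter_lock, worker() and run_workers() are made up for illustration.

```c
#include <pthread.h>

#define WORKERS 4
#define INCREMENTS 1000

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

/* Each worker bumps the shared counter under the mutex. */
static void *worker(void *arg)
{
    int i;
    (void)arg;
    for (i = 0; i < INCREMENTS; ++i) {
        pthread_mutex_lock(&counter_lock);
        ++counter;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final count,
 * or -1 if a thread could not be created. */
long run_workers(void)
{
    pthread_t tid[WORKERS];
    int started = 0;
    int i;

    counter = 0;
    for (; started < WORKERS; ++started)
        if (pthread_create(&tid[started], NULL, worker, NULL) != 0)
            break;
    for (i = 0; i < started; ++i)
        pthread_join(tid[i], NULL);
    return started == WORKERS ? counter : -1;
}
```

Code written this way on LINUX should need no source changes for the mutex parts of a port, since only standard calls are involved.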

And I suspect that you were running an old version of QNX that didn’t allow the clock to be lowered.