QNX RTOS 6.1 process freeze

Hi,

I’m having trouble porting our database engine to QNX.
It’s a threaded app (POSIX threads) and runs fine on
Solaris 8.

However, on QNX I need to use libhoard, as the default
malloc segfaults terribly. But even with libhoard
there is a bigger problem: the app freezes after
about 10 minutes of execution. After this happens, every
process that tries to access the process table (e.g.
ps, slay or shutdown) freezes in the same fashion.

Does anybody have an idea what could make QNX behave
this way? Am I leaking some system resource that is
needed to operate the process list?

If I’ve missed something I’ll RTFM, just point me to it ;)

thnx,

Peter Zijlstra

“Peter Zijlstra” <peter@xlnt.-software.net> wrote in message
news:3BF26289.3070503@xlnt.-software.net…

> However, on QNX I need to use libhoard, as the default
> malloc segfaults terribly.

It’s a blind and long shot, but I ran into a similar problem porting Postgres.
It was assuming that the alignment of memory returned by malloc() matches the
alignment requirement of the largest standard C datatype (long long, which is
8). Unfortunately, QNX malloc() aligns to sizeof(int). Of course some
addresses also happened to be aligned to 8, so it worked partially,
but failed in wickedly weird ways at random places.
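Here is a minimal C sketch of one possible workaround. It assumes, as described above, that the system malloc() only guarantees sizeof(int) alignment while the code stores long long values in the returned memory. The names db_malloc_aligned(), db_free_aligned() and DB_ALIGN are hypothetical. The wrapper over-allocates, rounds the address up to an 8-byte boundary, and keeps the original malloc() pointer just in front of the returned block so it can be freed later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: the alignment the database code assumes for any object. */
#define DB_ALIGN 8u

/* Return at least `size` bytes starting at a multiple of DB_ALIGN, even if
 * malloc() itself only guarantees sizeof(int) alignment. The pointer that
 * malloc() returned is stored in the slot just before the aligned block. */
void *db_malloc_aligned(size_t size)
{
    /* DB_ALIGN must be a power of two and large enough to hold a pointer. */
    assert((DB_ALIGN & (DB_ALIGN - 1)) == 0 && DB_ALIGN >= sizeof(void *));

    /* Room for the stashed pointer plus worst-case padding. */
    unsigned char *raw = malloc(size + DB_ALIGN - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* First address after the stash slot, rounded up to DB_ALIGN. */
    uintptr_t start = (uintptr_t)(raw + sizeof(void *));
    uintptr_t aligned = (start + DB_ALIGN - 1) & ~(uintptr_t)(DB_ALIGN - 1);

    ((void **)aligned)[-1] = raw;   /* remember what to free() later */
    return (void *)aligned;
}

/* Release memory obtained from db_malloc_aligned(). */
void db_free_aligned(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

Memory from db_malloc_aligned() has to be released with db_free_aligned(), never with plain free(). Where posix_memalign() is available, it is the simpler choice. The wrapper above relies only on plain malloc().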

> However, even with libhoard
> there is a bigger problem: the app freezes after
> about 10 minutes of execution. After this happens, every
> process that tries to access the process table (e.g.
> ps, slay or shutdown) freezes in the same fashion.
>
> Does anybody have an idea what could make QNX behave
> this way? Am I leaking some system resource that is
> needed to operate the process list?

I doubt that a leak can cause such effects. But I’ve seen weirder things,
like certain system calls failing in unexplainable ways after being
invoked a certain number of times (on both systems). Unless you tell us more
precisely what you mean by ‘freezes’ (pidin output would help), it is hard
to tell anything.

In general, however, you should keep some things in mind when porting between
Solaris and QNX. Default behavior on Solaris is not POSIX, so unless you’ve
supplied certain -D flags to the compiler on Solaris, code is not going to work the
same way. For example, fork() on Solaris will try to duplicate all threads,
but in POSIX it will only duplicate the calling thread. Signals on Solaris
will be delivered to the threads which installed timers; in POSIX they will be
delivered to the process. The behavior of a number of related functions is different.
Also, some Solaris library calls taking addresses of structures as
parameters will not copy the content but rather just record the reference. That
means the content can be modified afterwards, but not on QNX (since it is a
message-passing OS, it can’t just pass an address to another process). That
applies to the sigevent structure, for example.

- igor

Igor Kovalenko <kovalenko@home.com> wrote:

> It’s a blind and long shot, but I ran into a similar problem porting Postgres.
> It was assuming that the alignment of memory returned by malloc() matches the
> alignment requirement of the largest standard C datatype (long long, which is
> 8). Unfortunately, QNX malloc() aligns to sizeof(int). Of course some
> addresses also happened to be aligned to 8, so it worked partially,
> but failed in wickedly weird ways at random places.

Coincidentally I made a PR about this just the other day!

cburgess@qnx.com

Colin Burgess wrote:

> Coincidentally I made a PR about this just the other day!

LOL, coincidentally I already tried to get thomasf to fix this. With the
magic powers of you two, I hope it will be fixed for good now. It can
probably lead to SIGBUS errors on MIPS-64, where long long is native.

- igor

Igor Kovalenko wrote:

>> I’m having trouble porting our database engine to QNX.
>> It’s a threaded app (POSIX threads) and runs fine on
>> Solaris 8.

And FreeBSD, although that isn’t all too POSIX either ;)

Linux doesn’t work: very strange crashes. But then again, LinuxThreads
are a bit weird. And somehow my gdb won’t do threading. Still gotta kick
that thing into working.

> It’s a blind and long shot, but I ran into a similar problem porting Postgres.
> It was assuming that the alignment of memory returned by malloc() matches the
> alignment requirement of the largest standard C datatype (long long, which is
> 8). Unfortunately, QNX malloc() aligns to sizeof(int). Of course some
> addresses also happened to be aligned to 8, so it worked partially,
> but failed in wickedly weird ways at random places.

Hmm, I’ll have a look into this; you never know.

> I doubt that a leak can cause such effects. But I’ve seen weirder things,
> like certain system calls failing in unexplainable ways after being
> invoked a certain number of times (on both systems). Unless you tell us more
> precisely what you mean by ‘freezes’ (pidin output would help), it is hard
> to tell anything.

Even pidin (didn’t know it existed) does the same. As soon as it
tries to access the process table entry for our process, it stops
responding to everything, even SIGABRT.

I tried several small programs to see what the effect of leaking
resources is. However, none could reproduce this effect.


    while (1) {
        pthread_mutex_t ms[1024];
        int i;

        for (i = 0; i < 1024; i++)
            pthread_mutex_init(&ms[i], NULL);
    }

This doesn’t seem to do anything, even though the documentation says that
QNX allocates system resources for its mutexes.

    while (1) {
        pthread_t tid;
        pthread_create(&tid, NULL, my_thread, NULL);
    }

This one, where my_thread is void *my_thread(void *arg) { return NULL; },
leaks unrecoverable system memory.
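For comparison, here is a sketch of the same two experiments with every resource paired with its release, assuming standard POSIX threads semantics. A thread that is never joined or detached keeps its resources after it returns, much like a zombie process. That is consistent with the unrecoverable memory seen in the second loop. The function names spawn_and_reclaim() and churn_mutexes() are hypothetical.

```c
#include <pthread.h>

/* Same trivial thread body as in the test above. */
static void *my_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/* Start `count` threads and reclaim each one, either by joining it or by
 * detaching it so its resources are freed automatically when it returns. */
int spawn_and_reclaim(int count, int detach)
{
    for (int i = 0; i < count; i++) {
        pthread_t tid;
        int rc = pthread_create(&tid, NULL, my_thread, NULL);
        if (rc != 0)
            return rc;
        rc = detach ? pthread_detach(tid) : pthread_join(tid, NULL);
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Pair every pthread_mutex_init() with pthread_mutex_destroy(), so any
 * resource the OS attached to the mutex can be released. */
int churn_mutexes(int count)
{
    pthread_mutex_t m;

    for (int i = 0; i < count; i++) {
        int rc = pthread_mutex_init(&m, NULL);
        if (rc != 0)
            return rc;
        rc = pthread_mutex_destroy(&m);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```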


> In general, however, you should keep some things in mind when porting between
> Solaris and QNX. Default behavior on Solaris is not POSIX, so unless you’ve
> supplied certain -D flags to the compiler on Solaris, code is not going to work the
> same way. For example, fork() on Solaris will try to duplicate all threads,
> but in POSIX it will only duplicate the calling thread. Signals on Solaris
> will be delivered to the threads which installed timers; in POSIX they will be
> delivered to the process. The behavior of a number of related functions is different.
> Also, some Solaris library calls taking addresses of structures as
> parameters will not copy the content but rather just record the reference. That
> means the content can be modified afterwards, but not on QNX (since it is a
> message-passing OS, it can’t just pass an address to another process). That
> applies to the sigevent structure, for example.

On Solaris we build with -D_POSIX_C_SOURCE=199506L -D__EXTENSIONS__ -D_REENTRANT.
And I’ve read the threading document for Solaris 8 from docs.sun.com;
none of the issues mentioned there seem to be it.


We don’t use fork(), except to detach from the controlling terminal,
and no signals, except to terminate.



Regards,

Peter Zijlstra

Another long shot…

Grab Igor’s ‘spin’ program that gives you utilization numbers…

You may have a kernel call failing in a strange way that is being
called repeatedly at a high priority, starving the rest of
the system…

If you have any calls that set priorities, I usually disable those during
porting to be safe.
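One way to follow that advice without deleting code is to route every priority change through a single wrapper. This is a sketch under assumptions: db_set_priority() and the PORTING_NO_PRIO macro are hypothetical names. When the macro is defined, the wrapper skips pthread_setschedparam() entirely, so no thread can end up above the priority it inherited while the hang is being investigated.

```c
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Build with -DPORTING_NO_PRIO (hypothetical switch) to turn every
 * priority change into a successful no-op. */
int db_set_priority(pthread_t thread, int policy, int priority)
{
#ifdef PORTING_NO_PRIO
    /* Porting build: leave every thread at the priority it inherited. */
    (void)thread;
    (void)policy;
    (void)priority;
    return 0;
#else
    struct sched_param sp;

    memset(&sp, 0, sizeof sp);   /* some systems have extra fields */
    sp.sched_priority = priority;
    return pthread_setschedparam(thread, policy, &sp);
#endif
}
```

Building once with the switch and once without may quickly show whether the freeze depends on priorities at all.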

Jay

Peter Zijlstra wrote in message <3BF527CA.7060000@xlnt-software.com>…

> Even pidin (didn’t know it existed) does the same. As soon as it
> tries to access the process table entry for our process, it stops
> responding to everything, even SIGABRT.

That might give some hints indeed. Make sure you run spin at the highest
priority (‘on -p63 spin’).

- igor

“Jay Hogg” <Jay.Hogg@t-netix.com.r-e-m-o-v-e> wrote in message
news:9t6d8k$hkf$1@inn.qnx.com

> Another long shot…
>
> Grab Igor’s ‘spin’ program that gives you utilization numbers…
>
> You may have a kernel call failing in a strange way that is being
> called repeatedly at a high priority, starving the rest of
> the system…
>
> If you have any calls that set priorities, I usually disable those during
> porting to be safe.
