QNX4.25 SIGCHLD Handler

Hi there

Does the SIGCHLD signal handler need to be re-armed after the signal is delivered to the
application?
Is it possible that the system area used by the signal() call is being overwritten
by user code?

Here is a brief description of the code:
a) Signal Handler
The signal handler is associated with the SIGCHLD signal using the signal() system
call.
The event that triggers the signal is the death of child processes that were
previously launched using the qnx_spawn() system call.

SigChildHandler()
{
    gSigFlg = 1;
}

b) Unique Read function
Read(…)
{
    if (gSigFlg)
        processCleanup()
    while ((pid= receive(0…))==-1)
    {
        processCleanup()
        if (gSigFlg)
            processCleanup()
    }

}
processCleanup()
{
    while ((pid = waitpid(0,NULL,WNOHANG) == -1)
        put an application’s process termination in the application’s msgque.
    gSigFlg = 0
}

This code is organized in a library which is linked to the application.
In a thin application this works fine.
In a heavy application the s flag in the QNX process control word is cleared
after the second child death, on returning from the signal handler.
I watch this flag using sin -P appname flags.

Thanks
Rami

Rami Raviv <raviv_r@netvision.net.il> wrote:

> Hi there
>
> Does the SIGCHLD signal handler need to be re-armed after the signal is delivered to the
> application?

No, it should stay armed.

> Is it possible that the system area used by the signal() call is being overwritten
> by user code?

Possible, but not very likely.


> Here is a brief description of the code:
> a) Signal Handler
> The signal handler is associated with the SIGCHLD signal using the signal() system
> call.
> The event that triggers the signal is the death of child processes that were
> previously launched using the qnx_spawn() system call.
>
> SigChildHandler()
> {
>     gSigFlg = 1;
> }
>
> b) Unique Read function
> Read(…)
> {
>     if (gSigFlg)
>         processCleanup()
>     while ((pid= receive(0…))==-1)
>     {
>         processCleanup()
>         if (gSigFlg)
>             processCleanup()
>     }
>
> }
> processCleanup()
> {
>     while ((pid = waitpid(0,NULL,WNOHANG) == -1)
>         put an application’s process termination in the application’s msgque.
>     gSigFlg = 0
> }

Note you have a race condition in the above code. The setting of
gSigFlg to 0 should be BEFORE you do your waitpid() loop. What would
happen if you did the waitpid on the last death, then another death
happened, you do your signal handler, then you set gSigFlg to 0? Oops.

processCleanup() should look like:

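processCleanup()
{
    gSigFlg = 0;   /* clear first: a death signalled during the loop sets it again */
    /* waitpid() returns a pid > 0 for each dead child it reaps */
    while ((pid = waitpid(0,NULL,WNOHANG)) > 0)
        put an application’s process termination in the application’s msgque.
}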

This isn’t causing the problem you’re seeing, though.
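
A minimal C sketch of that ordering, assuming plain POSIX fork(), sigaction() and waitpid() rather than the QNX4 calls used in the original code. The names child_died, on_sigchld, reap_children and run_demo are hypothetical, and a counter stands in for putting a termination message on the application's queue.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical flag, playing the role of gSigFlg. */
static volatile sig_atomic_t child_died = 0;

/* SIGCHLD handler: only records that some child has died. */
static void on_sigchld(int signo)
{
    (void)signo;
    child_died = 1;
}

/* Reap every child that has already terminated; return how many.
   The flag is cleared BEFORE the loop, so a death signalled while
   the loop runs sets it again and causes another pass later. */
static int reap_children(void)
{
    int reaped = 0;

    child_died = 0;
    while (waitpid(0, NULL, WNOHANG) > 0)
        reaped++;            /* stand-in for queueing a termination message */
    return reaped;
}

/* Demo: start n children that exit at once, then collect all of them. */
static int run_demo(int n)
{
    struct sigaction sa;
    struct timespec tick = { 0, 1000000L };   /* 1 ms */
    int i, total = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) != 0)
        return -1;

    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);        /* child: terminate immediately */
    }
    while (total < n) {
        if (child_died)
            total += reap_children();
        else
            nanosleep(&tick, NULL);   /* stand-in for blocking in receive() */
    }
    return total;
}
```

With the flag cleared after the loop instead, a child that dies between the last waitpid() and the assignment would have its notification wiped out, and it would stay unreaped until some later death set the flag again.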

> This code is organized in a library which is linked to the application.
> In a thin application this works fine.

Ok. Makes sense – my test application didn’t see a problem either.

> In a heavy application the s flag in the QNX process control word is cleared
> after the second child death, on returning from the signal handler.

Hm… this suggests that something in your process somewhere else is
probably making a call to modify the handler for SIGCHLD. Whether or
not you have a signal handler installed (or the default behaviour, or
SIG_IGN) is stored in the process’ memory space, not in the process table
entry in Proc, so an element of your process COULD overwrite – but it
is not likely to do so, as it is at an address range near most of your
other data. More likely is that there is a call to signal() or sigaction()
somewhere else that is modifying this – either for the wrong signal, or
specifically for SIGCHLD, but for an incorrect reason.

QNX will not clear this flag just because the handler has been called,
so you don’t (for OS purposes) need to re-attach the handler or anything
like that. (This was a bug in some older Unix versions, it does not
exist in QNX.)
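
One way to narrow down which call is replacing the handler is to query the current disposition before and after each suspect call. The following is a sketch using POSIX sigaction(), which can read the current action without changing it; install_handler and handler_is are hypothetical helper names, not part of any QNX library.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Example handler; the body is irrelevant for this check. */
static void on_sigchld(int signo)
{
    (void)signo;
}

/* Install fn for signo. Without SA_RESETHAND in sa_flags the handler
   stays installed after the signal has been delivered. */
static int install_handler(int signo, void (*fn)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fn;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Return 1 if fn is still the handler for signo, 0 if something has
   replaced it, -1 on error. A NULL new action makes this a pure query. */
static int handler_is(int signo, void (*fn)(int))
{
    struct sigaction cur;

    if (sigaction(signo, NULL, &cur) != 0)
        return -1;
    return cur.sa_handler == fn;
}
```

Calling handler_is(SIGCHLD, on_sigchld) on either side of a library call shows whether that call left the disposition alone.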

-David
QNX Training Services
dagibbs@qnx.com

My test sample, just for reference:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <process.h>
#include <sys/types.h>
#include <sys/wait.h>

/* SIGCHLD handler: reap one dead child and report it.
   printf() is not async-signal-safe; acceptable only in a quick test. */
void handler( int signo )
{
    int ret;
    ret = waitpid( 0, NULL, WNOHANG );
    printf("waitpid returned %d\n", ret );
}

int main( void )
{
    int ret;
    printf("before sigchld call\n");
    sleep(5);
    signal( SIGCHLD, handler );   /* installed once, never re-attached */

    printf("after sigchld call\n");

    while(1)
    {
        /* start a short-lived child; its death raises SIGCHLD */
        ret = spawnl(P_NOWAIT, "/bin/sleep", "sleep", "2", NULL );
        printf("child is %d\n", ret );
        sleep(10);
    }
}

The reason the NO_CLDSTOP process flag was being cleared has been
found: it is a call to the system() function elsewhere in the application, made in order
to run a script.
Is this correct behaviour, or a bug?

Rami Raviv <rami_r@elisra.com> wrote:

> The reason the NO_CLDSTOP process flag was being cleared has been
> found: it is a call to the system() function elsewhere in the application, made in order
> to run a script.
> Is this correct behaviour, or a bug?

That is a bug. Probably in the spawn*(P_WAIT,…) code that system()
calls to start the shell and wait for the result. I remember fixing
a bug with this code – but I think I caught the signal mask not being
properly saved/restored. Apparently the handler is also not being saved
and restored. (spawn needs to use SIGCHLD to wait for the child to die
before returning, which allows system() to wait for the child to die
before returning.)

I don’t expect this to be fixed in any reasonable amount of time, so
your best bet is to look at some kind of work-around.

Depending on when/where you use system(), the best choice differs.
Even if system() were working as advertised – you still have a
problem if you use it after you’ve started processes that you want to
catch with your cleanup code – if one of your processes died while
system() was waiting, your handler would NOT get called, but the one
set up by the library would.

If you’re just using system() during initialization, this is easy to
fix – just make your call to signal()/sigaction() after you’ve made
all calls to system().
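
Where system() cannot be confined to initialization, another possible work-around is to save the SIGCHLD disposition before the call and put it back afterwards. This is only a sketch built on POSIX sigaction(); system_keep_sigchld is a hypothetical wrapper name, and the wrapper does nothing about children of your own that die while system() is waiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

/* Example application handler, used to show that it survives the call. */
static void on_sigchld(int signo)
{
    (void)signo;
}

/* Hypothetical wrapper: run cmd through system(), then restore whatever
   SIGCHLD disposition was in effect before the call. Returns the value
   returned by system(). */
static int system_keep_sigchld(const char *cmd)
{
    struct sigaction saved;
    int have_saved;
    int status;

    have_saved = (sigaction(SIGCHLD, NULL, &saved) == 0);   /* query only */
    status = system(cmd);
    if (have_saved)
        sigaction(SIGCHLD, &saved, NULL);                   /* put it back */
    return status;
}
```

Running the cleanup routine once after the wrapper returns may also pick up any of your own children whose death went unnoticed during the call.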

On a side note, you should probably be aware that, unless you actually
need a shell for something, system() is an inefficient way to run
another program: it will generally result in the creation of two processes
rather than just one. E.g. system("sleep 5") will run a shell, and the
only useful thing that shell will do is turn "sleep 5" into a call to
spawn*() with the appropriate arguments. Substituting spawn*(P_WAIT) will
be more efficient than system(), but won’t solve your problem.
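
To illustrate the one-process-instead-of-two point, here is a sketch that starts a program directly and waits for it. It uses POSIX posix_spawnp() as a portable stand-in, not the QNX4 spawn*(P_WAIT) family mentioned above; run_and_wait is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Start argv[0] directly (no intermediate shell), wait for it, and
   return its exit status, or -1 if it could not be started or did not
   exit normally. */
static int run_and_wait(char *const argv[])
{
    pid_t pid;
    pid_t r;
    int status;

    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    /* Retry if a signal handler interrupts the wait. */
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);

    if (r != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The argument vector is passed as an array, e.g. { "sleep", "5", NULL }, so no shell is needed to split the command line.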

Also, another thing to think about: have your signal handler Trigger()
a proxy. That will clean up all of those race conditions where the signal
is delivered while you’re not blocked on Receive(), and so you never unblock
from the next Receive() to wait() on the child death.
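
The same idea can be sketched on plain POSIX with a pipe: the handler queues a wake-up that the main loop later reads, so a signal that arrives while the loop is busy is not lost. This is an analogue only, and does not use the QNX4 proxy calls; wake_fd, wakeup_init and wakeup_drain are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static int wake_fd[2] = { -1, -1 };   /* [0] = read end, [1] = write end */

/* SIGCHLD handler: queue one wake-up byte. write() is async-signal-safe. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;
    char b = 1;

    (void)signo;
    if (write(wake_fd[1], &b, 1) < 0) {
        /* pipe full: a wake-up is already pending, nothing to do */
    }
    errno = saved_errno;
}

/* Create the pipe, make both ends non-blocking, install the handler. */
static int wakeup_init(void)
{
    struct sigaction sa;
    int i;

    if (pipe(wake_fd) != 0)
        return -1;
    for (i = 0; i < 2; i++) {
        int fl = fcntl(wake_fd[i], F_GETFL);
        if (fl == -1 || fcntl(wake_fd[i], F_SETFL, fl | O_NONBLOCK) == -1)
            return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Consume all pending wake-ups; return how many bytes were queued. */
static int wakeup_drain(void)
{
    char buf[64];
    ssize_t n;
    int total = 0;

    while ((n = read(wake_fd[0], buf, sizeof buf)) > 0)
        total += (int)n;
    return total;
}
```

A main loop would wait on wake_fd[0] together with its other inputs and run the waitpid() cleanup whenever wakeup_drain() returns non-zero, much as a Receive() loop would react to the proxy message.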

-David

QNX Training Services
dagibbs@qnx.com

David Gibbs <dagibbs@qnx.com> wrote:

> Rami Raviv <rami_r@elisra.com> wrote:
>> The reason the NO_CLDSTOP process flag was being cleared has been
>> found: it is a call to the system() function elsewhere in the application, made in order
>> to run a script.
>> Is this correct behaviour, or a bug?
>
> That is a bug.

I’ve issued a PR (problem report) against this.

> I don’t expect this to be fixed in any reasonable amount of time, so
> your best bet is to look at some kind of work-around.

-David

QNX Training Services
dagibbs@qnx.com