SIGCHLD disappears

My situation: QNX 4.25C. I have some code that forks
and execs a helper program, then it goes off and
waits for more input while the helper program runs
in the background. (The input comes through a shared-
memory queue; the program spends most of its time
on a semwait() waiting for the semaphore to tell
it that there is something in the shared memory
that it needs to process.) When the helper program
terminates, the main program needs to issue a user
message giving the termination status of the helper.
Because it needs to do this asynchronously, I used
sigaction() to set up a signal handler to catch
SIGCHLD.

The SIGCHLD handler does a wait() to reap the
zombie child process, and then issues a message
based on the child’s exit status. Here’s the
problem: Sometimes the parent receives the SIGCHLD,
and sometimes it doesn’t. (And we aren’t talking
about a situation where a whole bunch of SIGCHLDs
would be issued at one time, potentially causing
some to be lost.) I haven’t yet figured out under
what circumstances the parent fails to receive
the SIGCHLD, although it seems that I can aggravate
it by having the parent process some other input
(not consistently, though). When it gets in this
state, I look at the process signal state with sin.
SIGCHLD isn’t in the ignore mask, and there are
no pending signals. The zombie is sitting there
waiting to be reaped, and nothing’s happening.
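
A minimal sketch of the kind of setup described above, assuming a POSIX environment (it has not been tried on QNX 4.25). The names on_sigchld(), install_sigchld_handler() and run_child_and_collect() are hypothetical and do not come from the original program. Unlike the handler described in the post, which calls wait() once and issues the message itself, this version loops on waitpid() with WNOHANG and only records the status for the main loop to report. That is a choice made for the sketch, not a confirmed fix for the problem.

```c
#define _POSIX_C_SOURCE 200809L  /* expose the POSIX signal API under strict compilers */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bookkeeping: written by the handler, read by the main code. */
static volatile sig_atomic_t child_reaped = 0;
static volatile sig_atomic_t child_status = 0;

/* SIGCHLD handler: reap every child that has already exited. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;  /* waitpid() may change errno */
    int status;

    (void)signo;
    /* WNOHANG makes waitpid() return at once when no zombie is left. */
    while (waitpid(-1, &status, WNOHANG) > 0) {
        child_status = status;
        child_reaped = 1;
    }
    errno = saved_errno;
}

/* Install the handler. Returns 0 on success, -1 on failure. */
int install_sigchld_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Demonstration only: fork a child that exits with `code`, sleep until the
 * handler has reaped it, and return the exit code (or -1 on error). */
int run_child_and_collect(int code)
{
    sigset_t block, old, wait_mask;
    pid_t pid;
    int st;

    child_reaped = 0;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    /* Block SIGCHLD so it cannot arrive between the test and sigsuspend(). */
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);

    pid = fork();
    if (pid < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(code);

    while (!child_reaped)
        sigsuspend(&wait_mask);  /* atomically unblock SIGCHLD and wait */
    sigprocmask(SIG_SETMASK, &old, NULL);

    st = child_status;
    return WIFEXITED(st) ? WEXITSTATUS(st) : -1;
}
```

A program would call install_sigchld_handler() once at startup; run_child_and_collect() is there only to exercise the handler.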

For a test, I put in a second sigaction() call to
have it trap SIGUSR1 using the same signal handler.
When the parent gets in its bad state, with the
zombie sitting there waiting to be reaped, I try
using the kill command from the console to send the
parent the SIGCHLD signal. Nothing happens; the
signal just seems to disappear. Now I use kill to
send it SIGUSR1. That works! It goes to the
signal handler and reaps the zombie child process.

At this point I’m thinking that the most
expedient way to solve this problem is to set up
the parent to just catch SIGUSR1. Then I’ll have
it pass its PID to the child as a command line
argument when it exec’s the helper program. Then
I’ll put an exit handler in the helper program
to have it send SIGUSR1 to its parent so the parent
knows to go wait() for it. But this isn’t ideal;
the parent will never know if the helper dies on,
say, a SIGSEGV, unless I have the helper catch
all signals – not really a good way to do it.
And it bugs me that the SIGCHLD just vanishes into
thin air when other signals still work. Anyone
got any idea what’s going on here?

There was a post a few days ago that mentioned that a “system()” function call
will not correctly re-install a SIGHUP handler. Could this be what you are
seeing?

Whoops, I meant SIGCHLD.

I’m not sure where I got this, nor could I find it again, but I think
wait() is NOT signal safe.

Dave Cornutt <david.k.cornutt@boeing.com> wrote:

My situation: QNX 4.25C. I have some code that forks
and execs a helper program, then it goes off and
waits for more input while the helper program runs
in the background. (The input comes through a shared-
memory queue; the program spends most of its time
on a semwait() waiting for the semaphore to tell
it that there is something in the shared memory
that it needs to process.) When the helper program
terminates, the main program needs to issue a user
message giving the termination status of the helper.
Because it needs to do this asynchronously, I used
sigaction() to set up a signal handler to catch
SIGCHLD.

There is a bug in spawn*() when passed P_WAIT as the type
of spawn (this is also caused by system() which calls spawn*()
with P_WAIT). It needs its own SIGCHLD handler, but doesn’t restore
the previous one.

This is the most likely cause of what you are seeing. So, first
thing to check, does your parent start any new processes with system()
or spawn() after your call to sigaction()?

-David

QNX Training Services
dagibbs@qnx.com
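
One way to act on the check suggested above, sketched under the assumption that only standard POSIX calls are available (it has not been tried on QNX 4). The helper names system_preserving_sigchld() and sigchld_handler_is() are hypothetical, as is example_sigchld_handler(), which stands in for the application's real handler. The first helper saves the SIGCHLD disposition before calling system() and puts it back afterwards; the second reports whether a given handler is still installed, which may help confirm whether a library call replaced it.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() under strict compilers */
#include <signal.h>
#include <stdlib.h>

/* Stand-in for the application's own SIGCHLD handler (hypothetical). */
void example_sigchld_handler(int signo)
{
    (void)signo;
}

/* Run a shell command with system(), then restore whatever SIGCHLD
 * disposition was in effect before the call.
 * Returns the value of system(), or -1 if sigaction() fails. */
int system_preserving_sigchld(const char *command)
{
    struct sigaction saved;
    int rc;

    if (sigaction(SIGCHLD, NULL, &saved) == -1)  /* query only */
        return -1;
    rc = system(command);
    if (sigaction(SIGCHLD, &saved, NULL) == -1)  /* put it back */
        return -1;
    return rc;
}

/* Diagnostic: 1 if `expected` is the installed SIGCHLD handler,
 * 0 if something else is installed, -1 on error. */
int sigchld_handler_is(void (*expected)(int))
{
    struct sigaction cur;

    if (sigaction(SIGCHLD, NULL, &cur) == -1)
        return -1;
    return cur.sa_handler == expected;
}
```

Restoring the disposition afterwards does nothing for a SIGCHLD that arrives while system() is still running, so this is a workaround rather than a full fix.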

David Gibbs <dagibbs@qnx.com> wrote:

There is a bug in spawn*() when passed P_WAIT as the type
of spawn (this is also caused by system() which calls spawn*()
with P_WAIT). It needs its own SIGCHLD handler, but doesn’t restore
the previous one.

NOTE: this is a Watcom library bug; it does not occur under QNX6,
just under QNX4.

-David

QNX Training Services
dagibbs@qnx.com

Will this ever be fixed? Or should a “KNOWN PROBLEM” be added to the
docs?

Brown, Richard <brownr@aecl.ca> wrote:
: Will this ever be fixed? Or should a “KNOWN PROBLEM” be added to the
: docs?

I’ll add it to the docs, but I don’t know when (or if) a new version of the
docs will be released.


Steve Reid stever@qnx.com
TechPubs (Technical Publications)
QNX Software Systems

Brown, Richard <brownr@aecl.ca> wrote:

Will this ever be fixed? Or should a “KNOWN PROBLEM” be added to the
docs?

There is a PR against it. I don’t know whether it will ever be fixed,
but I wouldn’t suggest holding your breath.

And, in fact, even if it IS fixed, you shouldn’t be using spawn(P_WAIT,)
or system() while trying to handle SIGCHLD, as you have created a window
of failure. (Consider: you create proc1, which you want a SIGCHLD from;
then you call system(), and while that is running, proc1 dies. SIGCHLD
comes to your process, the handler attached by spawn() is called, and
you never see the notification, since your handler doesn’t get called.)

-David

QNX Training Services
dagibbs@qnx.com

Did some more checking – looks like the correct behaviour for spawn/system
is to mask SIGCHLD, but to not touch the handler. So, you should be ok
if they are coded correctly.

-David

QNX Training Services
dagibbs@qnx.com
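
The behaviour described in the last post (mask SIGCHLD, leave the handler alone) can be sketched roughly as follows. This is an illustration under POSIX assumptions, not the Watcom or QNX library code, and run_command_masked() is a hypothetical name. A real system() also ignores SIGINT and SIGQUIT while waiting, which is left out here for brevity.

```c
#define _POSIX_C_SOURCE 200809L  /* expose the POSIX signal API under strict compilers */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `command` through /bin/sh and wait for it to finish.
 * SIGCHLD is blocked for the duration; the installed handler is never
 * replaced. Returns the wait status, or -1 on failure. */
int run_command_masked(const char *command)
{
    sigset_t block, old;
    pid_t pid;
    int status = -1;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    pid = fork();
    if (pid == 0) {
        /* Child: restore the caller's signal mask, then exec the shell. */
        sigprocmask(SIG_SETMASK, &old, NULL);
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);  /* exec failed */
    }
    if (pid > 0) {
        /* Wait for this specific child only; retry if interrupted. */
        while (waitpid(pid, &status, 0) == -1) {
            if (errno != EINTR) {
                status = -1;
                break;
            }
        }
    }

    /* Restoring the mask lets any SIGCHLD that arrived in the meantime
     * reach the application's own handler. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return status;
}
```

Because ordinary signals are not queued, a single pending SIGCHLD may stand for more than one exited child, so this pairs best with a handler that reaps in a loop using waitpid() and WNOHANG.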