Async notification of other processes’ termination

Tangent question from this original thread.

You indicated that a process using _PPF_INFORM cannot fork/spawn new
children. This is exactly the problem I’m having with a process I wrote that
monitors for the death of specific processes and restarts them. Under some
conditions the spawn works, yet under others it doesn’t (long delay before it
returns). It seems to be related to Permission. Can you explain the behavior
and/or the reason for the statement, as the manuals give no indication?

Thanks in advance…

“Steve Higgs” <shiggs68@hotmail.com> wrote in message
news:bntiem$n7i$1@inn.qnx.com

> Tangent question from this original thread.
>
> You indicated while using the _PPF_INFORM, you can not fork/spawn new
> children. This is the exact problem I’m having with a process I wrote that
> monitors for specific process death and restarts them. Under some conditions
> the spawn works, yet some don’t (long delay before return). Seems to be
> related to Permission. Can you explain the behavior and/or the reason for
> the statement as the manuals have no indication?

I think the logic is simple. With _PPF_INFORM you receive a message from the
kernel when a process is spawned. Hence, if the process doing the spawning
has _PPF_INFORM set, it will be sent a message but can’t handle it, since
Proc will deadlock trying to send the message to it.

The fork/spawn restriction applies only to processes that have set the
_PPF_INFORM flag; it is not system-wide.

The process handling the inform message should run at a very high priority,
because if it is slowed down by other, higher-priority programs it will slow
down spawning: the part of the code that handles spawning in the kernel is
the same part that sends the inform message. While the process manager (the
term is more precise than kernel…) is reply-blocked on a process, it can’t
spawn.





Steve Higgs <shiggs68@hotmail.com> wrote:

> Tangent question from this original thread.
>
> You indicated while using the _PPF_INFORM, you can not fork/spawn new
> children. This is the exact problem I’m having with a process I wrote that
> monitors for specific process death and restarts them. Under some conditions
> the spawn works, yet some don’t (long delay before return). Seems to be
> related to Permission. Can you explain the behavior and/or the reason for
> the statement as the manuals have no indication?

So, basically, there is one (let’s call it a) thread in Proc that handles
the creation and termination of processes. When you want to create a
process, fork/spawn sends it a message and blocks until it completes and
replies. When a process is dying, this same creation/termination thread
sends a message to each of the _PPF_INFORM processes and does not continue
until each, in turn, has replied to it. (This is why you should reply to the
death message as quickly as possible.)

If, while you are trying to spawn/fork, that creation/termination thread is
trying to send you a message, you have created a deadlock. (No forward
progress can be achieved.) Eventually a timeout happens, the Proc “thread”
gets ripped out of whatever it was doing, and life continues, but after a
delay (and with the possibility of lost notifications, etc.).
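
The deadlock described above is a cycle of blocked parties, and a few lines of Python can make that concrete. This is only an illustrative model, not QNX code: the wait-for table and the `find_deadlock` helper are names made up for the sketch.

```python
def find_deadlock(waits_for):
    """Return the cycle of blocked parties as a list, or None if there is none.

    waits_for maps each blocked party to the single party it is blocked on.
    """
    for start in waits_for:
        seen = []
        node = start
        # Follow the chain of "is blocked on" links until it ends or repeats.
        while node in waits_for and node not in seen:
            seen.append(node)
            node = waits_for[node]
        if node in seen:
            return seen[seen.index(node):]
    return None


# The monitor is inside spawn() (blocked on Proc) while Proc is trying to
# deliver a death message to the monitor (blocked on the monitor).
spawning = {"monitor": "Proc", "Proc": "monitor"}

# The monitor is not spawning, so it is free to receive and reply.
idle = {"Proc": "monitor"}

print(find_deadlock(spawning))  # ['monitor', 'Proc']
print(find_deadlock(idle))      # None
```

In the second case Proc is still waiting on the monitor, but nothing stops the monitor from receiving the message and replying, so the chain ends instead of looping.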

So the usual workaround is a spawner agent. Before your “monitor” process
sets _PPF_INFORM, it should fork/spawn a child process that will Send() it
an “I’m waiting” message. Any time you want to restart something, Reply() to
the agent; it will do the spawn(), then Send() you an “I’m done” message (or
an “it failed to start” message, or whatever). Keep it blocked (don’t reply)
until the next time you need to spawn/fork something. Since Reply() is
non-blocking, this works. (You can’t Send() to the agent, since that doesn’t
prevent the deadlock: it just becomes a three-process deadlock ring rather
than a two-process deadlock.)
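
The message flow of the spawner agent can be sketched with threads and queues. This is a toy model under stated assumptions, not QNX code: the `Party` class only imitates the blocking behaviour of Send()/Receive()/Reply(), `fake_spawn` is a hypothetical stand-in for the real spawn(), and a real monitor would also be receiving death messages from Proc in the same loop.

```python
import queue
import threading


class Party:
    """Toy stand-in for a process with Send/Receive/Reply semantics."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()    # messages other parties have sent to us
        self.replies = queue.Queue()  # where the reply to our send() arrives

    def send(self, target, msg):
        """Blocking: deliver msg, then wait until the target replies."""
        target.inbox.put((self, msg))
        return self.replies.get()

    def receive(self):
        """Blocking: wait for the next (sender, msg) pair."""
        return self.inbox.get()

    @staticmethod
    def reply(sender, msg):
        """Non-blocking: unblock a party that is waiting inside send()."""
        sender.replies.put(msg)


def agent_loop(agent, monitor, spawn):
    report = "waiting"
    while True:
        command = agent.send(monitor, report)  # parked here until replied to
        if command is None:                    # model-only shutdown signal
            return
        report = ("done", command) if spawn(command) else ("failed", command)


def run_demo(to_restart):
    monitor, agent = Party("monitor"), Party("agent")
    started = []

    def fake_spawn(command):  # hypothetical stand-in for spawn()
        started.append(command)
        return True

    worker = threading.Thread(target=agent_loop,
                              args=(agent, monitor, fake_spawn))
    worker.start()            # the agent exists before any monitoring begins
    reports = []
    sender, msg = monitor.receive()      # the agent's initial "waiting"
    reports.append(msg)
    for command in to_restart:
        Party.reply(sender, command)     # never blocks the monitor
        sender, msg = monitor.receive()  # result arrives; agent stays parked
        reports.append(msg)
    Party.reply(sender, None)            # let the model's agent exit
    worker.join()
    return started, reports


started, reports = run_demo(["logger", "watchdog"])
print(started)
print(reports)
```

The point of the design is that the monitor only ever calls `receive()` and `reply()`. Each result message from the agent doubles as its next "I'm waiting", so the agent is always reply-blocked and ready for the next request.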

-David

QNX Training Services
http://www.qnx.com/support/training/
Please follow up in this newsgroup if you have further questions.

Clear as mud and thanks for the response.

As my workaround, I actually toggle _PPF_INFORM off and back on around the
spawn call, then check that I didn’t miss any termination while inside the
spawn.

Again thanks. It helps when you understand…

“David Gibbs” <dagibbs@qnx.com> wrote in message
news:bntvvd$gv9$1@nntp.qnx.com


Steve Higgs <shiggs68@hotmail.com> wrote:

> Clear as mud and thanks for the response.
>
> I actually toggle the _PPF_INFORM off and back on around the spawn call
> then check to see that I didn’t miss any termination while inside the
> spawn as my work around.

After you call qnx_pflags() to turn off _PPF_INFORM, do you do a
Creceive() to make sure Proc32 isn’t already blocked on you? If not,
you’ve got a race condition that could still result in deadlock.
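
The ordering that question implies can be sketched as a toy Python model (again, not QNX code): clear the flag, drain anything already queued with a non-blocking receive, and only then spawn. The `state` dict stands in for the qnx_pflags() calls and `drain_pending` for a Creceive() loop; all names here are hypothetical.

```python
import queue


def drain_pending(inbox, handle):
    """Non-blocking receive loop: handle everything that is already queued."""
    count = 0
    while True:
        try:
            notice = inbox.get_nowait()
        except queue.Empty:
            return count
        handle(notice)  # in the real system this is where you would reply
        count += 1


def spawn_with_inform_off(state, inbox, handle, spawn):
    state["inform"] = False                 # stands in for clearing the flag
    drained = drain_pending(inbox, handle)  # nothing may be left waiting on us
    try:
        return spawn(), drained
    finally:
        state["inform"] = True              # stands in for setting it again


inbox = queue.Queue()
inbox.put("death of pid 123")  # hypothetical notice already in flight
log = []
state = {"inform": True}


def fake_spawn():  # hypothetical stand-in for spawn()
    log.append("spawn")
    return "child"


result, drained = spawn_with_inform_off(state, inbox, log.append, fake_spawn)
print(log)      # ['death of pid 123', 'spawn']
print(drained)  # 1
```

Without the drain step, the notice queued before the flag was cleared would still be outstanding when the spawn starts, which is the race described above.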

-David

