SLOW proc creation thread (QNX4)

Rennie · June 24, 2003, 9:39pm

More info.

Setting FD_CLOEXEC on the fd that is open to my resource manager in the
spawnee solves the problem (i.e. the spawnee can then spawn without
problems).
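
For anyone following along, the workaround above can be sketched in plain
POSIX C roughly as follows. This is only a minimal illustration, not the
actual code from my application: the helper names set_cloexec() and
has_cloexec() are made up for this example, and the only library calls
assumed are fcntl() with F_GETFD/F_SETFD and the FD_CLOEXEC flag.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: mark fd close-on-exec so that it is not inherited
 * by programs this process later spawns or execs.
 * Returns 0 on success, -1 on error (errno is set by fcntl()). */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Hypothetical helper: report whether fd is marked close-on-exec.
 * Returns 1 if set, 0 if not set, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

In the spawnee, set_cloexec() would be called once on the fd returned by
open() on the resource manager's prefix, before the spawnee spawns
anything itself.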

The only problem here is, I don’t understand why having/not having
FD_CLOEXEC on the fd should affect correct operation. I don’t get a
dup(), (log messages in the resmgr confirm that), and even if I did, it
should be correctly handled.

Looks like Ian was on the right track with his initial comment, but I
still don’t understand why, since dup doesn’t appear to be involved.

Can anyone explain what is happening here?

Rennie

Adam_Mallory1 · June 25, 2003, 3:49pm

Rennie <rallen@csical.com> wrote in message
news:bdagev$el$1@tiger.openqnx.com…

Setting FD_CLOEXEC on the fd that is open to my resource manager in the
spawnee solves the problem (i.e. the spawnee can then spawn without
problems).

The only problem here is, I don’t understand why having/not having
FD_CLOEXEC on the fd should affect correct operation. I don’t get a
dup(), (log messages in the resmgr confirm that), and even if I did, it
should be correctly handled.

When FDs are inherited (which also happens across spawn), we deliver
IO_DUP messages.

Looks like Ian was on the right track with his initial comment, but I
still don’t understand why, since dup doesn’t appear to be involved.

Well, if you have process flags that say your spawned process can now
handle messages or is expecting death info, you can get deadlocked in a
send (documented under qnx_pflags(), I think).

-Adam

John_Garvey1 · June 25, 2003, 5:02pm

Rennie <rallen@csical.com> wrote:

Randomly, Proc will take forever (30 seconds) to spawn a process.

30 secs is the termer thread timeout (doing IO_CLOSE). Do you get
“kick thread” messages appearing on the system console (I don’t
remember if they also get placed in tracelog)? From vague memory
this situation happens when an INFORMed server tries to fork/spawn …

Adam_Mallory1 · June 25, 2003, 6:41pm

John Garvey <jgarvey@node184.ott.qnx.com> wrote in message
news:bdckja$gqk$1@nntp.qnx.com…

30 secs is the termer thread timeout (doing IO_CLOSE). Do you get
“kick thread” messages appearing on the system console (I don’t
remember if they also get placed in tracelog)? From vague memory
this situation happens when an INFORMed server tries to fork/spawn …

Well, the dead loader/termer time is in play regardless of the state (i.e.
in the process of doing close()). “Unable to kill kernel thread” is what
would display on the console if the state of the stuck process was READY.
From his tracelogs he is getting kicked, but since the process is SEND
blocked, we just force him ready, and he’s out.

Informed processes that spawn can cause trouble, as Proc can become SEND
blocked.

-Adam

Ian_Zagorskih1 · June 26, 2003, 1:17am

Rennie wrote:

More info.

Setting FD_CLOEXEC on the fd that is open to my resource manager in the
spawnee solves the problem (i.e. the spawnee can then spawn without
problems).

The only problem here is, I don’t understand why having/not having
FD_CLOEXEC on the fd should affect correct operation. I don’t get a
dup(), (log messages in the resmgr confirm that), and even if I did, it
should be correctly handled.

Looks like Ian was on the right track with his initial comment, but I
still don’t understand why, since dup doesn’t appear to be involved.

Can anyone explain what is happening here?

It is hard to guess when I don’t have the complete log of messages
received by the resource manager.

// wbr

David_Gibbs1 · June 26, 2003, 9:49pm

Rennie <rallen@csical.com> wrote:

I’ve been having an odd problem, and I found this old thread (from
1994) that describes it almost verbatim. The only difference between
my problem and this problem is that I don’t necessarily need to call
spawn with an invalid filename.

Randomly, Proc will take forever (30 seconds) to spawn a process.
The spawn call returns quickly (P_NOWAIT), but the system is frozen
(prio 29 shell cannot “sin”). I have a resource manager which
registers a prefix, and spawn()s programs which do an open on the
prefix. When a program dies, I receive a close and do a targeted
waitpid() to obtain the status. It works just spiffy most of the
time, but every once in a while there is this long process load.

I’m late to this thread, but I’m just gonna quickly post that
an iomanager/resource manager is not allowed to create children.

Essentially, this could result in a deadlock with Proc. There is
one “thread” in Proc that handles process creation/termination.
Because you are an iomanager, it sends you close messages; if you
also send it a creation message, each side ends up waiting on the
other, and you get that deadlock. There is also, I think, a timeout
on closes, which will kick the termer thread on to the next one, and
30 seconds sounds about right for that.

Usual solution: create a “starter” agent process that (mostly) stays
REPLY blocked on your iomanager. If you need to start something, Reply()
with the info; it starts the program, then Send()s to you again.
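
The shape of that starter arrangement can be sketched in portable POSIX C.
Note the assumptions: this is not QNX4 code, it substitutes a pair of pipes
for Send()/Receive()/Reply(), and every name in it (struct starter,
starter_create(), starter_run(), starter_destroy(), CMD_MAX) is made up for
illustration. The point it tries to show is only that the iomanager never
creates a process itself once it is serving; it hands the program path to
an agent that was forked beforehand.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CMD_MAX 256  /* fixed request size, small enough for an atomic pipe write */

struct starter {
    pid_t pid;     /* pid of the starter agent */
    int   req_fd;  /* manager writes program paths here */
    int   rsp_fd;  /* manager reads child pids from here */
};

/* Agent side: block until the manager hands over a path, start it, report the pid. */
static void starter_loop(int req_fd, int rsp_fd)
{
    char cmd[CMD_MAX];

    for (;;) {
        pid_t child;

        /* Reap finished children; their exit status is discarded in this sketch. */
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ;
        /* A short read or EOF means the manager has gone away. */
        if (read(req_fd, cmd, sizeof cmd) != (ssize_t)sizeof cmd)
            _exit(0);
        cmd[CMD_MAX - 1] = '\0';

        child = fork();
        if (child == 0) {
            close(req_fd);
            close(rsp_fd);
            execl(cmd, cmd, (char *)NULL);
            _exit(127);  /* exec failed */
        }
        /* child is -1 here if fork() failed; the manager sees that value. */
        if (write(rsp_fd, &child, sizeof child) != (ssize_t)sizeof child)
            _exit(1);
    }
}

/* Manager side: call once, before starting to serve clients. Returns 0 or -1. */
int starter_create(struct starter *s)
{
    int req[2], rsp[2];

    if (pipe(req) == -1)
        return -1;
    if (pipe(rsp) == -1) {
        close(req[0]); close(req[1]);
        return -1;
    }
    s->pid = fork();
    if (s->pid == -1) {
        close(req[0]); close(req[1]);
        close(rsp[0]); close(rsp[1]);
        return -1;
    }
    if (s->pid == 0) {
        close(req[1]);
        close(rsp[0]);
        starter_loop(req[0], rsp[1]);  /* never returns */
    }
    close(req[0]);
    close(rsp[1]);
    s->req_fd = req[1];
    s->rsp_fd = rsp[0];
    return 0;
}

/* Ask the agent to start the program at path. Returns the new pid or -1. */
pid_t starter_run(struct starter *s, const char *path)
{
    char cmd[CMD_MAX] = {0};
    pid_t child;

    if (strlen(path) >= sizeof cmd)
        return -1;
    strcpy(cmd, path);
    if (write(s->req_fd, cmd, sizeof cmd) != (ssize_t)sizeof cmd)
        return -1;
    if (read(s->rsp_fd, &child, sizeof child) != (ssize_t)sizeof child)
        return -1;
    return child;
}

/* Closing the request pipe makes the agent see EOF and exit. */
void starter_destroy(struct starter *s)
{
    close(s->req_fd);
    close(s->rsp_fd);
    waitpid(s->pid, NULL, 0);
}
```

A manager would call starter_create() during startup, starter_run() with
the program path whenever it needs something started, and
starter_destroy() on shutdown. One consequence of this layout is that the
started programs are children of the agent, not of the manager, so a
manager that wants exit statuses would need the agent to report them; the
sketch simply discards them.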

-David

QNX Training Services
http://www.qnx.com/support/training/
Please followup in this newsgroup if you have further questions.