sh and zombie

I’ve set up an embedded system with a custom boot file and startup procedure.

There is a script that starts a bunch of our applications. That script
is invoked manually, and once all applications have started, the prompt
comes back.

If one of the programs started by that script is slayed, it becomes a
zombie.

I’ve made a small test with the following script named zorro:

dummy &

Dummy is a program that does a sleep(100000), for testing purposes.

So from the shell I do:

#zorro
#pidin

#slay dummy
#pidin
#…

I can see that dummy is now in zombie state.

What have I missed?

- Mario

I’ve looked at this problem before but it doesn’t always seem to show up.
What version of QNX are you running and what shell? Does your dummy
program do anything else? I was seeing this behaviour in some cases when
the program called by the script was doing resmgr_* type calls.

cheers,

Kris

    Kris Warkentin
    kewarken@qnx.com
    (613)591-0836 x9368
    “Computer science is no more about computers than astronomy is about telescopes”
    –E.W.Dijkstra

“Kris Eric Warkentin” <kewarken@qnx.com> wrote in message
news:9tgrtq$oso$1@nntp.qnx.com

I’ve looked at this problem before but it doesn’t always seem to show up.
What version of QNX are you running and what shell?

QNX 6.1 and I’m running ksh

Does your dummy program do anything else?

#include <unistd.h>

int main(void) {
    sleep(100000);
    return 0;
}

If you need something else, just ask. I’m really eager to solve this.

Is your script also as simple as:

#!/bin/sh
…/dummy &

as well?

I just tried it on my 6.1.1 box and it doesn’t seem to do it.

ren:/export/home/kewarken/test/ksh >cat dummy.c
int
main(void)
{
sleep(10000);
return 0;
}
ren:/export/home/kewarken/test/ksh >cat zorro
#!/bin/sh
…/dummy &
ren:/export/home/kewarken/test/ksh >zorro
ren:/export/home/kewarken/test/ksh >pidin | grep dummy
165679151 1 ./dummy 10r NANOSLEEP
ren:/export/home/kewarken/test/ksh >slay dummy
ren:/export/home/kewarken/test/ksh >pidin | grep dummy
ren:/export/home/kewarken/test/ksh >pidin | grep Zom
ren:/export/home/kewarken/test/ksh >

I don’t know what to say. I’ve had intermittent reports of weird behaviour
in this area on several occasions but had trouble finding a case that
reproduces on every box. It may be that something subtle has changed in the
way proc launches processes, and that made it go away. If you can still
make it happen after 6.1.1 comes out, we’ll have to look at it again.

cheers,

Kris

“Kris Eric Warkentin” <kewarken@qnx.com> wrote in message
news:9tgv6j$rgg$1@nntp.qnx.com

Is your script also as simple as:

#!/bin/sh
./dummy &

as well?

Yes, exactly that.

I just tried it on my 6.1.1 box and it doesn’t seem to do it.

It doesn’t do it on our 6.1.0 server; it only does it on the embedded
(custom boot) 6.1.0 machine.


I don’t know what to say. I’ve had intermittent reports of weird behaviour
in this area on several occasions but had trouble finding a case that
reproduces on every box. It may be that something subtle has changed in the
way proc launches processes, and that made it go away. If you can still
make it happen after 6.1.1 comes out, we’ll have to look at it again.

I can’t wait for 6.1.1, so I’ll do some more digging.

My guess is that, for some reason, sh is not getting SIGCHLD.
I’ll start from there.

Thanks


I found the cause of the problem, but I don’t understand
the rationale behind it.

The boot script looks like:


reopen /dev/con4
[+session pri=10] sh -c /etc/rc.d/rc.local

reopen /dev/con1

}



The problem was the last reopen statement.
It obviously serves no purpose and shouldn’t
have been there in the first place.

That being said, I don’t understand why it
ended up creating such a problem.

- Mario “1 down, 1 billion to go” Charest

The shell registers a SIGCHLD handler, but the shell that spawned dummy has
already exited by the time you slay it. At that point the process has been
inherited by procnto, so you would think that you would get a zombie,
wouldn’t you? The shell isn’t doing anything special either (i.e., spawning
with NOZOMBIE set). I just tried this test case on a ppcbe board with no
result. Do you have a specific board I might be able to see it on?

cheers,

Kris


It seems that procnto is pretty good about reaping its zombies in general,
so this one must be slipping through a hole somewhere. From what I
understand, you get a zombie if a process dies BEFORE its parent and the
parent fails to wait() for it. For example:

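/* make a zombie */
#include <unistd.h>

int
main(void)
{
    int i;

    if ((i = fork())) {
        /* parent: never calls wait(), so the exited child
           remains a zombie while the parent sleeps */
        sleep(5);
    }
    /* child exits right away */
    return 0;
}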

A few outstanding bugs regarding zombies have been fixed recently, so
maybe it’s gone now.

cheers,

Kris

Are you saying that if you take out the reopen statement, the zombie problem
goes away?

Kris

“Kris Eric Warkentin” <kewarken@qnx.com> wrote in message
news:9tj6oc$ahn$2@nntp.qnx.com

Are you saying that if you take out the reopen statement, the zombie
problem goes away?

Yes! I’ve double-checked.

When I get back home I’ll make the same change on my own machine to see
whether it behaves the same on a normal PC. I’m at a customer site right
now and have no PC available to try this on, aside from the embedded
system.


Let me know what you discover. I’m trying to replicate this on a 6.1.1 box
right now. Perhaps it’s the combination of the script and the reopen? What
is happening in the rc.local script?

Kris

“Kris Eric Warkentin” <kewarken@qnx.com> wrote in message
news:9tjc6b$dii$1@nntp.qnx.com

Let me know what you discover. I’m trying to replicate this on a 6.1.1 box
right now. Perhaps it’s the combination of the script and the reopen? What
is happening in the rc.local script?

I can’t replicate it on a normal box, grrrr.
