how to find a running process...

Wojtek_Lerch1 · September 17, 2004, 3:53pm

Dmitri Poustovalov wrote:

That’s fine. But in order to be POSIX compliant kill(pid, 0) is supposed
to answer “is it alive?” question not “a different question”, isn’t it?

That’s the point: it isn’t. kill() just tells you if the pid matches an
existing process, without making a distinction between live processes and
zombie processes.
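
To make that concrete, here is a minimal sketch of a pid validity check
in C. The helper name pid_exists is made up for illustration. It relies
only on the POSIX behaviour of kill() with signal 0: ESRCH means no such
process, EPERM means the process exists but you may not signal it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations in strict -std= modes */
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>              /* getpid(), used in the usage note */

/* Hypothetical helper: returns 1 if pid names an existing process
 * (live or zombie), 0 if no such process exists, -1 on any other error. */
int pid_exists(pid_t pid)
{
    assert(pid > 0);   /* 0 and negative values address process groups */

    if (kill(pid, 0) == 0)
        return 1;      /* pid is valid and we are allowed to signal it */
    if (errno == EPERM)
        return 1;      /* pid is valid, we just lack permission */
    if (errno == ESRCH)
        return 0;      /* no process with this pid */
    return -1;
}
```

For example, pid_exists(getpid()) returns 1. A return value of 1 says
nothing about whether the process is still running or already a zombie.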

…

I am failing to see how the zombie business is applicable to the test
case I described above. There is no parent-child relationship.

The zombie business was supposed to be an obvious example of why process
termination is not an atomic thing. I guess it wasn’t that obvious
after all.

When a process terminates, a lot of things happen: fds are closed;
signals and pulses are sent; memory is unmapped; the process becomes a
zombie; its parent returns from waitpid(); the pid becomes invalid. If
you try to detect the order of those things, you shouldn’t be surprised
that they happen in a certain order. Some of them are guaranteed to
happen during termination (i.e. before the process becomes a zombie) and
some after (e.g. the child must complete its termination before
waitpid() returns in the parent). But beyond that, you won’t find many
promises in POSIX or our docs about the order of things.

Since the OS doesn’t know in general how long a process will remain in
the zombie state, it’s desirable to free up its resources before it
turns into a zombie. In particular, it seems reasonable that we do
not promise that you can access the memory of a process that has
completed its termination and turned into a zombie. Since the
/proc/pid/as entry of the process represents its address space, you
shouldn’t be surprised that it goes away sooner than the pid. That was
the main point I was trying to make with the zombie business. In
general, there’s a stage in the life cycle of a process when its pid is
still valid, and kill() tells you it’s still valid, but most of its
other resources are gone, and any API that normally lets you access them
fails.
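
The zombie stage can be observed with plain POSIX calls. The following
is only a sketch: the function name probe_zombie is invented here, and
it assumes SIGCHLD is not being ignored (otherwise the child never
becomes a zombie) and that the pid is not reused right after the child
is reaped. It uses waitid() with WNOWAIT to wait for the child to
terminate without reaping it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations in strict -std= modes */
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: forks a child that exits at once, then probes its
 * pid with kill(pid, 0) while it is a zombie and again after reaping.
 * Returns 0 on success, -1 if fork/waitid/waitpid failed. */
int probe_zombie(int *valid_while_zombie, int *valid_after_reap)
{
    assert(valid_while_zombie && valid_after_reap);

    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(0);                      /* child: terminate immediately */

    /* Block until the child has terminated, but leave it unreaped
     * (WNOWAIT), so it remains a zombie. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)child, &info, WEXITED | WNOWAIT) != 0)
        return -1;

    *valid_while_zombie = (kill(child, 0) == 0);

    if (waitpid(child, NULL, 0) != child)  /* reap: the pid is released */
        return -1;

    /* After reaping, kill() is expected to fail with ESRCH. */
    *valid_after_reap = (kill(child, 0) == 0 || errno != ESRCH);
    return 0;
}
```

Under those assumptions the first flag comes back as 1 and the second
as 0: kill() still accepts the pid of a process that has nothing left
but its exit status.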

If you run the testcase you would see that Dummy process was not a
zombie and pidin reported it Ready. And one can make Dummy a daemon or
use SPAWN_NOZOMBIE flag, the result is going to be the same – kill()
has no clue what Dummy’s real status is.

But it’s not the job of kill() to tell you the “real status” of a
process. All it tells you is whether the pid is valid. In your test
case, you’re just making the transitions take indefinitely longer, which
makes it easier to notice that they don’t happen instantaneously.

Dmitri_Poustovalov1 · September 20, 2004, 1:39pm

My bad. POSIX only requires that kill(pid, 0) validate the pid. The QNX
implementation is POSIX compliant, but for some reason (real-time
constraints, I guess) Neutrino will postpone termination of a
low-priority program if there is something else to do. Then a true
zombie and a “terminally ill” process will look alike (no pun intended),
and the effective state should be determined via /proc/as. Thanks for
the clarification, Wojtek.

It is a common perception that kill(pid, 0) reports a process’s state.
It would be useful to add a caveat to the kill() docs stating that
kill(pid, 0) should not be used to detect process state.

Bill_Caroselli1 · September 20, 2004, 1:58pm

In a system that is not overburdened, i.e. occasionally has some idle
time, the kill() method essentially works. If a monitor process were to
poll that pid every second and it took an extra second to detect that the
process in question was in fact gone, so what?

If you truly need more responsive notification than that, then you need
to design a better mechanism into your application, like some kind of “I’m
Still Alive” handshaking.
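
One possible shape for such a handshake, sketched in C: the monitored
program writes one byte to a pipe (or socket) at regular intervals, and
the monitor waits for it with a timeout. The function name
wait_heartbeat and the one-byte protocol are assumptions made for this
example, not anything QNX provides.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations in strict -std= modes */
#endif
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical monitor-side helper: waits up to timeout_ms for one
 * heartbeat byte on fd.
 * Returns 1 if a heartbeat arrived, 0 on timeout (peer is silent),
 * -1 on EOF or error (e.g. the peer died and its end was closed). */
int wait_heartbeat(int fd, int timeout_ms)
{
    assert(timeout_ms >= 0);   /* a negative timeout would block forever */

    struct pollfd p = { .fd = fd, .events = POLLIN };
    int n = poll(&p, 1, timeout_ms);
    if (n == 0)
        return 0;              /* nothing arrived in time */
    if (n < 0)
        return -1;             /* poll() itself failed */

    char beat;
    ssize_t got = read(fd, &beat, 1);
    return got == 1 ? 1 : -1;  /* read() returns 0 at EOF */
}
```

The monitor would call this in a loop and treat both 0 and -1 as a
reason to act. Closing of the pipe gives early notice on top of the
timeout, since the descriptors are closed during termination.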

Dmitri_Poustovalov1 · September 20, 2004, 2:39pm

Bill Caroselli wrote:

In a system that is not overburdened, i.e. occasionally has some idle
time, the kill() method essentially works. If a monitor process were to
poll that pid every second and it took an extra second to detect that the
process in question was in fact gone, so what?

If a system has no high-availability requirements, then I tend to agree
with you. Other designs might not tolerate malfunctioning for 2 seconds.

If you truly need more responsive notification than that, then you need
to design a better mechanism into your application, like some kind of “I’m
Still Alive” handshaking.

Handshaking requires mutual awareness of the monitor and the monitored
app. That is not always the case.

A death pulse in conjunction with open/devctl against /proc/as works
reliably. With the optimization(s) Colin suggested above, this approach
is fast enough.
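
For reference, the open() half of that check could look roughly like the
sketch below. The helper name has_address_space is invented, and the
code assumes the /proc/<pid>/as layout Wojtek described, where the entry
goes away once the process has given up its address space. The devctl()
query and the death pulse registration are left out.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations in strict -std= modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if /proc/<pid>/as can be opened for
 * reading, 0 otherwise. Any open() failure counts as 0, including a
 * permission error, so run the monitor with sufficient privileges. */
int has_address_space(pid_t pid)
{
    char path[64];
    int n = snprintf(path, sizeof path, "/proc/%ld/as", (long)pid);
    assert(n > 0 && (size_t)n < sizeof path);  /* path was not truncated */

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;    /* entry is gone (or not accessible) */
    close(fd);
    return 1;
}
```

A pid for which kill(pid, 0) succeeds but this helper returns 0 would
then be treated as “terminally ill” rather than alive.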
