how to find a running process...

Nnamdi_Kohn1 · September 9, 2004, 10:13am

Hello,

I want to implement a C function that checks if a specified process is
already running on the QNX system. Manually, this can be done using “pidin”,
but how can I use C language to check for a process and its threads?

Thanks.

Nnamdi

David_Gibbs1 · September 9, 2004, 7:26pm

Nnamdi Kohn <nnamdi.kohn@web.de> wrote:

Hello,

I want to implement a C function that checks if a specified process is
already running on the QNX system. Manually, this can be done using “pidin”,
but how can I use C language to check for a process and its threads?

The /proc filesystem has a directory for every running process; it can
be accessed using opendir(), readdir(), or other standard filesystem
functions.
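
A minimal sketch of that directory scan, using only opendir() and readdir(). The function name proc_has_pid() and its proc_root parameter are made up for illustration, and the code assumes that each process appears under /proc as a directory whose name is its decimal pid:

```c
#include <ctype.h>
#include <dirent.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if s is non-empty and consists only of decimal digits. */
static int is_all_digits(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        if (!isdigit((unsigned char)*s))
            return 0;
    }
    return 1;
}

/*
 * Scan proc_root (normally "/proc") for an entry named after pid.
 * Returns 1 if found, 0 if not found, -1 if proc_root cannot be opened.
 * Hypothetical helper; not part of any QNX library.
 */
int proc_has_pid(const char *proc_root, pid_t pid)
{
    DIR *dir = opendir(proc_root);
    struct dirent *entry;
    int found = 0;

    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        /* Skip entries that are not process directories, e.g. "self". */
        if (!is_all_digits(entry->d_name))
            continue;
        if ((pid_t)strtol(entry->d_name, NULL, 10) == pid) {
            found = 1;
            break;
        }
    }
    closedir(dir);
    return found;
}
```

Calling proc_has_pid("/proc", getpid()) should report the caller itself as present. Finding a process by name rather than by pid needs the devctl() calls mentioned below, which this sketch does not attempt.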

For getting data about running threads in a process, etc, you need to
open /proc/pid then issue a set of (undocumented) devctl()s to get that
information – take a look at <sys/procfs.h>, or the (old, out of date,
but still useful) source to pidin at cvs.qnx.com.

Other people have written and made available applications that use this
as well – Rob Krten's QNX Cookbook has a good section on using the
/proc filesystem, with some sample code that is (almost definitely)
better documented, and clearer to understand and learn from, than the
pidin source.

-David

Please follow-up to newsgroup, rather than personal email.
David Gibbs
QNX Training Services
dagibbs@qnx.com

Colin_Burgess1 · September 9, 2004, 9:41pm

kill( pid, 0 );

will return -1 with errno set to ESRCH if no process with that pid exists.
With a signal number of 0, no signal is actually sent.

Somewhat lighter than the /proc filesystem for just checking the
existence of a known pid.

Of course, if you’re trying to find a process by name, then the /proc
filesystem is the way to go…
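
A small sketch of that check wrapped in a function. The name pid_exists() is hypothetical; the treatment of EPERM (the process exists, but the caller may not signal it) follows the POSIX description of kill():

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if a process with this pid exists, 0 if it does not,
 * and -1 on any other error. Hypothetical helper for illustration.
 */
int pid_exists(pid_t pid)
{
    /* pid <= 0 addresses process groups in kill(), so reject it here. */
    if (pid <= 0) {
        errno = EINVAL;
        return -1;
    }
    if (kill(pid, 0) == 0)
        return 1;               /* signal 0: existence check only */
    if (errno == ESRCH)
        return 0;               /* no such process */
    if (errno == EPERM)
        return 1;               /* exists, but we may not signal it */
    return -1;
}
```

Note that "exists" here includes a process that has terminated but has not yet been cleaned up, which matters for the discussion further down the thread.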

–
cburgess@qnx.com

Kevin_N1 · September 10, 2004, 1:45pm

Colin Burgess wrote:

kill( pid, 0 );

will return -1 and errno = ESRCH if there is no pid of that name.
It will do nothing if signal number is 0.

Somewhat lighter than the /proc filesystem for just checking the
existence of a known pid.

A warning: using kill(pid, 0) in conjunction with the PROCMGR_EVENT_DAEMON_DEATH pulse from Process
Manager to determine whether a process is still running can be problematic. Specifically, we found
that if the process that dies is running at a lower priority than the one receiving the pulse and
using kill(pid, 0) to check, then kill() can report that the process that just died is still running.

We had hoped to switch from using the /proc filesystem to kill() to determine whether processes of
interest are still running when our monitor process receives the daemon death pulse from ProcMgr.
Some timing measurements showed that the kill() call was significantly faster than accessing the
/proc filesystem. “Significantly” meaning two to three orders of magnitude.

However, our monitor process runs at a higher priority than the processes it is
monitoring, and we found that when it received the death pulse, kill(pid, 0) reported that the process
that had just died was still running. We contacted QNX support, and the explanation given was that
the termination thread that cleans up after the dead process runs at the priority of the
process that just died. Because the monitor process runs at a higher priority, it gets the pulse
and calls kill() before the termination thread has finished, and so it reports the process as still alive.

Since the /proc filesystem method works, we had to choose reliability over speed.

Of course, it would be much easier if ProcMgr sent the pid of the process that just died with the
PROCMGR_EVENT_DAEMON_DEATH pulse. Instead, the pulse arrives with no information and the monitor
process must check whether every single pid that it is monitoring is still alive or not.
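
A rough sketch of the sweep described above: on each death pulse, every watched pid is checked against /proc. The names proc_entry_exists() and sweep_watch_list() are hypothetical, the pulse handling itself is left out, and the code assumes that stat() on /proc/<pid> fails with ENOENT once the process is gone:

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if /proc/<pid> exists, 0 if it is gone, -1 on other errors. */
static int proc_entry_exists(pid_t pid)
{
    char path[64];
    struct stat st;

    snprintf(path, sizeof path, "/proc/%ld", (long)pid);
    if (stat(path, &st) == 0)
        return 1;
    return (errno == ENOENT) ? 0 : -1;
}

/*
 * Check every watched pid. Entries whose /proc directory is gone are
 * set to -1 so that later sweeps skip them. Returns the number of
 * processes found dead in this sweep.
 */
size_t sweep_watch_list(pid_t *watched, size_t count)
{
    size_t i;
    size_t dead = 0;

    for (i = 0; i < count; i++) {
        if (watched[i] <= 0)
            continue;           /* slot already marked dead or unused */
        if (proc_entry_exists(watched[i]) == 0) {
            watched[i] = -1;
            dead++;
        }
    }
    return dead;
}
```

The cost is one filesystem lookup per watched pid on every pulse, which is the overhead being complained about here.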

K.N.

David_Gibbs1 · September 11, 2004, 4:21pm

Kevin N <xxxx@yyyy.com> wrote:

Since the /proc filesystem method works, we had to choose reliability over speed.

Of course, it would be much easier if ProcMgr sent the pid of the process that just died with the
PROCMGR_EVENT_DAEMON_DEATH pulse. Instead, the pulse arrives with no information and the monitor
process must check whether every single pid that it is monitoring is still alive or not.

Yeah, it would be really nice if Proc would update that pulse.value with the
pid of the process that died. Far more useful, far less overhead in dealing
with each death.
-David

Dmitri_Poustovalov1 · September 13, 2004, 11:26am

That’s cool, but it works only for programs you have the source code for.
What’s the story if a monitored program is a QSS or third-party binary?

Colin Burgess wrote:

Why not just have the monitored programs open an fd to your monitor
program - when they die, you will get a disconnect pulse.

Colin_Burgess1 · September 13, 2004, 1:15pm

Why not just have the monitored programs open an fd to your monitor
program - when they die, you will get a disconnect pulse.

Kevin_N1 · September 13, 2004, 3:25pm

Colin Burgess wrote:

Why not just have the monitored programs open an fd to your monitor
program - when they die, you will get a disconnect pulse.

This is an option for programs for which we have the source code, but it doesn’t work for monitoring
programs that we have as binary only.

K.N.

Colin_Burgess1 · September 13, 2004, 4:08pm

Get the HAT toolkit, I believe it hooks into the dumper interface.

David_Gibbs1 · September 13, 2004, 5:44pm

Colin Burgess <cburgess@qnx.com> wrote:

Get the HAT toolkit, I believe it hooks into the dumper interface.

dumper interface only catches “abnormal” termination – ham also uses
the daemon death pulse, and walks the process table to determine which
process died in that case. And, that cost of walking the table is still
ugly and nasty.

-David

Colin_Burgess1 · September 13, 2004, 6:01pm

Well, I agree with it being nasty, but there are alternatives.

You can walk it earlier - there’s nothing stopping your application from
opening the /proc/pid/as connection (O_RDONLY please!!!) earlier. Then
each watched pid has an fd connection, and if it’s gone… then a devctl
on it will fail.

So do all the time-consuming work up front, and then you only have to
check the fds that you already have.

Then you only have to worry about making sure that the monitor process
is told when new processes are started, so it can add them to its list.

Dmitri_Poustovalov1 · September 15, 2004, 4:28pm

The problem with kill(pid,0) still remains, anyway, meaning that a pure
POSIX application cannot rely on kill(). Here is a test case to demonstrate it:

one needs 3 programs: Dummy (prio=8), CpuHog (prio=9) and Monitor
(prio=10);
CpuHog runs in a tight loop;
Monitor checks Dummy’s state with kill(DummysPid, 0) periodically in a
while() loop, and exits when it detects that Dummy is gone;
if we slay Dummy, then Monitor will NOT get the right state from
kill(DummysPid, 0) until we slay CpuHog.

It looks like one part of Neutrino (which sends us a death pulse and
handles devctl(), the process manager?) does know that a process has
just died. Meanwhile, the other part (which handles kill(), the kernel?) has
no clue. Shouldn’t an OS tell us the same process state regardless of the
method we use to obtain it, devctl() or kill() or anything else?

Wojtek_Lerch1 · September 15, 2004, 6:14pm

Dmitri Poustovalov wrote:

It looks like one part of Neutrino (which sends us a death pulse and
handles devctl(), the process manager?) does know that a process has
just died. Meanwhile other part (which handles kill(), the kernel?) has
no clue. Should an OS tell us the same process state regardless of a
method we obtain it with, devctl() or kill() or anything else?

After a process has terminated, it can exist for a while as a zombie.
When you call kill(), it tells you whether the process still exists;
the other methods tell you whether it has terminated. Isn’t it a good
thing that you can ask one question or the other, depending on what
exactly you want to know?

Dmitri_Poustovalov1 · September 15, 2004, 7:12pm

Wojtek Lerch wrote:

Dmitri Poustovalov wrote:

It looks like one part of Neutrino (which sends us a death pulse and
handles devctl(), the process manager?) does know that a process has
just died. Meanwhile other part (which handles kill(), the kernel?)
has no clue. Should an OS tell us the same process state regardless of
a method we obtain it with, devctl() or kill() or anything else?

After a process has terminated, it can exist for a while as a zombie.
When you call kill(), it tells you whether the process still exists;
the other methods tell you whether it has terminated. Isn’t it a good
thing that you can ask one question or the other, depending on what
exactly you want to know?

“It’s not a bug, it’s a feature!” Nice spin, Wojtek.

The thing is, there is no “one question or the other”. There is only one
question: “Is pid #whatever alive?” That’s “exactly” what we want to know.

Colin_Burgess1 · September 15, 2004, 7:23pm

I guess it’s still alive, just terminally ill! ;v)

Wojtek_Lerch1 · September 15, 2004, 7:56pm

Dmitri Poustovalov wrote:

“It is not a bug it’s a feature!” Nice spin, Wojtek

I’m pretty sure that POSIX requires kill() to succeed when the process is
a zombie. If your question is, “has this process turned into a zombie
yet”, then there’s no POSIX way to answer that question. You should be
glad that QNX has a feature that lets you detect what you want to
detect, as opposed to what kill() detects according to POSIX.

The thing is there is no “one question or the other”. There is only
question “Is pid #whatever alive?” That’s “exactly” what we want to know.

No; you have only one question. Someone else might have a different
question.

Your question is, “is this process alive”. That’s a question that
kill() doesn’t answer accurately, because a zombie is a dead process.
It’s a process that exists, but is not alive. That’s why it’s called a
zombie.

And since a zombie is already dead, killing it is not an error – it’s a
no-op.
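
The zombie case can be sketched with plain POSIX calls. This is an illustration only, not the Dummy/CpuHog/Monitor test case from earlier in the thread: the helper name probe_zombie() is made up, and waitid() with WNOWAIT is used so the parent can wait for the child to terminate without reaping it:

```c
#define _XOPEN_SOURCE 700   /* for waitid() and WNOWAIT on strict compilers */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits at once, then probe it with kill(pid, 0) twice:
 * once while it is a zombie, and once after it has been reaped.
 * The two kill() results go to *as_zombie and *after_reap, and the errno
 * of the second call to *after_errno.
 * Returns 0 on success, -1 if fork() or a wait call fails.
 */
int probe_zombie(int *as_zombie, int *after_reap, int *after_errno)
{
    siginfo_t info;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0)
        _exit(0);               /* child terminates immediately */

    /* Block until the child has terminated, but leave it unreaped. */
    if (waitid(P_PID, (id_t)child, &info, WEXITED | WNOWAIT) == -1)
        return -1;
    *as_zombie = kill(child, 0);

    /* Now collect the exit status, which deletes the zombie. */
    if (waitpid(child, NULL, 0) == -1)
        return -1;
    errno = 0;
    *after_reap = kill(child, 0);
    *after_errno = errno;
    return 0;
}
```

On a POSIX-conforming system the first kill() is expected to succeed because the zombie still exists, while the second is expected to fail with ESRCH, barring the unlikely case that the pid has already been reused.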

Wojtek_Lerch1 · September 15, 2004, 8:01pm

Colin Burgess wrote:

I guess it’s still alive, just terminally ill! ;v)

I imagine that if you called a terminally ill person a zombie, he’d
disagree. Or maybe even be offended.

BTW, according to the POSIX definition of “zombie”, a zombie is a
process that has terminated:

3.441 Zombie Process

A process that has terminated and that is deleted when its exit status
has been reported to another process which is waiting for that process
to terminate.

http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html#tag_03_441

Chris_Herborth1 · September 16, 2004, 5:06pm

Wojtek Lerch wrote:

Colin Burgess wrote:

I guess it’s still alive, just terminally ill! ;v)

I imagine that if you called a terminally ill person a zombie, he’d
disagree. Or maybe even be offended.

Depends on the person… I’d go “Raaaa!” and try to bite them.

–
Chris Herborth (cherborth@qnx.com)
Never send a monster to do the work of an evil scientist.

Dmitri_Poustovalov1 · September 17, 2004, 2:00pm

Wojtek Lerch wrote:

Dmitri Poustovalov wrote:

“It is not a bug it’s a feature!” Nice spin, Wojtek

I’m pretty sure that POSIX requires kill() to succeed when the process is
a zombie. If your question is, “has this process turned into a zombie
yet”, then there’s no POSIX way to answer that question. You should be
glad that QNX has a feature that lets you detect what you want to
detect, as opposed to what kill() detects according to POSIX.

The thing is there is no “one question or the other”. There is only
question “Is pid #whatever alive?” That’s “exactly” what we want to know.

No; you have only one question. Someone else might have a different
question.

That’s fine. But in order to be POSIX compliant, kill(pid, 0) is supposed
to answer the “is it alive?” question, not “a different question”, isn’t it?

Your question is, “is this process alive”. That’s a question that
kill() doesn’t answer accurately, because a zombie is a dead process.
It’s a process that exists, but is not alive. That’s why it’s called a
zombie.

I fail to see how the zombie business applies to the test case I
described above. There is no parent-child relationship.

If you run the test case, you will see that the Dummy process was not a
zombie and that pidin reported it as Ready. And one can make Dummy a daemon or
use the SPAWN_NOZOMBIE flag; the result is going to be the same – kill()
has no clue what Dummy’s real status is.

Sunil_Kittur1 · September 17, 2004, 3:51pm

Dmitri Poustovalov wrote:

That’s fine. But in order to be POSIX compliant kill(pid, 0) is supposed
to answer “is it alive?” question not “a different question”, isn’t it?

The POSIX spec (1003.1-2001) just says:
“The null signal can be used to check the validity of pid”

The rationale in the kill() section then goes on to describe
that the process lifetime definition means that kill(pid, 0)
on a zombie process will only fail for permission reasons.

Sunil.