Problem with fork()

I have an application which calls ping (using fork() and exec()) on a
regular basis. The application runs for a few days, at which point fork()
stops working. The machine continues to work OK in every other way, and I
can login to it after the fault starts. Other programs run fine. If I stop
my application and start it again, it runs fine (for a while). I assume my
program is exhausting some resource, but I cannot work out what. I have
tried:

- ‘ps’ to check that the process table is not full (it contains the normal
  number of processes, around 50)
- sin info to check the amount of free memory for the system as a whole
  (see output below) - seems OK
- sin -p memory to check the memory used by the process (see output
  below) - is this OK? I’m not sure.
- sin -p fds to check that the program is not keeping a whole lot of
  files open (see below) - seems OK

Any suggestions as to where to look for other things that might stop fork()
running would be very welcome.

Many thanks, Simon Flower
British Geological Survey.


*** Output from sin info:
Node CPU Machine Speed Memory Ticksize Display Flags
1 586/587 PCI 12823 39034k/64090k 1.0ms VGA Color -3P±---------8P

Heapp Heapf Heapl Heapn Hands Names Sessions Procs Timers Nodes Virtual
0 0 21432 0 64 100 64 500 125 1 1623M/3674M

Boot from Hard at Dec 27 23:36 Locators: 1


*** Output from sin -p memory:
PROGRAM PID
//1//sdas_monitor 29399
0007 48930000 143360 -B-3--------DC- 000F 48930000 106496 -B-3-----------


*** Output from sin -p fds:
PROGRAM PID
//1/
/sdas_monitor 29399
0 -//1/dev/con1
1 -//1/dev/con1
2 -//1/dev/con1
3 -//1/…
4C-//1/dev/shmem/err_flag_29399
5 -//1/home/sdas/log/uptime.log

“Simon Flower” <s.flower@bgs.ac.uk> wrote in message
news:b5ep8d$hdo$1@inn.qnx.com

> I have an application which calls ping (using fork() and exec()) on a
> regular basis. The application runs for a few days, at which point fork()
> stops working. The machine continues to work OK in every other way, and I
> can login to it after the fault starts. Other programs run fine. If I stop
> my application and start it again, it runs fine (for a while). I assume my
> program is exhausting some resource, but I cannot work out what. I have
> tried:

What’s the value of errno when fork fails?

Are you reaping the zombies?

“Mario Charest” postmaster@127.0.0.1 wrote in message
news:b5fv6o$sh3$1@inn.qnx.com

> What’s the value of errno when fork fails?
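
For readers unsure what "reaping the zombies" involves, here is a minimal sketch in C. It assumes nothing about the original application; the function name reap_children() is made up for illustration. The idea is to call waitpid() with WNOHANG until no exited child remains, so finished ping children do not linger in the process table.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Collect every child that has already exited, without blocking.
 * Returns the number of children reaped by this call.
 * (reap_children is a hypothetical name, not from the original program.) */
int reap_children(void)
{
    int reaped = 0;
    int status;
    pid_t pid;

    for (;;) {
        pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped++;       /* one exited child removed from the process table */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;       /* interrupted by a signal: try again */
        break;              /* 0: children still running; -1/ECHILD: no children */
    }
    return reaped;
}
```

Calling something like this once per polling cycle (for example, just before each new fork()) is one way to keep exited children from accumulating.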

I was not originally recording errno from fork() - I have modified the code
and set it running again. It will take a day or two before the fault shows,
so I will post another message when I know the answer.

Simon.

“Mario Charest” postmaster@127.0.0.1 wrote in message
news:b5fv6o$sh3$1@inn.qnx.com

> What’s the value of errno when fork fails?
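
A rough sketch of how errno could be captured at the point of failure is given here. It is not the code from the application in question; fork_logged() and the message format are assumptions for illustration. The one detail worth noting is that errno is saved before any other library call, since fprintf() may overwrite it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* fork() wrapper that records why the call failed.
 * On failure it writes the errno number and text to log and returns -1,
 * with errno restored to the value fork() set.
 * (fork_logged is a hypothetical helper name.) */
pid_t fork_logged(FILE *log)
{
    pid_t pid = fork();

    if (pid == -1) {
        int saved = errno;      /* save first: fprintf() may change errno */
        fprintf(log, "fork() failed: errno=%d (%s)\n", saved, strerror(saved));
        fflush(log);
        errno = saved;
    }
    return pid;
}
```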

Yes - if I was not then I would not be able to run other programs once the
fault has appeared, but I can launch a new login and run programs after the
fault in my application has commenced.

“Kevin Miller” <kevin.miller@transcore.com> wrote in message
news:b5n90n$14q$1@nntp.qnx.com

> Are you reaping the zombies?


Simon Flower <s.flower@bgs.ac.uk> wrote:

> I have an application which calls ping (using fork() and exec()) on a
> regular basis. The application runs for a few days, at which point fork()
> stops working. The machine continues to work OK in every other way, and I
> can login to it after the fault starts. Other programs run fine. If I stop
> my application and start it again, it runs fine (for a while). I assume my
> program is exhausting some resource, but I cannot work out what. I have
> tried:

Hm… very odd. My first guess would have been a full process table or
something like that, but the fact that you can log in suggests that
is not the case.

A side note – fork() & exec() is an expensive way to run a new
program under QNX – spawn*() [e.g. spawnl()] is a much better
choice, avoiding the copy of all the memory space of the parent
(which just gets immediately ditched unused by the exec() call).

When fork() fails, what errno value do you get?

What version of QNX4 are you running? (sin ver)

Can you post a complete “sin” output as well?

Another side note: where does fd 3 point, and does the child need fd 5
(the log file)? Should they be close-on-exec?
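
For reference, marking a descriptor close-on-exec is usually done with fcntl(). The following is a minimal sketch; set_cloexec() is a made-up helper name, and it assumes the log file's descriptor is available to the caller.

```c
#include <fcntl.h>

/* Mark fd close-on-exec so that an exec()'d child (e.g. ping) does not
 * inherit it. Returns 0 on success, -1 on error with errno set by fcntl().
 * (set_cloexec is a hypothetical helper name.) */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);     /* read the current descriptor flags */

    if (flags == -1)
        return -1;
    /* keep any existing flags and add FD_CLOEXEC */
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```

It would be called once, right after the log file is opened, so every later fork()/exec() pair drops the descriptor in the child automatically.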

-David

QNX Training Services
http://www.qnx.com/support/training/
Please followup in this newsgroup if you have further questions.

David,

I have set my application running with new debug code (including dumping the
errno when the fault occurs). It takes 1-2 days before the fault appears, so
I will post the information you ask for when the fault re-appears.

I guess that the spawn*() functions are not part of the POSIX standard or, more
importantly, are not generally available on other Unices? I would prefer to
hang on to portability.

No, the child does not need the log file, so it should be close-on-exec. I am
not sure what fd 3 points to either - is there any way of getting sin to expand
its output?

Many thanks for your help, Simon.

Simon Flower <s.flower@bgs.ac.uk> wrote:

> David,
>
> I have set my application running with new debug code (including dumping the
> errno when the fault occurs). It takes 1-2 days before the fault appears, so
> I will post the information you ask for when the fault re-appears.
>
> I guess that spawn*() are not part of the posix standard, or, more
> importantly, are not generally available on other UNIces? I would prefer to
> hang on to portability.

No, spawn*() are not POSIX. They, or something similar, do appear in
a variety of different places as it often makes sense to speed up the
new-program-load path (e.g. vfork() instead of fork() on some
Unices), but they are not standard.
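
One of the "something similar" interfaces is posix_spawn(), which was added to POSIX later than fork()/exec(). Whether it is available on QNX4 is not established anywhere in this thread, so the following is only a sketch for systems that do provide <spawn.h>; run_and_wait() is a made-up name.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Start argv[0] (looked up on PATH) without fork(), then wait for it.
 * Returns the child's exit status, or -1 if it could not be started
 * or did not exit normally. (run_and_wait is a hypothetical name.) */
int run_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawnp() returns an error number directly instead of setting errno */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass a NULL-terminated argument vector such as { "ping", host, NULL }, where host is whatever address the application already pings.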

> No the child does not need the log file, so it should be close on exec. I am
> not sure what 3 points to either - is there any way of getting sin to expand
> its output?

No, there isn’t. Essentially, the “…” means that whoever the fd points
to does not have information about the pathname corresponding to the fd.

It is probably still Fsys, and Fsys maintains a name cache of files that
have been opened – if the path to the file that the current fd is
representing is still in the name cache, then Fsys will tell you what
file is open, but if that information is no longer in the cache, you
will see a response like what you have for fd 3.

-David

I started my application again on Monday and it finally stopped working
today. I can still log in, run other programs, etc. Not all my calls to
fork() are failing - some work, some fail - but all the failures set errno to
EAGAIN. I am running debugging memory allocation routines which show that my
application uses between 34k and 37k of dynamic memory.

So I don’t think the error is due to lack of memory or filling the process
table.

sin info looks like this:

Node CPU Machine Speed Memory Ticksize Display Flags
1 586/587 PCI 12823 39440k/64090k 1.0ms VGA Color -3P±---------8P

Heapp Heapf Heapl Heapn Hands Names Sessions Procs Timers Nodes Virtual
0 0 21624 0 64 100 64 500 125 1 2562M/3674M

Boot from Hard at Dec 05 04:43 Locators: 1


Can anyone suggest other resources which could run out and cause fork() to
fail?

Simon.

Other than a transient failure to allocate memory for some structures or proc
entries, there is nothing really that can elicit an EAGAIN. Another possible
source of interesting data might be the trace logging. Use tracectrl to set the
severity to 7, then use tracelogger/traceinfo and examine the log. If it is a
transient memory/proc entry issue, it will get logged.

-Adam

Simon Flower <s.flower@bgs.ac.uk> wrote:

> I started my application again on Monday and finally it stopped working
> today. I can still log in, run other programs, etc. Not all my calls to
> fork() are failing - some work, some fail - all the failures set ERRNO to
> EAGAIN. I am running debugging memory allocation routines which show that my
> application uses between 34k and 37k of dynamic memory.
>
> So I don’t think the error is due to lack of memory or filling the process
> table.

There are a couple of QNX4 system resource monitoring tools too, try:

ftp.qnx.com:/usr/free/qnx4/os/utils/misc/sysres.gz
ftp.qnx.com:/usr/free/qnx4/os/utils/misc/os_info.tgz

-David

David Gibbs <dagibbs@qnx.com> wrote:

> There are a couple of QNX4 system resource monitoring tools too, try:
>
> ftp.qnx.com:/usr/free/qnx4/os/utils/misc/sysres.gz
> ftp.qnx.com:/usr/free/qnx4/os/utils/misc/os_info.tgz

And sysmon, http://www.parse.com/free/sysmon.html

Cheers,
-RK

Robert Krten, PARSE Software Devices +1 613 599 8316.
Realtime Systems Architecture, Books, Video-based and Instructor-led
Training, Consulting and Software Products at www.parse.com.

Simon Flower <s.flower@bgs.ac.uk> wrote:

> sin info looks like this:
>
> Node CPU Machine Speed Memory Ticksize Display Flags
> 1 586/587 PCI 12823 39440k/64090k 1.0ms VGA Color -3P±---------8P
>
> Heapp Heapf Heapl Heapn Hands Names Sessions Procs Timers Nodes Virtual
> 0 0 21624 0 64 100 64 500 125 1 2562M/3674M
>
> Boot from Hard at Dec 05 04:43 Locators: 1

I don’t remember what all of the ‘heap’ parameters represent, but I
don’t think three of them are allowed to be 0. I do remember that one of
them is no longer significant; I don’t remember which one.

Anybody else think the Virtual numbers look a bit high?

Yep, in fact does the process doing the fork() have a large shared memory
area?

-Adam

Ron Cococcia <ron.nospam@request.nospam.com> wrote in message
news:b62btg$gk5$1@inn.qnx.com

> Anybody else think the Virtual numbers look a bit high?

About shared memory - /dev/shmem looks like this:

nr-xr-xr-x 1 sdas gsgg 45056 Mar 04 15:05 Aplib_s11
nrw------- 1 root root 4294963200 Jan 01 1970 Physical
nrw-rw-r-- 2 sdas gsgg 4096 Mar 24 16:35 err_flag_10610
nrw-rw-r-- 2 sdas gsgg 4096 Mar 24 16:35 err_flag_10615
nrw-rw-r-- 1 sdas gsgg 4096 Mar 04 14:46 err_flag_1397
nrw-rw-r-- 1 sdas gsgg 4096 Mar 04 14:46 err_flag_1417
nrw-rw-r-- 1 sdas gsgg 4096 Mar 04 15:14 err_flag_1890
nrw-rw-r-- 1 sdas gsgg 4096 Mar 04 15:24 err_flag_1899
nrw-rw-r-- 1 sdas gsgg 4096 Mar 04 15:43 err_flag_1920
nrw-rw-r-- 1 sdas gsgg 4096 Mar 24 15:20 err_flag_20543
nrw-rw-r-- 1 sdas gsgg 4096 Mar 24 15:20 err_flag_20576
nrw-rw-r-- 2 sdas gsgg 4096 Mar 24 16:35 err_flag_20847
nrw-rw-r-- 1 sdas gsgg 4096 Mar 24 16:26 err_flag_21455
nrw-rw-r-- 1 sdas gsgg 4096 Mar 24 16:29 err_flag_21666
nrw-rw-r-- 1 sdas gsgg 4096 Mar 24 16:31 err_flag_21690
nrw-rw-r-- 1 sdas gsgg 4096 Mar 24 16:32 err_flag_21700
nrw-rw-r-- 2 sdas gsgg 4096 Mar 24 16:35 err_flag_21841
nrw-rw-r-- 2 sdas gsgg 4096 Mar 24 16:35 err_flag_21886
nrw-rw-r-- 2 sdas gsgg 4096 Mar 24 16:35 err_flag_21895
nrw-rw-r-- 1 sdas gsgg 4096 Mar 24 16:37 err_flag_22237
nrw-rw-r-- 2 sdas gsgg 4096 Mar 24 16:41 err_flag_22273
nrw-rw-r-- 2 sdas gsgg 4096 Mar 24 16:35 err_flag_2413
nrw-rw-r-- 2 sdas gsgg 4096 Mar 24 16:35 err_flag_2414
nrw-rw-r-- 2 root gsgg 4096 Mar 24 16:35 err_flag_24933
nrw-rw-r-- 1 sdas gsgg 4096 Mar 04 16:25 err_flag_2515
nrw-rw-r-- 1 sdas gsgg 4096 Mar 12 15:16 err_flag_2927
nrw-rw-r-- 1 sdas gsgg 4096 Mar 14 11:40 err_flag_29399
nrw-rw-r-- 1 sdas gsgg 4096 Mar 24 16:31 err_flag_3261
nrw-rw-r-- 1 sdas gsgg 4096 Mar 24 16:35 err_flag_3679
nrw-rw-r-- 1 sdas gsgg 4096 Mar 12 15:06 err_flag_3850
nrw-rw-r-- 1 sdas gsgg 4096 Mar 12 15:08 err_flag_3893
nrw-rw-r-- 1 sdas gsgg 4096 Mar 12 15:16 err_flag_3979
nrw-rw-r-- 1 sdas gsgg 4096 Mar 12 16:27 err_flag_3998
nrw-rw-r-- 1 sdas gsgg 4096 Mar 12 16:33 err_flag_4058
nr-xr-xr-x 1 root root 434176 Mar 04 15:04 phlib_s11
nr-xr-xr-x 1 root root 53248 Mar 04 12:08 rpc_so
nrw-r--r-- 4 root gsgg 4096 Mar 24 16:35 sdas_clock_shm_name
nr-xr-xr-x 1 root root 61440 Mar 04 12:08 socket_so

My process reads from all the err_* areas and the sdas_clock_shm_name
area. Is this large? I don’t know what the Heap* headings in ‘sin info’
mean. However, when I look at ‘sin info’ on a machine not running my software,
I get the same values (except for Heapl, which is slightly different).

Thanks for the other suggestions, I will try them out today.

Simon.

“Adam Mallory” <amallory@qnx.com> wrote in message
news:b62ged$c5f$1@nntp.qnx.com

> Yep, in fact does the process doing the fork() have a large shared memory
> area?
>
> -Adam

Adam,

I tried (as root):

tracectrl -s 7

Then traceinfo. My application is producing fork() requests every 10 seconds
(or thereabouts). I don’t see any message in the trace log for these forks -
I do see logs from Spawn() every time I run a program (traceinfo for
example). I am surprised that a failed fork() does not produce some trace
logging - am I doing something wrong?

Simon.

“Adam Mallory” <amallory@qnx.com> wrote in message
news:b61qmg$rnc$1@nntp.qnx.com

Other than a transient failure to allocate memory for some structures or proc
entries, nothing really can elicit an EAGAIN. Another possible source of
interesting data might be the trace logging. Use tracectrl to set the
severity to 7, then use tracelogger/traceinfo and examine the log. If it is a
transient memory/proc entry issue, it will get logged.

-Adam
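
Since EAGAIN from fork() is described above as a transient condition, one way to cope with it is to retry a few times and to make sure every child is reaped. The following is only a minimal sketch, assuming the standard POSIX fork(), execvp() and waitpid() calls; it has not been tried on QNX 4 and is not the code from sdas_monitor. The names fork_with_retry and run_command, and the retry count and delay, are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical tuning values, not taken from the thread. */
#define FORK_RETRIES      5
#define FORK_RETRY_DELAY  1   /* seconds between attempts */

/* Call fork(), retrying a bounded number of times while it fails with
 * EAGAIN. Returns the fork() result; on failure errno is left set. */
pid_t fork_with_retry(void)
{
    int attempt;
    int saved;
    pid_t pid = -1;

    for (attempt = 0; attempt < FORK_RETRIES; attempt++) {
        pid = fork();
        if (pid != -1 || errno != EAGAIN)
            break;

        /* Log the failure; keep errno intact across fprintf/sleep. */
        saved = errno;
        fprintf(stderr, "fork: %s (attempt %d of %d)\n",
                strerror(saved), attempt + 1, FORK_RETRIES);
        if (attempt + 1 < FORK_RETRIES)
            sleep(FORK_RETRY_DELAY);
        errno = saved;
    }
    return pid;
}

/* Run argv[0] with the given arguments and wait for it to finish.
 * Returns the child's exit status, 127 if exec failed in the child,
 * or -1 if the child could not be started or did not exit normally. */
int run_command(char *const argv[])
{
    int status;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork_with_retry();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);               /* only reached if exec failed */
    }

    /* Always reap the child so that it does not linger as a zombie;
     * retry the wait if a signal interrupts it. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would build an argument vector such as { "ping", host, NULL } and check the return value of run_command(). Logging errno on each failed attempt also gives a record of how often the failures occur, which may help when comparing against the trace log.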

\

Bill Caroselli <qtps@earthlink.net> wrote:

Simon Flower <s.flower@bgs.ac.uk> wrote:

sin info looks like this:

Node CPU Machine Speed Memory Ticksize Display Flags
1 586/587 PCI 12823 39440k/64090k 1.0ms VGA Color -3P±---------8P

Heapp Heapf Heapl Heapn Hands Names Sessions Procs Timers Nodes Virtual
0 0 21624 0 64 100 64 500 125 1 2562M/3674M

Boot from Hard at Dec 05 04:43 Locators: 1

I don’t remember what all of the ‘heap’ parameters represent, but I
don’t think 3 of them are allowed to be 0. I do remember that one of
them is no longer significant. I don’t remember which one.

Pre-QNX 4.23, that would have been the case. With 4.23 and later,
they became meaningless, so they were posted as all 0.

(Pre-4.23, Proc was a 16-bit program, and certain data areas were
limited to one 64k segment apiece. With 4.23 and beyond, Proc
was 32-bit and that limit went away.)

-David

QNX Training Services
http://www.qnx.com/support/training/
Please followup in this newsgroup if you have further questions.

Ron Cococcia <ron.nospam@request.nospam.com> wrote:

Anybody else think the Virtual numbers look a bit high?

Yep. I looked at them myself and thought they looked quite high,
especially for such a small amount of physical memory.

-David
