While trying to debug another problem (system() commands that never
returned), we managed to get a process into the following state:
- The process was still running (well, the pterm it was started from didn’t
return to the command prompt)
- The process was listed under /proc
- Running sin (with no args) caused CPU usage to go to 100% and all
available memory to be consumed until sin eventually failed with “Out of
memory”
- psin behaved the same as sin (consumed all memory and then failed)
- pidin ran but did not show the process
- spin ran but did not show the process
At this point we were a bit stumped. There were only 115 processes in the
system and there was 173MB of free memory, so I don’t think we were
resource-constrained.
Has anyone else seen this?
What should we do to debug this?
We are running the release version of 6.2.0 (no patches applied).
This is 100% reproducible with this particular executable, but there was
only a one-line (theoretically irrelevant) change between this executable
and another that works fine.
Rob Rutherford
Ruzz Technology
Just some slight additional information:
- both kill and slay can “find” the MIA process but neither can kill it, no
matter what signal is used
I once had a similar experience with a system() call that locked up. In the
end, the strange behaviour was traced to a ConnectAttach() call elsewhere in
the same code that was being made without the _NTO_SIDE_CHANNEL flag being
set.
When the documentation for ConnectAttach says “Treating a connection as a
file descriptor can lead to unexpected behavior”, they mean the ‘Outer
Limits’ kind of unexpected…
Jim
“Jim Douglas” <jim@dramatec.co.uk> wrote in message
news:aqt0l6$dnl$1@inn.qnx.com…
When the documentation for ConnectAttach says “Treating a connection as a
file descriptor can lead to unexpected behavior”, they mean the ‘Outer
Limits’ kind of unexpected…
Jim
So then, do not attempt to adjust your computer! THEY are in complete
control.
(Sorry, couldn’t help it.)
Just to explain this for people who might care…
When you don’t add the _NTO_SIDE_CHANNEL flag, you are making a connection
in the reserved file descriptor range of connections. So when you call
system(), the child it creates needs copies of all of your file descriptors.
This causes _IO_DUP messages to be sent, and if no one is able to respond to
them (as with a single-threaded app that has a timer pulse) you get a lockup:
the parent waits for the child to exit, and the child waits for the parent
to respond to the DUPs.
chris
Chris McKillop <cdm@qnx.com>, Software Engineer, QSSL
http://qnx.wox.org/
“The faster I go, the behinder I get.” – Lewis Carroll
Thanks for all the input and explanations, but _NTO_SIDE_CHANNEL is not the
issue in this case.
Perhaps I should clarify: the main issue in this post is not the problem
with system() but rather the fact that it is possible to end up with a
“process” that is impossible to kill and that causes sin and psin to
crash.
Rob Rutherford
“Chris McKillop” <cdm@qnx.com> wrote in message
news:aqulf0$pd3$4@nntp.qnx.com…
When the documentation for ConnectAttach says “Treating a connection as
a
file descriptor can lead to unexpected behavior”, they mean the ‘Outer
Limits’ kind of unexpected…
Just to explain this for people who might care…>
When you don’t add in the _NTO_SIDE_CHANNEL flag you are making a
connection
in the reserved file descriptor range of connections. So, when you invoke
the system command it needs to get copies of all of your file descriptors.
This causes _IO_DUP messages to be sent, and if no one is able to respond
to
them (like a single threaded app that has a timer pulse) you get a lockup
as the parent waits for the child to exit and the child waits for the
parent
to respond to the DUP’s.
chris
\
Chris McKillop <> cdm@qnx.com> > “The faster I go, the behinder I get.”
Software Engineer, QSSL – Lewis Carroll –
http://qnx.wox.org/
Hi…
I encountered a similar problem just last week while testing at a remote
site. This time the process is our own resource manager or driver, which
starts from rc.local. If I start the driver from the command line, all is
well, but when I start it from rc.local, the behavior described above
happens. In this case, however, the source of the problem may be something
I am doing wrong (I am still investigating), but it is interesting that the
behavior is the same as the one Robert describes. (Please note that our
driver runs on an x86 embedded system.)
Regards…
Miguel.