STOPPED state

Normally when a NULL pointer is referenced, a segment violation occurs
(with the appropriate signal) and the process shuts down. Also normally,
when things go wrong in an ISR, it will bring the whole machine to its
knees.

Today I inadvertently referenced a NULL pointer from within an ISR, and
the process didn’t terminate. Nor did the OS crash. The process had
become immortal and psin indicated that its state was “STOPPED”.
What is “STOPPED” and how do you “unstop” it short of a reboot? Raising
a SIGKILL as root is of no use. Interestingly, it consumed no CPU, and
in no way affected the overall operation of the machine. But
unfortunately only one instance of this process can exist (at the
moment) so I had to hit the button. Again.

Regards,

Geoff Roberts.

When a process is STOPPED you need to hit it with a SIGCONT to get it going
again. This is how the debugger controls its debuggees.

cheers,

Kris
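
A minimal sketch of the stop/continue cycle Kris describes, written in Python against generic POSIX signals rather than anything QNX-specific. The helper name stop_and_continue and the sleeping child process are illustrative assumptions, not something from the thread. It covers only an ordinary job-control stop; as later replies report, the process discussed here did not respond to SIGCONT.

```python
import os
import signal
import subprocess
import sys


def stop_and_continue():
    """Stop a child with SIGSTOP, then resume it with SIGCONT.

    Returns (was_stopped, was_continued) as reported by waitpid().
    """
    # Hypothetical stand-in for the stuck process: a child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        # SIGSTOP cannot be caught or ignored; the child enters a stopped state.
        os.kill(child.pid, signal.SIGSTOP)
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        was_stopped = os.WIFSTOPPED(status)

        # SIGCONT is what moves a stopped process back to running.
        os.kill(child.pid, signal.SIGCONT)
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        was_continued = os.WIFCONTINUED(status)
    finally:
        # Clean up the demo child regardless of what happened above.
        child.kill()
        child.wait()
    return was_stopped, was_continued


if __name__ == "__main__":
    print(stop_and_continue())
```

From a shell, the equivalent of the middle step is sending SIGCONT to the process ID with kill.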

It’s possible that dumper is creating a dump file…

cburgess@qnx.com

Ah…good point. Big executable image on a slow filesystem…it would be
held stopped for quite a while wouldn’t it?

cheers,

Kris

In fact, I’m going to change pidin to notice that the process flags have
_NTO_PF_COREDUMP set, and display DUMPING instead of STOPPED. :v)
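
A rough sketch of the display rule Colin proposes, in Python for illustration only. This is not pidin's code: the function display_state is hypothetical, and the bit value assigned to NTO_PF_COREDUMP below is a placeholder (the real _NTO_PF_COREDUMP value comes from the QNX headers).

```python
# Placeholder bit for illustration only; the real _NTO_PF_COREDUMP value
# is defined by the QNX headers and may differ.
NTO_PF_COREDUMP = 0x20


def display_state(state: str, process_flags: int) -> str:
    """Return the label to show for a thread state.

    A STOPPED thread whose process has the core-dump flag set is shown
    as DUMPING; every other combination is shown unchanged.
    """
    if state == "STOPPED" and process_flags & NTO_PF_COREDUMP:
        return "DUMPING"
    return state


if __name__ == "__main__":
    print(display_state("STOPPED", NTO_PF_COREDUMP))  # being dumped
    print(display_state("STOPPED", 0))                # plain stop
```

With a rule like this, a process held stopped while dumper writes its core file would be distinguishable from one stopped for some other reason.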

cburgess@qnx.com

I tried SIGCONT. It didn’t help.

It later occurred to me that this process was rather like a zombie, only
given a different name/state, and perhaps waiting for a wait call.

I don’t think it was dumping at all: the disk was not thrashing and,
apart from consuming a process slot or three, the process didn’t appear
to affect anything else.
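
For comparison, a small sketch of how an ordinary POSIX zombie behaves, in Python on a generic Unix system rather than on QNX. The helper name zombie_demo is made up for illustration, and whether the STOPPED process in this thread was in an analogous state is not established here. The point is only that an exited but unreaped child keeps its process slot, ignores SIGKILL, and disappears once the parent waits for it.

```python
import os
import signal
import subprocess
import sys


def zombie_demo():
    """Show that an exited child stays in the process table until reaped.

    Returns (listed_before_wait, listed_after_wait).
    """
    child = subprocess.Popen([sys.executable, "-c", "pass"])

    # Block until the child has exited, but leave it unreaped (WNOWAIT),
    # so it remains a zombie.
    os.waitid(os.P_PID, child.pid, os.WEXITED | os.WNOWAIT)

    # SIGKILL is accepted but changes nothing: the child is already dead.
    os.kill(child.pid, signal.SIGKILL)

    # Signal 0 only checks that the process-table entry still exists.
    os.kill(child.pid, 0)
    listed_before_wait = True

    # The wait call is what finally releases the process slot.
    child.wait()
    try:
        os.kill(child.pid, 0)
        listed_after_wait = True
    except ProcessLookupError:
        listed_after_wait = False
    return listed_before_wait, listed_after_wait


if __name__ == "__main__":
    print(zombie_demo())
```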

Geoff.

Realtime Technology Systems Pty Ltd
2 Hadleigh Circuit
Isabella Plains
ACT 2905
AUSTRALIA

Phone: 61-2-6291 3833
Fax: 61-2-6291 3838
Mobile: 0413 634 667
Email: geoff@rtts.com.au

“Colin Burgess” <cburgess@qnx.com> wrote in message
news:as8d02$pbi$1@nntp.qnx.com

In fact, I’m going to change pidin to notice that the process flags have
_NTO_PF_COREDUMP set, and display DUMPING instead of STOPPED. :v)

And if the filesystem is full, you can have it display CONSTIPATED right?

:wink:

Kris

Yeah, and please display SCREWEDUP for the processes that are getting hit by
those charming IO_DUP/CLOSE messages due to the lack of _NTO_SIDE_CHANNEL on
some connection… would be more informative than showing it as SEND-blocked
on itself.

– igor “who thinks the person who decided to not make _NTO_SIDE_CHANNEL the
default should be sentenced to writing documentation for the rest of his
life”.

Geoff wrote:

I tried SIGCONT. It didn’t help.

It later occurred to me that this process was rather like a zombie, only
given a different name/state, and perhaps waiting for a wait call.

I don’t think it was dumping at all: the disk was not thrashing and,
apart from consuming a process slot or three, the process didn’t appear
to affect anything else.

To add to this: I am currently looking at a j9 with 4 STOPPED threads.
These threads have been STOPPED for over 30 minutes, with no hard disk
activity. This process is most definitely not dumping.

This happens frequently (several times a day) for me, although it is
not unique to j9; io-net has also done the same thing. My hardware:

  • Tyan Tiger MP S2460 with dual Athlon Palomino 1.2 GHz processors
  • 256 MB registered ECC memory
  • EIDE hard disk

Also, less frequently (perhaps once per day, instead of several
times per day), the OS will completely freeze (no toggling num
lock), necessitating a reboot (perhaps this is when one of proc’s
threads becomes STOPPED?). Interestingly, if I run the single
processor kernel on this identical hardware, I can run for months
without rebooting (yes I have actually done this - in fact I
just recently re-enabled multi-processor in order to test an MT
project that I am working on - which incidentally led me to find
that strings in the QSSL supplied libstdc++ library are not
thread safe - although I googled a patch that fixes this - one
more reason to go to gcc 3.X IMO).

QNX SMP definitely has problems. As soon as I can verify that
my project works under SMP, I am going back to uniprocessor,
so I can get some work done.

Rennie

To add to this: I am currently looking at a j9 with 4 STOPPED threads.
These threads have been STOPPED for over 30 minutes, with no hard disk
activity. This process is most definitely not dumping.

What version of j9?

that strings in the QSSL supplied libstdc++ library are not
thread safe - although I googled a patch that fixes this - one
more reason to go to gcc 3.X IMO).

Yeah, and is the fact that gcc 3.x can generate bad code with -O2 a
good enough reason to not use it (very common issues on the vim lists)? :wink:
It took years to get gcc 2 to the stable point that it is, and given that
every release of gcc v3 has claimed that “the C++ ABI is now stable”
(while breaking compatibility with the last release of v3), I think v3
still needs a little more time to cook.

Besides, we provide libcpp and libecpp, which are far more compliant than GNU.

chris


Chris McKillop <cdm@qnx.com>
Software Engineer, QSSL
“The faster I go, the behinder I get.” – Lewis Carroll
http://qnx.wox.org/

Chris McKillop wrote:

What version of j9?

Here is the output of “j9 --version”

Licensed Materials - Property of IBM

J9 - VM for the Java™ platform, Version 1.5
(c) Copyright IBM Corp. 1991, 2002 All Rights Reserved
Target: 20020206 (QNX 6.2.0 x86)

IBM is a registered trademark of IBM Corp.
Java and all Java-based marks and logos are trademarks or registered
trademarks of Sun Microsystems, Inc.

Usage: j9 [options] classname [args…]
Usage: j9 [options] -jxe: [args…]

[options]
-jxe: run the named jxe file.
-jxespace:,,
map memory region for jxes, (values are in hex).
-jxeaddr:
run a jxe directly from memory, (address is in hex).
-cp: set classpath to .
-D= set the value of a system property.
-debug: start a JDWP debug server on .
-jcl:[:options]
specify which JCL DLL to use (e.g. max, xtr, …).
-verbose[:class,gc,dynload,stack,debug]
enable verbose output (default=class,gc).
-verify enable class file verification.
-X print help on non-standard options.


As I said, I don’t believe this has anything to do with j9 (io-net
has done the same); it only happens to multi-threaded apps (I have
never seen a single-threaded app do this).

One thing I will say, however, is that I have disabled the
“CPU load monitor”, and so far my system seems way more
stable (although there hasn’t been enough elapsed time to be
sure yet). Is there a possibility that the CPU load monitor
de-stabilizes SMP systems ?

that strings in the QSSL supplied libstdc++ library are not
thread safe - although I googled a patch that fixes this - one
more reason to go to gcc 3.X IMO).


Yeah, and is the fact that gcc 3.x can generate bad code with -O2 a
good enough reason to not use it (very common issues on the vim lists)? :wink:

No. g++ v2.95 also generates bad code with -O3. I’d rather have
something that works correctly but slowly than something that just
doesn’t work at all (under SMP at least). At least then I can
test my code and know it works. As gcc improves (or as people
use different, compliant C++ compilers), the speed of my code
improves without my having to touch it. I am perfectly cool
with this.

I think there should be a big disclaimer on the NC download page,
that says “YOU CANNOT DEVELOP MT C++ CODE WITH THIS” and then;
“…unless you forego use of the standard library”. Right now
(without patching) you can’t even safely instantiate strings in
separate threads (that’s pretty basic functionality)…

It took years to get gcc 2 to the stable point that it is, and given that
every release of gcc v3 has claimed that “the C++ ABI is now stable”
(while breaking compatibility with the last release of v3), I think v3
still needs a little more time to cook.

I agree, until the ABI stabilizes, there is no point. As soon as the
ABI is stable, though, I think we need something that actually works
(even if only at -O0).

Besides, we provide libcpp and libecpp, which are far more compliant than GNU.

Yeah, but I’m writing software to give away, at home on NC, and I don’t
have access to these, do I ?

btw: How well has the dinkum stuff been tested on SMP ?

Rennie

Rennie Allen wrote:

Chris McKillop wrote:

What version of j9?


One thing I will say, however, is that I have disabled the
“CPU load monitor”, and so far my system seems way more
stable (although there hasn’t been enough elapsed time to be
sure yet). Is there a possibility that the CPU load monitor
de-stabilizes SMP systems ?

No such luck. j9 just developed (2 minutes ago) 6 STOPPED threads.

4 hours is a record though (2 hours is average).

Rennie

– igor “who thinks the person who decided to not make _NTO_SIDE_CHANNEL the
default should be sentenced to writing documentation for the rest of his
life”.

My sentiments exactly…(seriously)

Jim

Rennie Allen <rgallen@attbi.com> wrote:

No such luck. j9 just developed (2 minutes ago) 6 STOPPED threads.

4 hours is a record though (2 hours is average).

Are you using j9 to drive Eclipse? If so, disable the jit.

eclipse -vmargs -nojit

chris


And how about BLOATED for processes that have allocated too much memory? ;v)

Igor Kovalenko <kovalenko@attbi.com> wrote:

Yeah, and please display SCREWEDUP for the processes that are getting hit by
those charming IO_DUP/CLOSE messages due to the lack of _NTO_SIDE_CHANNEL on
some connection… would be more informative than showing it as SEND-blocked
on itself.

– igor “who thinks the person who decided to not make _NTO_SIDE_CHANNEL the
default should be sentenced to writing documentation for the rest of his
life”.


cburgess@qnx.com

REPLY → ANXIOUS
RECEIVE → LONELY
NANOSLEEP → SIESTA
READY → CANYOUSEEWHATICANDO?AREYOUWATCHING?LOOKATMENOW!YOURNOTWATCHINGAREYOU?

E.


Chris McKillop wrote:

Are you using j9 to drive Eclipse? If so, disable the jit.

eclipse -vmargs -nojit

Sounds like you know something (btw: I’m back at work so I can’t try
this until Friday). If you could enlighten me as to what effect
disabling the jit would have on the vm (besides slowing it down :slight_smile:)
that might prevent threads getting into an unslayable STOPPED state, I’d
be mighty appreciative.

Rennie

Actually, oddly enough, disabling the jit makes it faster. :wink: There
are some places where it will get slower (like compiling java code), but
there are pretty high latencies and startup costs with the jit in j9 v1.5.
These make the UI performance worse and startup time longer and we have
disabled it for 6.2.1.

There are also bugs where the jvm can get itself into a bad state with the
jit enabled. These have been narrowed down by QSS and fixed by OTI already
(for j9 v2.0), but they still exist in j9 v1.5. They happen most often with
network API calls, but have also shown up as other odd behavior.

chris

Chris McKillop wrote:

Actually, oddly enough, disabling the jit makes it faster. :wink:

Makes it faster… nojit… :slight_smile:

There are also bugs where the jvm can get itself into a bad state with the
jit enabled. These have been narrowed down by QSS and fixed by OTI already
(for j9 v2.0), but they still exist in j9 v1.5. They happen most often with
network API calls, but have also shown up as other odd behavior.

Yeah, this makes sense, I was able to actually kill j9 once by
slaying io-net.

Thanks for the info.

Rennie

Hi Chris…

For Eclipse running on rtp6.2.0-PE, could I update j9 to v2.0? If so,
how? Thanks.

Regards…

Miguel.

