recursive run fault?

How is it possible for an app to fault more than once?
Shouldn’t an app cease running after it faults?

I have an app which generates the following trace:

Aug 28 11:27:33 Last run fault at 0005:00074EAD
Aug 28 11:27:33 Run fault 3471 000B 000D //8/home/bedge/p4/ncc/524/main/qnx/generated/usr/darts/bin/lcbd
[snip]
//8/home/bedge/p4/ncc/524/main/qnx/generated/usr/darts/bin/lcbd
Aug 28 11:27:33 Last run fault at 0005:00074EAD
Aug 28 11:27:33 TRACE overrun node 8, 15631 overruns


Also, if I don’t kill it off soon enough, the entire box locks up.

Here’s the ver info stuff:

PROGRAM                 NAME          VERSION  DATE
sys/Proc32              Proc          4.25I    Nov 25 1998
sys/Proc32              Slib16        4.23G    Oct 04 1996
sys/Slib32              Slib32        4.24B    Aug 12 1997
/bin/Fsys               Fsys32        4.24T    Feb 26 1999
/bin/Fsys               Floppy        4.24B    Aug 19 1997
/bin/Fsys.eide          eide          4.24N    Nov 18 1998
//8/bin/Dev32           Dev32         4.23G    Oct 04 1996
//8/bin/Dev32.ansi      Dev32.ansi    4.23H    Nov 21 1996
//8/bin/Dev32.par       Dev32.par     4.23G    Oct 04 1996
//8/bin/Dev32.pty       Dev32.pty     4.23G    Oct 04 1996
//8/bin/Dev32.ser       Dev32.ser     4.23I    Jun 27 1997
//8/bin/Mouse           Mouse         4.24A    Aug 22 1997
//8/bin/Pipe            Pipe          4.23A    Feb 26 1996
//8/bin/Net             Net           4.25B    Jul 27 1998
//8/bin/Net.ether1000   Net.ether100  4.24B    Jul 24 1998
//8//automap/automap    Automap       1.40D    Nov 02 1998
//8//usr/ucb/Socket     Socket        4.25H    Jul 30 1999
//8/bin/cron            cron          4.23B    Oct 30 1997
//8/bin/Mqueue          mqueue        4.24B    Jan 12 1999
//8//photon/bin/Photon  Photon        1.13D    Sep 03 1998
//8//bin/phfontpfr      Photon Font   1.13A    Jul 07 1998

Aside from this behavior, it’s faulting in what looks like perfectly
reasonable code,
but that’s the subject for another post…

-Bruce.

Have you attached a signal handler for SIGSEGV, which you
then return from?

Sam

Sam Roberts (sam@cogent.ca), Cogent Real-Time Systems (www.cogent.ca)

Bruce Edge <bedge@sattel.com> wrote:

How is it possible for an app to fault more than once?

If it handles the fault in a signal handler, but doesn’t prevent
the fault from recurring.

Shouldn’t an app cease running after it faults?

Not if it handles the fault.

For instance, the following program:

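#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>

/* Handler that does nothing. Returning from it resumes the program
   at the instruction that faulted, so the fault simply happens again. */
void sigh(int sig)
{
    (void)sig;
    return;
}

int main(void)
{
    int i;

    signal(SIGFPE, sigh);

    /* i runs 2, 1, 0, -1; the divide faults when i reaches 0. */
    for (i = 2; i > -2; i--)
    {
        printf("%d\n", 2 / i);
    }
    return 0;
}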

Generates the output to the screen:

1
2

And nothing more.

But, the traceinfo looks like:

… lots of deleted events …
Aug 28 16:47:17 1 00001013 Run fault 24579 0008 0000 //60/home/dagibbs/blah
Aug 28 16:47:17 1 00001005 Last run fault at 0007:0000A045
Aug 28 16:47:17 1 00001013 Run fault 24579 0008 0000 //60/home/dagibbs/blah
Aug 28 16:47:17 1 00001005 Last run fault at 0007:0000A045
Aug 28 16:47:17 1 00001013 Run fault 24579 0008 0000 //60/home/dagibbs/blah
Aug 28 16:47:17 1 00001005 Last run fault at 0007:0000A045
Aug 28 16:47:17 1 00001013 Run fault 24579 0008 0000 //60/home/dagibbs/blah
Aug 28 16:47:17 1 00001005 Last run fault at 0007:0000A045
Aug 28 16:47:17 1 00001013 Run fault 24579 0008 0000 //60/home/dagibbs/blah
Aug 28 16:47:17 1 00001005 Last run fault at 0007:0000A045
Aug 28 16:47:17 1 00001013 Run fault 24579 0008 0000 //60/home/dagibbs/blah
Aug 28 16:47:17 1 00001005 Last run fault at 0007:0000A045
Aug 28 16:47:17 1 00001013 Run fault 24579 0008 0000 //60/home/dagibbs/blah
Aug 28 16:47:17 1 00001005 Last run fault at 0007:0000A045
Warning! 10789 overruns have occurred. Some trace events lost.
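
One way to avoid the endless stream of run faults shown above is to have the handler report the problem once and terminate, instead of returning to the instruction that faulted. The following is only a minimal sketch under stated assumptions: it has not been run on QNX 4, the names fatal_fault_handler, run_faulting_child and kFaultExitCode are made up for illustration, and raise(SIGFPE) stands in for a real divide fault so the sketch behaves the same on processors that do not trap on integer division. The faulting code runs in a child process so the caller can observe how it ended.

```cpp
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical exit code meaning "terminated by our fault handler".
const int kFaultExitCode = 3;

// Report the fault once, then terminate. Only async-signal-safe calls
// (write, _exit) are used, and the handler never returns to the code
// that faulted, so the fault cannot repeat.
extern "C" void fatal_fault_handler(int)
{
    static const char msg[] = "fatal arithmetic fault, exiting\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    _exit(kFaultExitCode);
}

// Run the faulting code in a child process and return its exit status,
// or -1 if the child could not be created or did not exit normally.
int run_faulting_child()
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        std::signal(SIGFPE, fatal_fault_handler);
        std::raise(SIGFPE);   // stand-in for a real divide fault
        _exit(0);             // reached only if the handler returned
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling run_faulting_child() should return kFaultExitCode, with the message written exactly once. A handler for a real processor fault has essentially two safe choices: terminate as here, or jump somewhere else (for example with siglongjmp); simply returning is what produces the repeated fault.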

I have an app which generates the following trace:

Aug 28 11:27:33 Last run fault at 0005:00074EAD
Aug 28 11:27:33 Run fault 3471 000B 000D //8/home/bedge/p4/ncc/524/main/qnx/generated/usr/darts/bin/lcbd
[snip]
//8/home/bedge/p4/ncc/524/main/qnx/generated/usr/darts/bin/lcbd
Aug 28 11:27:33 Last run fault at 0005:00074EAD
Aug 28 11:27:33 TRACE overrun node 8, 15631 overruns

Do you catch SIGSEGV in this application?

Also, if I don’t kill it off soon enough, the entire box locks up.

Of course, at this point the definition of “locks up” becomes of interest.
Still, you are generating an almost constant stream of processor faults…
this will not be healthy for your system.

-David

David Gibbs wrote:

Do you catch SIGSEGV in this application?

No, I don’t.

I did track down the cause, though. I had two constructors for a class. One did
all the work; the other had a simplified argument list. I was calling the first
constructor from inside the second.
This is what seems to have created the object which exhibits this behavior.
Moving the actual object initialization code into its own function, which both
constructors call, seems to have fixed it.
So, calling an object’s constructor from within another constructor for the same
class is a no-no.
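
The fix described above can be sketched roughly as follows. The original class is not shown in the thread, so Channel, its members, and the default timeout of 1000 are all hypothetical. The point is only that both constructors call one shared init() helper instead of one constructor trying to call the other.

```cpp
#include <string>

// Hypothetical class; the real class from the thread is not shown.
class Channel {
public:
    // Full constructor: does all the work through init().
    Channel(const std::string &name, int port, int timeout_ms)
    {
        init(name, port, timeout_ms);
    }

    // Simplified argument list. Writing `Channel(name, port, 1000);`
    // as a statement here would only build and destroy an unnamed
    // temporary, leaving the members of *this uninitialized.
    // Calling the shared helper initializes this object instead.
    Channel(const std::string &name, int port)
    {
        init(name, port, 1000);
    }

    const std::string &name() const { return name_; }
    int port() const { return port_; }
    int timeout_ms() const { return timeout_ms_; }

private:
    // Shared initialization used by every constructor.
    void init(const std::string &name, int port, int timeout_ms)
    {
        name_ = name;
        port_ = port;
        timeout_ms_ = timeout_ms;
    }

    std::string name_;
    int port_;
    int timeout_ms_;
};
```

With this layout, Channel c("lcb", 5) ends up with port 5 and the default timeout. Compilers that support C++11 or later also allow a delegating constructor, written in the initializer list as Channel(const std::string &n, int p) : Channel(n, p, 1000) {}, which is the language-supported way to reuse one constructor from another.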


Also, if I don’t kill it off soon enough, the entire box locks up.

Of course, at this point the definition of “locks up” becomes of interest.
Still, you are generating an almost constant stream of processor faults…
this will not be healthy for your system.

True, I guess only Windows truly ever locks up. Other systems become very
busy/unresponsive.
I figure if the console, X, IP and fleet interfaces are all down, then it’s
dead.

-David

Thanks for the help.

-Bruce.

So, calling an obj constructor from within another constructor for the
same class is a no no.

Actually this might not be a good idea for any class with class.
What if the internal object could not be allocated? There is no way to
return this fault to the higher level declaration as constructors don’t
provide for a failed return. It might be better to declare/allocate embedded
objects in an init routine of the holder object class so that a failed
allocation/initialization can be returned.
-Paul
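
The init-routine approach suggested above might look something like the sketch below. Holder and its members are hypothetical names, and the code is written in C++98 style to match the compilers of the time. The constructor only puts the object into a known empty state, and a separate init() does the allocation and reports failure through its return value.

```cpp
#include <cstddef>
#include <new>

// Hypothetical holder class illustrating two-phase initialization.
class Holder {
public:
    // Phase one cannot fail: it only records an empty state.
    Holder() : data_(0), size_(0) {}
    ~Holder() { delete[] data_; }

    // Phase two does the work that can fail and reports the result.
    // Returns false if already initialized, if size is zero, or if
    // the allocation fails.
    bool init(std::size_t size)
    {
        if (data_ != 0 || size == 0)
            return false;
        data_ = new (std::nothrow) char[size];
        if (data_ == 0)
            return false;
        size_ = size;
        return true;
    }

    bool ready() const { return data_ != 0; }
    std::size_t size() const { return size_; }

private:
    // Not copyable: declared but never defined.
    Holder(const Holder &);
    Holder &operator=(const Holder &);

    char *data_;
    std::size_t size_;
};
```

The caller writes Holder h; if (!h.init(64)) { /* handle the failure */ } and must not use the object until init() has returned true. The other standard way to report a failed constructor is to throw an exception; the init routine is the choice when exceptions are unavailable or unwanted.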

Yeah, when I realized what I had done for a “real quick test” I got this
queasy feeling that it wasn’t quite kosher.
Not to mention the weird behavior I spent 2 days tracking down.
Maybe one day I’ll learn :-)

-Bruce.