Hardware signals, PtAppAddSignalProc(), program lockup

Setup: A Photon app that uses PtAppAddSignalProc() to add a handler for several
signals, including the “hardware-generated” signals SIGFPE and SIGSEGV. The app
has a debug menu that allows me to cause the errors that result in these
signals – e.g., division by zero, null pointer assignment.

The Problem: When a hardware-generated error occurs, the app locks up.
Running ‘sin -Pmyapp sig’ multiple times shows the app’s “SIG MASK” and “SIG
PEND” values continually changing – for example:

SIG MASK SIG PEND
00000000 00000080
00000080 00000000
00000000 00000000
00000080 00000000
00000000 00000080
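
The hex values above can be read as bit masks. As a rough sketch, assuming the common convention that bit n-1 of a mask stands for signal number n (an assumption about how ‘sin’ reports these fields; the numbering should be checked against <signal.h> on the target), the helper below lists the signal numbers set in a mask. The name mask_to_signals is made up for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: collect the signal numbers encoded in a 32-bit mask,
 * ASSUMING bit (n - 1) corresponds to signal n.
 * Stores at most `capacity` entries and returns how many were stored. */
size_t mask_to_signals(unsigned long mask, int *signals, size_t capacity)
{
    size_t count = 0;
    int signo;

    for (signo = 1; signo <= 32; ++signo) {
        if ((mask & (1UL << (signo - 1))) != 0 && count < capacity)
            signals[count++] = signo;
    }
    return count;
}
```

Under that assumption, 00000080 decodes to signal 8, which is SIGFPE on many systems. That would be consistent with the same divide-by-zero fault being raised over and over, alternating between pending and masked.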

What Works: The signal handler does what it is supposed to do when I send it
signals externally using ‘slay’, even for SIGFPE and SIGSEGV.

I haven’t tried to capture signals the way that you are suggesting, but in
our app we assign a signal handling function to a signal using the signal()
call.

For example this is one of the first lines in the initialization code, which
gets called as our Photon app starts:

signal(SIGSEGV, vSegvSignalHandler);

The function vSegvSignalHandler() is defined as follows:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <syslog.h>
#include <time.h>
#include <unistd.h>

// vSegvSignalHandler
//
// signal handler that services the SEGV signal (segmentation fault)
// (ptmLocal() and vWriteResetFile() are our own helpers, defined elsewhere)
void vSegvSignalHandler(int nSignalNumber)
{
    // if the signal is a segmentation fault, then create file, and log error
    if (nSignalNumber == SIGSEGV)
    {
        FILE *fpSegFile;            // file pointer to create Seg file
        char szFilename[100];       // filename to create
        struct tm *ptmCurrentTime;  // time structure

        // create a file to signal that we have had a segmentation fault
        ptmCurrentTime = ptmLocal();
        sprintf(szFilename, "/9020/segv/SegDISP_%02d:%02d:%02d_%02d_%02d_%d",
                ptmCurrentTime->tm_hour, ptmCurrentTime->tm_min,
                ptmCurrentTime->tm_sec,
                ptmCurrentTime->tm_mday, ptmCurrentTime->tm_mon,
                ptmCurrentTime->tm_year + (int)1900);

        // create segmentation fault file (fopen can fail, so check first)
        fpSegFile = fopen(szFilename, "w+");
        if (fpSegFile != NULL)
            fclose(fpSegFile);

        // see if we can log an error
        syslog(LOG_CRIT, "DISPLAY SIGSEGV");
        vWriteResetFile();
    }

    // signal to the parent that we are exiting
    {
        pid_t pidParent = getppid();
        kill(pidParent, SIGQUIT);
    }

    // exit with an error
    exit(1);
}

We write the file and log the error so that we can try to chase down any
problems. We also have a manager process that initially started the Photon
app, and it handles exiting, rebooting and the rest.

Hope that helps.

Rodney Gullickson

Dale Sherwood <dsherwoo@nyab.com> wrote:

> Setup: A Photon app that uses PtAppAddSignalProc() to add a handler for several
> signals, including the “hardware-generated” signals SIGFPE and SIGSEGV. The app
> has a debug menu that allows me to cause the errors that result in these
> signals – e.g., division by zero, null pointer assignment.
>
> The Problem: When a hardware-generated error occurs, the app locks up.

The Photon signal catching mechanism is not meant to handle this kind of
situation. Just attach a regular signal handler using signal() or
sigaction().

Make sure that your handler does not return. If you know exactly why
you’re getting the SIGSEGV or whatever, you could try using siglongjmp()
to jump out of the signal handler into a safe place. Otherwise, the
only safe thing to do is call _exit() (or raise()) to terminate your
application.


Wojtek Lerch QNX Software Systems Ltd.

Rodney Gullickson <rodneyg@tritro.com.au> wrote:

> I haven’t tried to capture signals the way that you are suggesting, but in
> our app we assign a signal handling function to a signal using the signal()
> call.
>
> For example this is one of the first lines in the initialization code, which
> gets called as our Photon app starts:
>
> signal(SIGSEGV, vSegvSignalHandler);
>
> The function vSegvSignalHandler() is defined as follows:

Your signal handler uses a lot of functions that are not signal-safe.
It may work most of the time, but it’s not guaranteed to always work.
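
For comparison, here is a rough sketch of a crash handler that limits itself to open(), write(), close() and _exit(), which are among the functions POSIX requires to be async-signal-safe. The marker path, the helper names and the file contents are all hypothetical. The signal number is formatted by hand because sprintf() is not on that list.

```c
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Format a non-negative int as decimal without sprintf(). Returns the length. */
static size_t format_decimal(int value, char *out)
{
    char tmp[16];
    size_t n = 0, i;

    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value > 0);
    for (i = 0; i < n; ++i)
        out[i] = tmp[n - 1 - i];      /* digits were produced in reverse */
    return n;
}

/* Hypothetical helper: create `path` and record the signal number in it.
 * Returns 0 on success, -1 on failure. */
int write_crash_marker(const char *path, int signo)
{
    char line[32] = "signal ";
    size_t len = 7;                   /* length of "signal " */
    int fd, ok;

    len += format_decimal(signo, line + len);
    line[len++] = '\n';

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    ok = (write(fd, line, len) == (ssize_t)len);
    close(fd);
    return ok ? 0 : -1;
}

/* Install with signal(SIGSEGV, crash_handler) or sigaction(). */
void crash_handler(int signo)
{
    write_crash_marker("/tmp/crash_marker", signo);  /* hypothetical path */
    _exit(1);                         /* never return to the faulting code */
}
```

A timestamped file name like the one in the original handler would need similar hand formatting, since the usual time-formatting routines are not guaranteed to be safe in a handler either.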


Wojtek Lerch QNX Software Systems Ltd.

One cannot reliably do much useful stuff in a normal signal handler
(or an interrupt service routine) because all but the most trivial
library functions are non-reentrant. This means that they are not
signal-safe (or interrupt-safe). (A signal like SIGSEGV causes
a hardware interrupt, which, like any interrupt, is asynchronous.)

According to “Advanced Programming in the UNIX Environment”
(Stevens, Addison-Wesley, 1993), POSIX.1 specifies that certain
functions must be reentrant. Included among these are open()
and write() – but not fopen() or sprintf() or syslog() or kill(), which
you are using (at your own peril).

However, Watcom 10.6 does not look POSIX-compliant in this
regard – useful functions like open() and write() are not among
the reentrant functions. (I could use qsort() if I wanted to, but it
is hard to imagine how that might be useful :-)

I had hoped that the Photon PtAppAddSignalProc() function
would provide a work-around for this signal handling dilemma.
However, this turns out not to be the case (see the responses of
Wojtek Lerch in this thread).

->>>--Dale Sherwood--> New York Air Brake Corp., TDS Group

Dale Sherwood <dsherwoo@nyab.com> wrote:

> One cannot reliably do much useful stuff in a normal signal handler
> (or an interrupt service routine) because all but the most trivial
> library functions are non-reentrant. This means that they are not
> signal-safe (or interrupt-safe). (A signal like SIGSEGV causes
> a hardware interrupt, which, like any interrupt, is asynchronous.)

It happens because you try to execute an invalid operation, when you
try to execute it. What’s asynchronous about that?

And it’s not even really a hardware interrupt: the word “exception” is
often used to distinguish these cases from real hardware interrupts.

> According to “Advanced Programming in the UNIX Environment”
> (Stevens, Addison-Wesley, 1993), POSIX.1 specifies that certain
> functions must be reentrant. Included among these are open()
> and write() – but not fopen() or sprintf() or syslog() or kill(), which
> you are using (at your own peril).

Does Stevens actually use the word “reentrant”? AFAIK, the term used in
the POSIX standard is “async-signal safe”.

The way the Watcom docs use the term, “reentrant” is not the same as
“signal-safe”. A signal-safe function (like write()) may fail if it’s
interrupted by a signal; a reentrant function is not affected by signals
at all.
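
One way to picture that distinction: write() can be called from a handler, but a call that a signal interrupts before any data is transferred may return -1 with errno set to EINTR, so the caller has to cope with that. The sketch below is only an illustration of such a retry loop; write_fully is a made-up name.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes to `fd`, retrying when a
 * signal interrupts the call. Returns 0 on success, -1 on any other error. */
int write_fully(int fd, const void *data, size_t len)
{
    const char *p = data;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal: try again */
            return -1;
        }
        p += n;                       /* short write: advance and continue */
        len -= (size_t)n;
    }
    return 0;
}
```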

The functions that POSIX requires to be signal-safe are signal-safe in
QNX. It’s just that the Watcom docs don’t have a copy of that list in
them…

> However, Watcom 10.6 does not look POSIX-compliant in this
> regard – useful functions like open() and write() are not among
> the reentrant functions. (I could use qsort() if I wanted to, but it
> is hard to imagine how that might be useful :-)

They are not reentrant, but they are signal-safe.

> I had hoped that the Photon PtAppAddSignalProc() function
> would provide a work-around for this signal handling dilemma.
> However, this turns out not to be the case (see the responses of
> Wojtek Lerch in this thread).

The purpose of PtAppAddSignalProc() is to let you handle an asynchronous
signal, possibly in several unrelated handler functions (useful for
things like SIGCHLD), and keep going. If you know that you only attach
one signal handler to a particular signal, and that your handler exits,
then there’s nothing wrong with using signal() or sigaction(). And that
is the only way to handle synchronous signals that make returning from
the signal handler unsafe, like SIGSEGV or SIGFPE.


Wojtek Lerch QNX Software Systems Ltd.