Kevin Miller <kevin.miller@transcore.com> wrote:
There is a cron task that does an rtc -s hw once per minute. Another cron
task runs once at midnight each day to create a new syslog file. tftpd is
invoked a few times per day. The systems are expected to run unattended for
weeks or months at a time, so this adds up, I suppose.
It’s the rtc every minute that adds up; the others are a pretty small
load compared to that. That one alone is 1440 process creations/day.
There is a bug in the process creation algorithm: it can (rarely) create
a pid that makes the new process the process group leader of an already
existing process group. When this process then exits, all the processes
in that process group get a SIGHUP. I don’t know whether this can affect
session 1 processes, but it might.
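To make "process group leader" concrete: on a POSIX system a process leads
its group exactly when its pid equals its process group id, which is why a
freshly created pid that happens to match an existing group's id is
dangerous. The sketch below uses only standard POSIX calls; it is not
QNX-specific and does not reproduce the Proc bug. is_pgrp_leader() and
child_becomes_leader() are hypothetical names chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a process leads its group when pid == pgid. */
int is_pgrp_leader(void)
{
    return getpgrp() == getpid();
}

/*
 * Hypothetical helper: fork a child that puts itself in a new process
 * group with setpgid(0, 0), so its pid doubles as a live group id.
 * Returns 1 if the child then reports itself as group leader, 0 if not,
 * -1 on error.
 */
int child_becomes_leader(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* child: create a new group whose id is our own pid */
        if (setpgid(0, 0) != 0)
            _exit(2);
        _exit(is_pgrp_leader() ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;   /* child was the group leader */
    case 1:  return 0;   /* child was not the group leader */
    default: return -1;  /* setpgid() failed in the child */
    }
}
```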
I’d go about addressing this in two stages:
- Reduce the number of process creations. Running rtc every minute is
  very heavy-handed. Instead, I’d recommend getting the source to rtc from
  ftp.qnx.com:/usr/free/qnx4/os/samples/misc/rtc_src.tgz and modifying it
  to use a timer and wakeup, so that one long-lived process does the rtc
  work every minute rather than a fresh rtc being run every minute. This
  will greatly extend the time before the pid creation algorithm
  recreates the dangerous pid. (It will also be good for your system:
  not creating and destroying a process every minute greatly reduces
  total system load. Far better.) While recoding, you might also consider
  qnx_adj_time() rather than clock_settime() for resynching the clock.
  (rtc might already do that, though; I’m not sure.)
- Ask your sales or support rep about getting a fixed Proc that does not
  have this bug. (Have the sales or support rep talk to Adam Mallory
  about such a fix.)
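A rough shape for the first stage, sketched in plain POSIX C under some
assumptions. The suggestion above is a timer and wakeup; this stand-in
simply loops over sleep(), which is simpler but can drift slightly. I
haven't reproduced what the rtc source does each minute, so rtc_work() is
a hypothetical placeholder that only counts calls, and run_periodically()
is likewise a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/*
 * Hypothetical helper: call work() once per interval in a single
 * long-lived process instead of spawning a new rtc each minute.
 * times <= 0 means run forever (the production case); a positive
 * count is handy for testing.
 */
void run_periodically(unsigned interval_sec, int times, void (*work)(void))
{
    int remaining = times;

    for (;;) {
        work();
        if (times > 0 && --remaining == 0)
            break;
        /* sleep() can return early if a signal arrives; a real daemon
           would sleep again for the remainder or use a proper timer. */
        sleep(interval_sec);
    }
}

/* Hypothetical placeholder for the per-minute rtc work (reading the
   hardware clock and resynchronizing system time); it only counts. */
static int rtc_runs = 0;

static void rtc_work(void)
{
    rtc_runs++;
}
```

In production this would be started once at boot as
run_periodically(60, 0, rtc_work), replacing the per-minute cron entry,
so only one process is ever created for this job.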
I had considered nohup, and we may end up using it, but it would be nice if
we knew what was causing the signal. Is there any way to determine the
issuer of a signal?
Not sure. Setting trace verbosity very high and running traceinfo right
after it happens might give enough information to make a guess, but I’m
not sure whether that information is recorded there.
Alternatively, a signal context comes along with each signal: in a
signal handler for SIGHUP, you could get this context and dump it out.
Of course, that only works if it is your own programs going boom, not
ones you don’t have the code for.
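For comparison, on systems that implement the POSIX SA_SIGINFO interface,
the siginfo_t handed to the handler records the sending process in si_pid,
which answers "who sent this" directly for signals sent with kill(). I
don't know whether QNX 4's Proc supports SA_SIGINFO, so treat this as a
sketch for a POSIX system rather than something confirmed for QNX 4.
hup_logger() and install_hup_logger() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>   /* getpid(), for callers testing with kill() */

/* Filled in by the handler. Strictly, only volatile sig_atomic_t is
   guaranteed safe to write from a handler; pid_t is fine for a sketch. */
static volatile pid_t last_sender = -1;
static volatile sig_atomic_t last_signo = 0;

/* Hypothetical handler: remember which signal arrived and who sent it. */
static void hup_logger(int signo, siginfo_t *info, void *context)
{
    (void)context;                /* machine context, unused here */
    last_signo = signo;
    last_sender = info->si_pid;   /* meaningful for kill() (SI_USER) */
}

/* Hypothetical helper: install hup_logger for SIGHUP.
   Returns 0 on success, -1 on failure. */
int install_hup_logger(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = hup_logger;
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGHUP, &sa, NULL);
}
```

A real handler would log last_sender somewhere persistent; a SIGHUP
generated by the system rather than by kill() may not carry a useful
si_pid.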
[sigcontext stuff, from an old post]
From steve@qnx.com Tue Sep 16 15:58:01 EDT 1997
Article: 6374 of comp.os.qnx
From: steve@qnx.com (Steve McPolin)
Newsgroups: comp.os.qnx
Subject: Re: Getting signal context
Date: 6 Aug 1997 15:01:08 GMT
Organization: QNX Software Systems
Message-ID: <5sa3jk$oc9@qnx.com>
References: <u4zpqvn1jj.fsf@stlind.lint.lyngso-industri.dk.lyngso-industri.dk>
In article <u4zpqvn1jj.fsf@stlind.lint.lyngso-industri.dk.lyngso-industri.dk>,
Jeppe Sommer <jso@stlind.lint.lyngso-industri.dk.lyngso-industri.dk> wrote:
Is there a way of getting the signal context from within a signal
handler in QNX (i.e., the state of the CPU just before the signal is
delivered)?
On other (Unix like) systems this is typically available as an extra
argument to the signal handler.
Looking at the signal handler’s stack from within the Watcom debugger,
it seems that most of the CPU registers are in fact placed further
down the stack (I am compiling with stack calling conventions).
Unfortunately, these do not seem to be at a fixed distance from the
stack top.
Does anyone have a clue about how this could be done? I am primarily
interested in the instruction pointer register.
The structure below defines it. It is passed as the second argument to
the handler, but the compiler is fussy about calling signal() with a
function that doesn’t match ‘void (*)(int)’; you can cast that away
with a cast to void *.
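example:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/sigcontext.h> /* follows, if you don't have it on your system */

void catch(int signo, SIGCONTEXT *scp)
{
    printf("death by signal %d at 0x%lx\n", signo, scp->sc_ip);
    exit(1);
}

/* installed with the prototype mismatch cast away, e.g.
   signal(SIGSEGV, (void *)catch); */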
The fields:
    ulong_t  sc_info;   /* fault specific info */
    ushort_t sc_errc;   /* error code pushed by processor */
    uchar_t  sc_fault;  /* actual fault # */
    uchar_t  sc_flags;  /* signal handler flags */
are only available in 4.24 and above, as are sigaltstack() et al.
#ifndef sigcont_h
#define sigcont_h 1
#ifndef __TYPES_H_INCLUDED
#include <sys/types.h>
#endif
typedef struct _sigcontext SIGCONTEXT;
struct _sigcontext {
    ulong_t  sc_mask;
    ulong_t  sc_gs:16, :16;   /* register set at fault time */
    ulong_t  sc_fs:16, :16;
    ulong_t  sc_es:16, :16;
    ulong_t  sc_ds:16, :16;
    ulong_t  sc_di;
    ulong_t  sc_si;
    ulong_t  sc_bp;
    ulong_t  :32;             /* hole from pushad */
    ulong_t  sc_bx;
    ulong_t  sc_dx;
    ulong_t  sc_cx;
    ulong_t  sc_ax;
    ulong_t  sc_ip;
    ulong_t  sc_cs:16, :16;
    ulong_t  sc_fl;
    ulong_t  sc_sp;
    ulong_t  sc_ss:16, :16;
    ulong_t  sc_info;         /* fault specific info */
    ushort_t sc_errc;         /* error code pushed by processor */
    uchar_t  sc_fault;        /* actual fault # */
    uchar_t  sc_flags;        /* signal handler flags: */
#define SC_ONSTACK 1
};
enum {
    TRAP_ZDIV       = 0,  /* SIGFPE: divide by zero */
    TRAP_DEBUG      = 1,  /* SIGTRAP: debug fault */
    TRAP_NMI        = 2,  /* SIGBUS: nmi fault */
    TRAP_BRKPT      = 3,  /* SIGTRAP: cpu breakpoint */
    TRAP_OFLOW      = 4,  /* SIGFPE: integer overflow */
    TRAP_BOUNDS     = 5,  /* SIGFPE: bound instn failed */
    TRAP_BADOP      = 6,  /* SIGILL: invalid opcode */
    TRAP_NONDP      = 7,  /* SIGFPE: NDP not present or available */
    TRAP_DFAULT     = 8,  /* never: double fault (system error) */
    TRAP_NDPSEGV    = 9,  /* SIGSEGV: NDP invalid address */
    TRAP_BADTSS     = 10, /* never: invalid tss (system error) */
    TRAP_NOTPRESENT = 11, /* SIGSEGV: referenced segment not present */
    TRAP_NOSTACK    = 12, /* SIGSEGV: esp|ebp bad address */
    TRAP_GPF        = 13, /* SIGSEGV: other */
    TRAP_PAGE       = 14, /* SIGSEGV: page fault */
    TRAP_FPERROR    = 16  /* SIGFPE: floating point error */
};
#define __ERRC_VALID (1<<TRAP_DFAULT | 1<<TRAP_BADTSS | \
        1<<TRAP_NOTPRESENT | 1<<TRAP_NOSTACK | 1<<TRAP_GPF | 1<<TRAP_PAGE)
#define __INFO_VALID (1<<TRAP_PAGE)
#endif
Steve McPolin, QNX Software Systems, Ltd.
point+click: steve@qnx.com
lick+stick: 175 Terence Matthews; Kanata, Ontario, Canada; K2M 1W8
[end sigcontext stuff, from an old post]
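The header's __ERRC_VALID and __INFO_VALID masks appear to say, per fault
number, whether sc_errc and sc_info hold anything meaningful. A small
sketch of my reading of how they'd be used from a handler like catch()
above; errc_valid(), info_valid() and report_fault() are hypothetical
names, not part of the QNX header, and the trap values are copied from
the enum so the sketch compiles on its own.

```c
#include <stdio.h>

/* Trap numbers used by the masks, copied from the header above. */
enum {
    TRAP_ZDIV       = 0,
    TRAP_DFAULT     = 8,
    TRAP_BADTSS     = 10,
    TRAP_NOTPRESENT = 11,
    TRAP_NOSTACK    = 12,
    TRAP_GPF        = 13,
    TRAP_PAGE       = 14
};

#define __ERRC_VALID (1<<TRAP_DFAULT | 1<<TRAP_BADTSS | \
        1<<TRAP_NOTPRESENT | 1<<TRAP_NOSTACK | 1<<TRAP_GPF | 1<<TRAP_PAGE)
#define __INFO_VALID (1<<TRAP_PAGE)

/* Hypothetical: nonzero if sc_errc holds a processor error code. */
int errc_valid(unsigned fault)
{
    return fault < 32 && ((__ERRC_VALID >> fault) & 1);
}

/* Hypothetical: nonzero if sc_info holds fault-specific info. */
int info_valid(unsigned fault)
{
    return fault < 32 && ((__INFO_VALID >> fault) & 1);
}

/* Hypothetical: print only the fields the masks say are meaningful;
   the arguments would come from scp->sc_fault, sc_errc and sc_info. */
void report_fault(unsigned fault, unsigned errc, unsigned long info)
{
    printf("fault %u", fault);
    if (errc_valid(fault))
        printf(" errc=0x%x", errc);
    if (info_valid(fault))
        printf(" info=0x%lx", info);
    putchar('\n');
}
```

As in catch(), printing from a handler is only reasonable because the
process is about to exit anyway.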
Thanks for the help
Hope some of this helps,
-David
Please follow up to the newsgroup rather than by personal email.
David Gibbs
QNX Training Services
dagibbs@qnx.com