QNX: process vanishes without further traces ...

After working with QNX4 for more than 10 years, I am currently experiencing a really weird problem:

A certain process (nothing complicated, just a server that answers some questions, no hardware I/O) simply disappears after running for a few days, without leaving any dumps or traceinfo entries. If this process is started in a ‘while’ loop of a shell script, the shell disappears as well. The server has some modems attached, through which some BMS dial in, and it is attached to a network with slinger and phrelay as interfaces.

We have already changed the hardware, so this is probably not a hardware-related problem. Unfortunately this happens only at a customer’s site and not in the lab :frowning:

The only remarkable thing: the PIDs that are used for new processes wrap around from 32xxx
to the low numbers just before this magic crash happens.

Any ideas?

BTW: we’re using QNX4.25, ‘N’-Kernel.

Maybe the “parent” of your process died and signalled the child to die. Try running it with “nohup”?

Most likely not: it’s just a shell executing a command like this

while test 1 ; do MyProg; (echo "Restart " $(date)>> /tmp/logfile); done

which works perfectly, when ‘MyProg’ is slayed with any signal.

But I’ll try to trap all other (unexpected) signals, just to be safe :slight_smile:
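A minimal sketch of what trapping signals in the wrapper shell could look like. The function names and the `SIGLOG` variable are hypothetical, and it assumes the shell accepts signal names without the `SIG` prefix in `trap`, as POSIX shells do; whether the QNX4 shell behaves identically is an assumption.

```shell
#!/bin/sh
# Hypothetical log path; the thread uses /tmp/logfile for restart messages.
SIGLOG=${SIGLOG:-/tmp/logfile}

# Append the name of a signal delivered to this shell to the log.
log_signal() {
    echo "Shell $$ caught SIG$1 at $(date)" >> "$SIGLOG"
}

# Install one logging trap per signal of interest.
# SIGKILL and SIGSTOP cannot be trapped, so a kill -9 still leaves no trace.
install_signal_traps() {
    for sig in HUP INT QUIT TERM USR1 USR2 PIPE ALRM; do
        trap "log_signal $sig" "$sig"
    done
}

# Intended use (not executed here):
#   install_signal_traps
#   while test 1 ; do MyProg; done
```

With a trap installed the shell no longer dies from these signals, so the log shows which one arrived. In POSIX shells the trap action runs only after the foreground command has returned.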

Does anybody know if there are any ‘special effects’ when the kernel
recycles the ‘low’ PIDs?

You can use tracectrl to raise the level that is reported, to the point of showing all process creation and death. It is a little verbose, but you might be able to glean something from that. There is nothing special about recycling pids. The only pids that “may” be special are pids 1-5. I suspect that really only pid 1 is special.

More than likely something killed your shell. You may be able to add shell commands to increase the verbosity there and even go as far as catching signals in the shell.
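One way to increase the verbosity of the restart loop is to log how the program ended each time. The sketch below uses hypothetical helper names and assumes the shell reports a command killed by signal N as exit status 128 + N, which is the common POSIX shell convention; the QNX4 shell may differ.

```shell
#!/bin/sh
# Hypothetical helper names; the log path mirrors the one used in the thread.
LOGFILE=${LOGFILE:-/tmp/logfile}

# Translate an exit status into a readable reason.
# Assumption: a status above 128 means "killed by signal (status - 128)".
describe_status() {
    if [ "$1" -gt 128 ]; then
        echo "killed by signal $(expr "$1" - 128)"
    else
        echo "exited with status $1"
    fi
}

# Run a command once and append how it ended to the log.
run_logged() {
    "$@"
    rc=$?
    echo "Restart $(date): $* $(describe_status "$rc")" >> "$LOGFILE"
    return "$rc"
}

# Intended use (not executed here):
#   while test 1 ; do run_logged MyProg; done
```

If the log later shows "killed by signal 1", that points at SIGHUP rather than a crash inside the program itself.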

That’s what I did already, that’s why I found the coincidence with the ‘low’ PIDs.

I guess that’s what I am going to do. The ‘dying process’ is almost the only one that doesn’t have any signal handler installed, nor does it ignore SIGHUP. So I hope that will
prevent the process termination.
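Until the program itself ignores SIGHUP, the same effect can be approximated from the shell, since a signal that is ignored when a program is exec'ed stays ignored in that program. This is roughly what “nohup” does. The function name below is hypothetical, and the sketch only holds as long as the program does not install its own SIGHUP handler later.

```shell
#!/bin/sh
# Start a command with SIGHUP ignored.
# The subshell keeps the ignore setting away from the calling shell;
# exec replaces the subshell, so the command inherits the ignored SIGHUP.
run_ignoring_hup() {
    (
        trap '' HUP
        exec "$@"
    )
}

# Intended use (not executed here):
#   while test 1 ; do run_ignoring_hup MyProg; done
```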

I still wonder, WHO could send the signals?!?

I find lots of ‘Generate hangup ()’ entries in the tracelog, one every time a hangup occurs on a modem,
which is normal. And it obviously never has a ‘negative effect’ on other processes.

The next ‘jump back’ to low PIDs will happen in a week or so; I hope I will have some more diagnostics installed by then …

Is there any chance to find out who has sent the signal to a process?

Thanks for your hints, anyway :slight_smile:

Karsten.

I guess this problem has been solved in the meantime;
it probably was the magic ‘Proc-SIGHUP-Problem’:

openqnx.com/index.php?name=P … pic&t=4532

Yes, seems this issue is known to QSSL and is already addressed.
Adam Mallory says its ID is #5251.

Currently I’m at customer’s site servicing our QNX4 boxes. (I have ~6 hours till departure)

If anyone has got this update - please send it to me!

crashtestdummy@mail.ru

Thank you in advance!

Tony.