JS <jsukamtoh@infolink.co.id> wrote:
Our application runs 24 hours a day, non-stop, on QNX 6.21 on an HP
ProLiant ML370 G3 with a single Xeon processor. We have a problem where the
system hangs at least once a month. At first we suspected that one of the
processes might have run away and consumed all the CPU time, so we set one
of our consoles to the highest priority (63) so that we could use it to
analyze the system. However, when it happened again today, that console
also froze, and the keyboard appeared hung as well: the Num Lock key did
not respond. So we guessed the O/S had somehow frozen.

While the shell on the console might be highest priority (63), if the
keyboard driver that handles your keyboard input is not at that priority,
it may still be pre-empted and your typing won’t get anywhere.
And, if by console you mean a Photon terminal (pterm) window, then there
are even more layers that can be pre-empted.
Assuming you’re in text-mode, not graphics mode, the fact that there is
no kernel dump displayed does suggest the O.S. hasn’t crashed.
The most likely causes:
– a high-priority runaway thread (yes, you are looking for this, but as I
noted, I’m not sure you’ve got a valid “look-see” for it)
– a runaway ISR or IRQ (as someone else noted)
– a runaway pulse queue, where a pulse receiver never dequeues the pulses
generated by either a timer or an ISR
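
To make the third case concrete, here is a toy model in portable C. It
makes no QNX calls; the function name backlog_after and the per-tick rates
are invented for illustration. It only shows the arithmetic: whenever
pulses arrive faster than the receiver drains them, the backlog grows
steadily instead of settling.

```c
#include <stdio.h>

/* Toy model of a pulse queue (hypothetical helper, not a QNX API).
 * Each tick, `in_per_tick` pulses are queued and the receiver removes
 * at most `out_per_tick` of them. Returns the backlog after `ticks`. */
static unsigned long backlog_after(unsigned long ticks,
                                   unsigned in_per_tick,
                                   unsigned out_per_tick)
{
    unsigned long queued = 0;

    for (unsigned long t = 0; t < ticks; ++t) {
        queued += in_per_tick;
        /* The receiver cannot remove more pulses than are queued. */
        queued -= (queued < out_per_tick) ? queued : out_per_tick;
    }
    return queued;
}
```

With a receiver that keeps up, backlog_after(1000, 1, 1) stays at 0; with
a receiver that never dequeues, backlog_after(1000, 1, 0) is 1000 and keeps
climbing for as long as the timer or ISR keeps firing.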

This problem is not reproducible and is intermittent. It may take a day or
a week for it to occur again. There is no core dump at all. We would really
appreciate any suggestions on how to identify the cause of the problem. Are
there any tools that we can use to log the system condition? Many thanks
for your recommendations.

Unfortunately, there aren’t many useful tools for logging system
condition that don’t require a moderately sane system. They need CPU
time and the ability to be scheduled to do their work. Depending on
how nasty it gets, one possibility is to use the instrumented kernel,
hook into the tracing mechanism’s pseudo-ISR, and dump data either to a
serial port that goes elsewhere (not through the serial port driver, but
by hitting the hardware directly, probably through startup’s debug
callouts) or directly to VGA memory (if in text mode). These are, though,
pretty nasty.
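
As a rough illustration of the "directly to VGA memory" idea, here is a
minimal sketch in C. It assumes the standard 80x25 colour text mode, where
each screen cell is a 16-bit value (character in the low byte, attribute in
the high byte) and the buffer normally sits at physical address 0xB8000.
The names vga_puts_at and hex32 are made up for this example. Mapping the
buffer and calling this from a trace hook are not shown; the function just
takes a pointer to the cell array, so it can be tried on an ordinary array
first. It avoids printf and any other library call that might block.

```c
#include <stddef.h>
#include <stdint.h>

#define VGA_COLS 80
#define VGA_ROWS 25
#define VGA_ATTR_WHITE_ON_RED 0x4F /* background red (4), foreground bright white (F) */

/* Write a NUL-terminated string into a text-mode cell buffer at (row, col).
 * Output is clipped at the end of the row. Returns the number of
 * characters actually written. (Hypothetical helper for illustration.) */
static size_t vga_puts_at(volatile uint16_t *cells, unsigned row, unsigned col,
                          uint8_t attr, const char *s)
{
    size_t n = 0;

    if (row >= VGA_ROWS)
        return 0;

    while (*s != '\0' && col < VGA_COLS) {
        /* High byte: attribute. Low byte: character code. */
        cells[row * VGA_COLS + col] =
            (uint16_t)(((uint16_t)attr << 8) | (uint8_t)*s);
        ++s;
        ++col;
        ++n;
    }
    return n;
}

/* Render a 32-bit value as 8 upper-case hex digits plus a terminator. */
static void hex32(uint32_t v, char out[9])
{
    static const char digits[] = "0123456789ABCDEF";

    for (int i = 7; i >= 0; --i) {
        out[i] = digits[v & 0xF];
        v >>= 4;
    }
    out[8] = '\0';
}
```

One way to use it would be to format a thread id or IRQ number with hex32
and write it to a fixed corner of the screen on every traced event, so that
whatever is left on the display after a hang shows the last thing that ran.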
-David
David Gibbs
QNX Training Services
dagibbs@qnx.com