System freezes completely - logfile?

Hi,

I have a problem with a QNX 6.3 system that sometimes freezes completely. When this happens, the system does not respond to anything and has to be powered off.

So how can I find out what the reason could be?

Is there a way to enable a log file that shows (at least) what happened last on this system?

Kind regards

Elmi

If you think this might be a runaway process, you might want to start a high-priority shell. If you have the adaptive partitioning feature, you can set up a small partition that will always get some CPU. You aren’t by any chance using a SCSI controller, are you? I had a problem a long time ago, with either 6.2 or 6.3.0, that involved a SCSI driver on a multiprocessor system. Upgrading to 6.3.2 fixed it.

Elmi,

Assuming you have a keyboard attached to your system (i.e., it’s not purely embedded), do the Caps Lock/Scroll Lock lights on the keyboard still work when it hangs?

It would also help us give you advice if you told us a bit more about your system (the hardware platform, and any hardware installed beyond the standard keyboard/mouse/video) and the processes running on it at the time of the lockup.

Tim

The system uses a 1 GHz VIA CPU on a PC104 module. Several digital I/Os and some additional serial ports are attached via EISA. The system runs from a CompactFlash card. A keyboard can be connected, but I didn’t have one attached when it froze the last time.

There is no SCSI controller in the system, and I’m already running 6.3.2.

About that high-priority shell: is it simply a pterm whose priority I set to the maximum?

By the way, when the system is frozen, neither the mouse nor the touch screen moves the pointer. That’s why I assume the CPU has halted completely.

Elmi,

Yes. However, if you are running Photon (which it appears you are, since you have a touch screen), you won’t be able to reach the high-priority pterm, because it will be hidden behind Photon.

So you are saying that you assume the CPU/system is hung because the touch screen stops working? That’s not a safe assumption. It could be something as simple as a faulty touch screen device, or a problem with USB (I am assuming the touch screen is connected via USB).

You definitely need to attach a keyboard to confirm whether the CPU is really hung or whether this is just a touch screen issue of some kind.

Does this system have any network access, so that you could ping it to see whether it is still alive?

Tim

Not really. The software also controls a process, and that process stops completely at this point. Besides that, some functions can be invoked by setting certain external inputs, and that doesn’t work either. So there is absolutely no reaction from the whole system, including the touch screen.

I have already tried pinging it, and the network is dead too. That’s why I’m looking for a way to get some log information showing what happened on the system just before it died.

Elmi,

Ah, OK now things are a bit clearer.

There is a system logger, a process called slogger that you can start. However, it may not help much if a runaway process is hogging 100% of the CPU, because even the logger won’t get any time to run. It also can’t log events such as an ISR crashing the system. It only logs the messages that processes send to it. It is not a monitor of system health.

The reality is that you will have to find a way to reproduce the freeze and then try to determine which process is causing it. The high-priority console will tell you whether a runaway process is using 100% of the CPU, but it won’t help if an ISR is crashing the system.

Tim

The easiest way to set up the high-priority shell is to ignore the graphical interface completely and use either a serial port connection or a network connection.

If you can, use the serial port, because then you can boost the priority of the serial port driver (devc-ser8250, I’m guessing; otherwise some other devc-*) before you get into the lock-up situation:

slay -P 200 devc-ser8250

If you can’t use the serial port, then I find boosting the network interface driver (io-net/io-pkt) and inetd to be the next easiest.

slay -P 200 io-net

slay -P 200 inetd

Now connect over the serial port (or telnet in) and check the priority of your connection. If all went well, pidin should list your shell with a high priority.

Now run your stuff. If things lock up and your shell is hosed as well, your issue is more serious than a runaway process sitting in the READY state. If the UI is locked up but the shell is still responsive, pidin (run from the high-priority shell) will give you an indication of what is going on. Post the results here if you need help interpreting them.

If things are locked up and you don’t have a shell, you might give the instrumented kernel tracing a go, assuming you have somewhere you can persistently store the data. Look at the options for tracelogger, then start tracelogger as a high-priority task. If you manage to get a log file, you can use the System Profiler in Momentics to examine the results.

Thomas