Shutdown[0,0] error???

chaw_pig · March 29, 2009, 3:02am

Hi, I am a student working on a robotics project, using QNX as the OS for the robot. The robot has a PC104 on it, with an I/O board for data acquisition. Yesterday I wired something wrong in the circuit connected to the I/O board, but after realizing that, I corrected the mistake.
When I ran the robot again, after about 1 minute the monitor connected to the robot hung and I got the following error message (please refer to the pictures).

Previously, I collected data from a sensor on the robot 20 times a second. But since this happened, when it hangs, it doesn’t collect data at all.
Sometimes it doesn’t hang completely; the screen just gets slow, by which I mean the refresh rate of the data on the screen. The data is collected once a second instead of 20 times.

Can anyone help me find out what happened, please? Could it be something wrong with the hardware?
Thank you.

Regards,
Win

juanplacco · March 29, 2009, 6:32am

Hi, the first question is: what version of QNX are you running? qconfig -a should tell you.

That message shows that the kernel is dying. I have to confess that I saw this kind of thing in QNX 4.25 several times, and most of the time it was a hardware or environmental problem (such as high temperature); it can also happen in some cases if you don’t have all the patches applied. But that seems to be a 6.* version… I’ve never seen this kernel die… But it can happen with some bad wiring…

Is your robot connected to the serial port, or the parallel port?

So: check the hardware first, and after that, check the QNX version and patches.

I would bet that the slowness of the screen is because your acquisition process (or the associated driver) is running at a lower priority than the graphics driver, and maybe Photon itself (if you are running Photon). So increase the priority of that process. If the communication with the robot is serial, you should increase the priority of the serial driver too. It seems that devc-ser8250 is running (and dying?) in the screenshot.
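As a rough illustration of raising the priority from inside the acquisition process, here is a minimal sketch using the POSIX scheduling calls. The function names clamp_priority() and raise_own_priority() and the round-robin policy are assumptions made for the example, not anything taken from the robot's code; on QNX you might use pthread_setschedparam() instead, and the right priority value depends on what else runs on the system. The serial driver's priority would have to be set where the driver is started, which is not shown.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Clamp a requested priority into the valid range for a scheduling policy.
   Returns -1 if the policy is not supported. */
int clamp_priority(int policy, int requested)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;
    if (requested < lo)
        return lo;
    if (requested > hi)
        return hi;
    return requested;
}

/* Hypothetical helper: ask for round-robin scheduling at the requested
   priority for the calling process. Returns 0 on success, or an errno
   value (typically EPERM when the process lacks the privilege). */
int raise_own_priority(int requested)
{
    struct sched_param param;
    int prio = clamp_priority(SCHED_RR, requested);

    if (prio < 0)
        return EINVAL;

    param.sched_priority = prio;
    if (sched_setscheduler(0, SCHED_RR, &param) == -1)
        return errno;
    return 0;
}
```

Calling raise_own_priority() once at startup, before the acquisition loop, and logging a non-zero return would show whether the process is actually allowed to change its priority.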

Well… Maybe with a bit more information, we can help you a little more…

Regards,
JM

mario · March 29, 2009, 12:19pm

This is a kernel dump; in a nutshell, it’s like a SIGSEGV, but one that happens in the kernel. User ISRs are run by the kernel.

It does look like something got damaged;)

ysinitsky · March 30, 2009, 1:27pm

Hello,
It is worthwhile reading this fragment of the QNX documentation:
qnx.com/developers/docs/6.4. … _dump.html

Regards,
Yuriy

maschoen · March 31, 2009, 9:09pm

If your hardware is OK, the most common reason for this type of dump is a driver problem. If you do something wrong in an interrupt handler, there’s nowhere else to go.
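To make the interrupt handler point concrete, here is a minimal sketch of the kind of defensive code a handler can use to hand samples to a normal thread without ever indexing outside its buffer. Everything in it (sample_ring_t, SAMPLE_RING_SIZE, ring_push_from_isr(), ring_pop()) is hypothetical and not part of any QNX API; attaching the handler to an interrupt is left out. It assumes a single producer (the handler) and a single consumer on one CPU, and omits memory barriers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_RING_SIZE 64u   /* hypothetical size; must be a power of two */

typedef struct {
    volatile uint32_t head;     /* written only by the interrupt handler */
    volatile uint32_t tail;     /* written only by the consumer thread   */
    volatile uint32_t dropped;  /* samples discarded because ring was full */
    uint16_t samples[SAMPLE_RING_SIZE];
} sample_ring_t;

/* Called from the interrupt handler. Never blocks, and masks the index so
   that even a corrupted head value cannot write outside the array. */
int ring_push_from_isr(sample_ring_t *r, uint16_t sample)
{
    uint32_t head, next;

    if (r == NULL)
        return -1;

    head = r->head & (SAMPLE_RING_SIZE - 1u);
    next = (head + 1u) & (SAMPLE_RING_SIZE - 1u);
    if (next == r->tail) {      /* ring full: drop the sample, don't overwrite */
        r->dropped++;
        return -1;
    }
    r->samples[head] = sample;
    r->head = next;
    return 0;
}

/* Called from the normal (non-interrupt) thread. Returns -1 when empty. */
int ring_pop(sample_ring_t *r, uint16_t *out)
{
    uint32_t tail;

    if (r == NULL || out == NULL)
        return -1;

    tail = r->tail & (SAMPLE_RING_SIZE - 1u);
    if (tail == r->head)
        return -1;
    *out = r->samples[tail];
    r->tail = (tail + 1u) & (SAMPLE_RING_SIZE - 1u);
    return 0;
}
```

The design choice is that the handler does the minimum (validate, store, return) and drops data rather than risk a bad memory access, since a fault there takes the kernel down instead of just the process.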

ysinitsky · April 1, 2009, 2:27pm

My former colleague Brian compiled the following list of common causes of kernel crashes:

segfault/sigbus in an interrupt handler
Incorrectly written callouts
Operations that cause a machine check or other types of asynchronous faults in the system. If you’re unlucky, the fault gets “noticed” by the CPU while we’re executing in the kernel, and the kernel crashes. If a user-land thread was active at the time, a SIGBUS to the task is the usual end result.
If there are hardware issues, high load can elicit crashes
Memory controller configuration issues can cause random crashes
Low memory conditions can be hazardous. These generally would be failed assertions (i.e. cannot allocate) which should not have failed.
Having a JTAG attached to the board (not consistent from board to board/JTAG to JTAG) when jumping from startup into the OS (or any time after).
A privileged thread steps on hardware that procnto cares about (the interrupt controller(s), the timer providing the clock, the memory controller, the MMU, etc.)
A root process does a direct physical mapping (MAP_PHYS) to memory it shouldn’t and overwrites important bits.
A privileged thread programs a DMA engine to scribble over memory it shouldn’t.
Startup lies about something on the system page (e.g. where memory is, how much memory is in the system, what the capabilities of the CPU are, etc).

mario · April 1, 2009, 6:04pm

Some items in the list apply to QNX6 only.

chaw_pig · April 2, 2009, 10:40am

Thanks for your reply.

Here is another error coming up today:

Well, I forgot to mention that the warning screen only appears when we start running the motor. The motor is attached to a frame, and the PC104 boards are mounted on this frame. When the motor runs, it generates vibration, which somehow causes the hanging problem. If the motor does not run, we do not get any warnings or errors; the OS ran for >10 minutes without any problem.

So I suspect the vibration is causing some component to come loose. However, I checked the board and found no loose parts (the part most likely to move is the RAM). What could the problem be? Is there a proper way to test the board?

And what do the S/C/F=11/4/2 values mean? Do they help identify where the problem is? The error also mentions the devc-con process, which is a "Simple console and keyboard I/O manager". Does this have to do with keyboard I/O? (That would be strange, because we do not have a keyboard.)

Any help is really appreciated.

rgallen · April 2, 2009, 4:07pm

What is the keyboard controller doing? Is the interrupt being held asserted?

The IRQ would be 1 (keyboard) or 12 (the PS/2 mouse).

chaw_pig · April 2, 2009, 6:39pm

Would you please elaborate? Our robot does not have a mouse or keyboard attached.

Can faulty RAM cause this problem?
It still seems strange to me, because the error is triggered by vibration.

rgallen · April 2, 2009, 7:15pm

Whether you have a keyboard or not, there are traces running from the keyboard controller to the PS/2 connector. If vibration causes a short, it could easily cause an interrupt storm that might exhaust the interrupt stack. Additionally, getting weird data back from the ports when the interrupt happens may cause devc-con to SEGV in the interrupt handler…
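If you control the handler for a suspect interrupt, one way to check for a storm is to count how many times it fires per time window. The following is only a sketch of that counting logic: irq_rate_monitor_t, irq_monitor_init() and irq_monitor_record() are made-up names, the timestamps are assumed to come from whatever millisecond clock you have available, and the limit has to be chosen from what the device can legitimately generate. It does not hook into devc-con or any QNX interrupt API, and what to do once a storm is flagged is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-window interrupt rate monitor. */
typedef struct {
    uint64_t window_start_ms;  /* start time of the current window       */
    uint32_t window_ms;        /* window length in milliseconds          */
    uint32_t limit;            /* max interrupts tolerated per window    */
    uint32_t count;            /* interrupts seen in the current window  */
} irq_rate_monitor_t;

void irq_monitor_init(irq_rate_monitor_t *m, uint32_t limit, uint32_t window_ms)
{
    m->window_start_ms = 0;
    m->window_ms = window_ms;
    m->limit = limit;
    m->count = 0;
}

/* Record one interrupt seen at time now_ms.
   Returns 1 if the count in the current window exceeds the limit
   (a possible interrupt storm), 0 otherwise. */
int irq_monitor_record(irq_rate_monitor_t *m, uint64_t now_ms)
{
    /* Start a new window once the current one has expired. */
    if (now_ms - m->window_start_ms >= m->window_ms) {
        m->window_start_ms = now_ms;
        m->count = 0;
    }
    m->count++;
    return m->count > m->limit;
}
```

For a keyboard line with nothing plugged in, even a small limit per second should never be exceeded, so a flagged window while the motor is running would point toward the vibration-induced short described above.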