I am getting very frustrated, and the developers are getting very angry
with me because our QNX server crashes in one of several different ways,
several times a day.
Most of the time, the server locks up completely, and I have to switch
off the main power to reboot it. After the reboot, there are no log
messages, no diagnostics, no core dumps, nothing to indicate that
anything went wrong. I have spent countless hours watching “spin”
output, but it gives no indication of what’s causing the problem,
either; spin simply stops updating shortly before the server crashes.
QNX seems to completely lack any useful logging facility. It
has syslog, and I was logging EVERYTHING for a time, but what little I
got was completely irrelevant. slogger/sloginfo output is equally
useless; according to QNX support, the messages I’m seeing there are
completely normal.
This is a simple system: a motherboard, a handful of disks, and two
network cards. It’s running Momentics 6.2.0 PE, with Samba as its
main server application. The clients are other Momentics 6.2.0 PE
workstations and Windows machines that connect through Samba. We use
QNET for file sharing because it’s the only thing QNX has that works at
all, and keeping 27 GB of data mirrored among 30+ workstations is an
impossible task. For the most part, users just get their home
directories over QNET; development is done on the local hard drives of
the QNX workstations. There’s no way to measure QNET traffic, so I can’t
tell how hard we’re hitting the server.
I believe the crashes are load-related. The server never crashes
overnight or on weekends, when few, if any, people are here.
How do you diagnose a server with no diagnostics? The hardware is fine;
otherwise, we’d see problems at all hours of the day.