I fully agree, though a UPS and polite shutdown are not always
possible/practical.
You also need to be aware that there is a problem floating somewhere
in Fsys (4.24 and 4.25) that can arbitrarily trash disks. John Garvey
and I were hunting it down over two years ago, but it still exists (I see
it more frequently now), and others have posted about it without getting
any response.
I’ve seen it manifest in multiple ways, all relating to .bitmap
allocation:
- files being extended overwrite files being loaded/executed
- files being extended overwrite critical structures: .boot, .inodes, the
root directory
- it hops partitions and drives: writing to hd1t80, I get data from one of
these files in something on hd0t78… This spans the SCSI and IDE drivers!
Why now more than before? Longevity, I believe: files are getting larger
and more fragmented, with some approaching the 2 GB limit.
- One situation is running out of disk space: keep opening files and trying
to write (we were doing ~40 at a time), and if you keep trying, at some
point Fsys will step on something… We fixed the “feature” in our code that
was doing this, but I still don’t think Fsys should have done what it did.
- The second is exceeding the 2 GB limit on a sequential write without
stopping. We were backing up databases from an NT system, and a file
exceeded 2 GB without anybody noticing… we had a couple of lost
links/misallocated blocks after each backup; give it time and it
self-destructs… We also fixed this in our code, but it still shouldn’t happen.
This is the only ongoing complaint our support group has today
regarding QNX…
David - I am re-running this through our sales channel to get it (hopefully)
fixed once and for all… I need it fixed on 4.24 and 4.25.
Jay
Dean Douthat wrote in message <3B6ABE4F.5365D919@faac.com>…
In an application at FAAC, we run our executive program above Proc and
have no problems. In fact, it is necessary to do so to avoid some latency
problems associated with starting/stopping processes.
In another application, at another company, where we are writing financial
data to disk and cannot afford to lose/corrupt it, we use a fairly long
write-behind time but have the entire system on a UPS with an interrupt on
power-line loss. When the batteries are running low, we shut down the
application, flush the disk cache, and then shut down QNX. A monitored UPS
is the only surefire way that I know of to avoid data loss/corruption.
David Gibbs wrote:
Ron Cococcia <cococr@cs.rpi.edu> wrote:
I was just trying to relate the beginnings of corruption in our product
with some changes to system settings. I hadn’t received any major
corruptions until one customer couldn’t boot his unit (Lost Link).
That sounds like unexpected power-off type corruption.
The product we make can be interrupted by power (unfortunately), and my
guess is that’s the reason the corruptions are happening. But from the
internal point of view, if something I was doing could possibly prevent a
filesystem update from happening when it needed to, I wanted to know.
In general, except at power-off time, no particular filesystem update is
time critical. Fsys maintains a consistent in-memory state.
Complete starvation is, of course, bad - but that doesn’t sound like what
is happening.
But a partially updated on-disk state when a sudden power-off hits can
and will cause corruption. Of course, if you do delay disk updates, you
can slightly increase the likelihood that your filesystem will be
corrupted in case of a sudden power-off. But since the high-priority
thing is generally pre-empting both Fsys & the client requesting the
write/change, I’m not sure this will have any detectable effect.
Make sure you are NOT running Fsys with the -a option. It can greatly
increase the likelihood of corruption in case of unexpected power loss.
-David
QNX Training Services
dagibbs@qnx.com