Robustness of QNX File System

I am using QNX 6.1A with QNX4 FS.

Am I the only one who has concerns regarding the QNX FS?

Let me explain a couple of issues that I am having:

  1. If a process gets killed or dies somehow while it has files open, the files
    opened by the process are very likely to end up corrupted. If those files
    are like the Samba log files, where the same file name is reused, Samba cannot
    open them until chkfsys is run.

  2. At times when I perform a shutdown, some local .history and other
    user files seem to end up corrupt, and I have to run chkfsys again before
    everything is restored.

I understand that running chkfsys seems to recover the file system to some
degree, but it is a pain to have to do that manually, and it is an even bigger
pain to make QNX run chkfsys automatically at startup, because you need to
change the boot image and, even worse, you have to split the startup of
devb-eide from diskboot’s other actions.

Are there any alternatives to QNX4 FS or any improvements coming up in the
above described areas?

PS: ext2fs doesn’t seem to be an option because QNX cannot repair an ext2fs
partition.

Thanks
Jens

I’ve been using QNX 6 as my primary development host for a year and a half
and I’ve never seen any of this filesystem corruption. Of course, I haven’t
been looking either.

Kris

I’ve had continual problems with syslog. Can’t run for very long before
/var/log/syslog is corrupt.

Well, it might be because I generate too many segvs in my code and thereby
end up with corrupt files ;-)

Joking aside, we ported our software from UW 1.1 with RTX (a ten-year-old OS),
and the old UnixWare file system seemed more robust than QNX4 FS.

Are you running chkfsys at startup?

Do you see any files being fixed if you run:

chkfsys -u /


Jens

“Jens H Jorgensen” <jhj@remove-nospam-videk.com> wrote in message
news:a55vug$ps0$1@inn.qnx.com

> Are you running chkfsys at startup?

No… but now I’m afraid to ;-)

> Do you see any files being fixed if you do a:
>
> chkfsys -u /

Just ran it and saw no problems. I am, however, running a fairly fresh 6.2
install, so if things are getting buggered, they probably haven’t had
a chance yet. I’ll start running chkfsys a little more often to see
if I have problems.

cheers,

Kris

Jens H Jorgensen <jhj@remove-nospam-videk.com> wrote:

> 1. If a process gets killed or dies some how and has files open - the
>    files open by the process are very likely to end up as corrupted.

No, this should not happen. A process dying will cause Proc to deliver
a close() on all its fds; the filesystem cannot distinguish these cases
from normal operation.

> If those files are like the Samba log files where the same file name
> is used - Samba cannot open them until a chkfsys is run.

The QNX4 filesystem has the concept of a “busy” file, one that has been
opened for writing, grown, but not yet closed (it sounds as if syslogd
generates such files). In such a case the file is marked “busy” to
indicate a potential inconsistency, not that it is corrupt, but that
if power were to be lost at that moment, the data may not yet be valid.
Closing the file, terminating the filesystem, or an organised shutdown
will remove this state. Only if none of these operations is done will
the file remain “busy” and generate an EBADFSYS error on a later
open() attempt. Currently, a “chkfsys” is required to knock down this
stuck “busy” attribute (there is likely no real “corruption”).
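
As a rough illustration of how an application might tell a stuck "busy" file apart from other open() failures, here is a minimal Python sketch. It assumes the platform's errno module exposes EBADFSYS (the name used above); where it does not, the "busy" branch simply never matches. The function name open_status and its return strings are hypothetical.

```python
import errno
import os

# Assumption: errno.EBADFSYS exists on the target platform. On systems
# without it, getattr() yields None and the "busy" case is never reported.
EBADFSYS = getattr(errno, "EBADFSYS", None)


def open_status(path):
    """Try to open `path` for appending and classify the outcome."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    except OSError as exc:
        if EBADFSYS is not None and exc.errno == EBADFSYS:
            return "busy"      # left busy by an unclean stop; chkfsys needed
        if exc.errno == errno.ENOENT:
            return "missing"   # file does not exist at all
        return "error"         # any other failure
    os.close(fd)
    return "ok"
```

A startup script could call open_status() on each log file before launching a daemon such as Samba, and report "busy" files instead of letting the daemon fail on them.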

> Are there any alternatives to QNX4 FS or any improvements coming up
> in the above described areas?

Some work has been done in this area. The internal filesystem manipulation
of this “busy” attribute is slightly modified. Plus, recognising that it
is not always possible/desirable to run a “chkfsys”, the filesystem now
has a “qnx4 unbusy” option, which (rather than the default of failing with
an EBADFSYS) will truncate the file back to its last known good size and
continue operation / allow it to be open()d. These features are not
available until after 6.2 though.

However, all of this is only an issue if there is unexpected power loss
(e.g. an unclean shutdown). If the system is shut down properly, or
the filesystem terminated, or the process with open/growing files dies
or otherwise closes the file, then the on-disk data structures are made
consistent and the busy/EBADFSYS is undone. If you are persistently
seeing such problems then you are not performing a clean system shutdown,
or are using phshutdown with a very very old RTP where devb-* was not
protected against the initial SIGPWR …
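
The life cycle of the "busy" attribute described in this post can be summarised as a toy model. This is only a sketch of the rules as stated above, not filesystem code; the class and method names are invented for illustration.

```python
class BusyFileModel:
    """Toy model of the QNX4 'busy' flag as described in the post above."""

    def __init__(self):
        self.busy = False
        self.is_open = False

    def open_for_write(self):
        # A file still marked busy from an earlier unclean stop is refused.
        if self.busy and not self.is_open:
            raise OSError("EBADFSYS: file was left busy")
        self.is_open = True

    def grow(self):
        # Growing an open file marks it busy until it is closed.
        if not self.is_open:
            raise ValueError("file is not open")
        self.busy = True

    def close(self):
        # Covers an explicit close, process death, clean shutdown,
        # and filesystem termination: all of them clear the flag.
        self.is_open = False
        self.busy = False

    def power_loss(self):
        # Nothing gets a chance to clear the flag.
        self.is_open = False

    def chkfsys(self):
        # The repair tool knocks down a stuck busy flag.
        self.busy = False
```

In this model only power_loss() leaves the flag set, which matches the claim that a dying process alone should not produce EBADFSYS.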

“John Garvey” <jgarvey@qnx.com> wrote in message
news:a5bh7l$2va$1@nntp.qnx.com

> Jens H Jorgensen <jhj@remove-nospam-videk.com> wrote:
>
> > 1. If a process gets killed or dies some how and has files open - the
> >    files open by the process are very likely to end up as corrupted.
>
> No, this should not happen. A process dying will cause Proc to deliver
> a close() on all its fds; the filesystem cannot distinguish these cases
> from normal operation.

It might be because we crash the system pretty badly.
Our application performs a bunch of low-level stuff which can sometimes
cause the system to crash pretty badly.

> > If those files are like the Samba log files where the same file name
> > is used - Samba cannot open them until a chkfsys is run.
>
> The QNX4 filesystem has the concept of a “busy” file, one that has been
> opened for writing, grown, but not yet closed (it sounds as if syslogd
> generates such files). In such a case the file is marked “busy” to
> indicate a potential inconsistency, not that it is corrupt, but that
> if power were to be lost at that moment, the data may not yet be valid.
> Closing the file, terminating the filesystem, or an organised shutdown
> will remove this state. Only if none of these operations is done, then
> the file will remain “busy” and generate an EBADFSYS error on a later
> open() attempt. Currently, a “chkfsys” is required to knock-down this
> stuck “busy” attribute (there is likely no real “corruption”).

Aha - things are starting to make more sense. It is still a little strange that
the Samba log files always seem to end up “busy”. I wonder if it is due
to the sequence in which my system gets shut down.

> > Are there any alternatives to QNX4 FS or any improvements coming up
> > in the above described areas?
>
> Some work has been done in this area. The internal filesystem
> manipulation
> of this “busy” attribute is slightly modified. Plus, recognising that it
> is not always possible/desirable to run a “chkfsys”, the filesystem now
> has a “qnx4 unbusy” option, which (rather than the default of failing with
> an EBADFSYS) will truncate the file back to its last known good size and
> continue operation / allow it to be open()d. These features are not
> available until after 6.2 though.

Can’t wait for that capability - when is 6.2 expected?

> However, all of this is only an issue if there is unexpected power loss
> (e.g. an unclean shutdown). If the system is shutdown properly, or
> the filesystem terminated, or the process with open/growing files dies
> or otherwise closes the file, then the ondisk data structures are made
> consistent and the busy/EBADFSYS is undone. If you are persistently
> seeing such problems then you are not performing a clean system shutdown,
> or are using phshutdown with a very very old RTP where devb-* was not
> protected against the initial SIGPWR …

Thanks for the clarifications,
Jens

John Garvey wrote:

> … A process dying will cause Proc to deliver
> a close() on all its fds; the filesystem cannot distinguish these cases
> from normal operation.
>
> The QNX4 filesystem has the concept of a “busy” file, one that has been
> opened for writing, grown, but not yet closed (it sounds as if syslogd
> generates such files). In such a case the file is marked “busy” to
> indicate a potential inconsistency, not that it is corrupt, but that
> if power were to be lost at that moment, the data may not yet be valid.
> Closing the file, terminating the filesystem, or a organised shutdown
> will remove this state. Only if none of these operations is done, then
> the file will remain “busy” and generate an EBADFSYS error on a later
> open() attempt. Currently, a “chkfsys” is required to knock-down this
> stuck “busy” attribute (there is likely no real “corruption”).

…Maybe it’s not the right NG (my problems are on QNX 4.25), but I get
something similar with a DHCP server that I have compiled: I start it
in the /etc/netstart script (at system boot), and I ALWAYS stop my QNX
box with a clean “shutdown -fb”… but sometimes (say once every ten or
fifteen) the files /etc/dhcp.leases and /etc/~dhcp.leases are left
“corrupted”, and chkfsys is required on the next startup.

Any idea?

/------------------------------------------------------------

  • Davide Ancri - Prisma Engineering
  • email = davidea at prisma dash eng dot it
    ------------------------------------------------------------/

DaviGnu <no.more.spam@nowhere.org> wrote:

> …Maybe it’s not the right NG (my problems are on QNX 4.25)

No, try “qdn.public.qnx4”, but I’m the same person so …

> in the /etc/netstart script (at system boot), and I ALWAYS stop my QNX
> box with a clean “shutdown -fb”… but sometimes (say once every ten or
> fifteen) the files /etc/dhcp.leases and /etc/~dhcp.leases are left
> “corrupted”, and chkfsys is required on the next startup.

If there are a lot of dirty files to tidy up, it can be possible for
the “fast” shutdown to reboot whilst Fsys is still trying to write
data out to disk! At some point in the 4.25 lifecycle the interaction
between Fsys and shutdown was improved so that this would not happen.
You may be able to test this by not using the “-f” flag for a while and
see if this removes the problem … if so, then look at upgrading to a
newer version of QNX 4.25 …

I believe that the qnx4 file system is very robust, BUT, . . .

I have experienced much corruption. It always happens when some Photon utility
writes out a config file and then the system is rebooted before those blocks
are flushed to disk.

NOTE: I can make changes to helpviewer or pfm or other config files and
leave the system running for hours or days and the data never gets flushed.
I have found that I have to exit Photon to text mode, then wait a minute or
issue a sync. Then I can safely shut down.
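
For a program that writes its own config files, one application-side way to narrow this window is to flush and fsync the file before closing it. The following is a minimal sketch under that assumption; it says nothing about what the Photon utilities actually do, and the function name is hypothetical.

```python
import os


def write_config_durably(path, text):
    """Write `text` to `path` and ask the OS to push it to disk before returning."""
    with open(path, "w") as f:
        f.write(text)
        f.flush()              # empty Python's own userspace buffer
        os.fsync(f.fileno())   # ask the OS to write this file's data out
```

Unlike a system-wide sync, fsync() targets a single file descriptor, so the caller knows which data it has asked to be written.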

This problem is very real, has persisted across several of the last few
releases, and has been demonstrated on systems despite QSSL’s denial of the
problem.


Bill Caroselli – 1(626) 824-7983
Q-TPS Consulting
QTPS@EarthLink.net

Bill Caroselli <qtps@earthlink.net> wrote:

> I believe that the qnx4 file system is very robust, BUT, . . .
>
> I have experienced much corruption. It is always when some photon utility
> write out a config file and then the system is rebooted before that blocks
> are fushed to disk.

On QNX 4.25 we had to link /tmp/syslog and several other files to /dev/null
to protect against corruption. Also, files used by e.g. Samba we try to
delete with rm and then with the zap utility. A UPS and correct shutdowns help,
but it is still potentially dangerous.

Andy

qtps@earthlink.net sed in <a5hf7i$1mg$1@inn.qnx.com>:

> I have experienced much corruption. It is always when some photon utility
> write out a config file and then the system is rebooted before that blocks
> are fushed to disk.
>
> This problem is very real, and has persisted across several of the last few
> releases and has been demonstraited on systems despite QSSL’s denial of the
> problem.

While putting together my custom “shutdown” program
(so that I could issue an ATX power-off on “-S system”),
I realized that inserting sync() before killing devb-*
greatly improves the situation.
Without sync(), ntpd routinely left its log files corrupt.

I guess the stock /bin/shutdown doesn’t sync().

kabe

kabe@sra-tohoku.co.jp wrote:

> qtps@earthlink.net sed in <a5hf7i$1mg$1@inn.qnx.com>:
> > This problem is very real, and has persisted across several of the last
> > few releases and has been demonstraited on systems despite QSSL’s denial
> > of the problem.

Well, quite frankly, it is not a filesystem problem; if you unmount the
filesystems or slay the devb-* driver (which automatically internally
unmounts all filesystems), then everything will be sync’d up to disk. So,
provided that shutdown terminates all processes or at least the disk
drivers (which it does), then there should not be a problem. Try a
“slay devb-eide” and you should never see any “corruption”.

The best theory I’ve had is that some older EIDE disks do not support the
“flush cache (E7)” command, and so written data remains in the on-disk
track-cache. “shutdown” is somewhat aggressive about resetting the CPU,
and so this data, considered physically written, is actually lost. More
recent devb-eide drivers will catch any disk not implementing this flush
command and insert a 1/2-second wait before returning control to unmount;
I am hoping this will address this issue (from 6.2.1 onwards).
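
The fallback described here (try the flush command, and if the drive does not implement it, pause briefly before reporting completion) could look roughly like the sketch below. It is not the devb-eide code; send_flush is a hypothetical callable that returns True when the drive accepted the flush command.

```python
import time


def flush_or_settle(send_flush, settle_seconds=0.5, sleep=time.sleep):
    """Return True if the drive accepted the flush, otherwise wait and return False."""
    if send_flush():
        return True            # drive confirmed its cache is written out
    sleep(settle_seconds)      # no flush support: give the drive time to drain
    return False
```

The half-second default mirrors the wait mentioned above; the wait is a heuristic, not a guarantee that the drive's cache has been written.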

> I’ve realized that inserting sync() before killing devb-* will
> greatly improve the problem. Without sync(), ntpd routinely leaved
> logfiles corrupt. I guess stock /bin/shutdown doesn’t sync().

No, it doesn’t, but unmounting all filesystems or slaying the disk driver
is a much stronger guarantee. The semantics for sync() offer no commitment
about when any actual writing to disk will complete. And these files are
probably not “corrupt”, merely “potentially corrupt (busy)”; so the 6.2.1
option “qnx4 unbusy” will help those unable to power-down properly (by just
truncating such files to their last known size and continuing).
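
For a custom shutdown program, the ordering discussed here could be sketched as follows: call sync() as a hint, then rely on slaying the disk driver for the actual guarantee. This is a minimal sketch, not the stock shutdown logic; the function name is hypothetical, and the run and sync parameters exist only so the sequence can be exercised without touching a real system.

```python
import os
import subprocess


def stop_disk_driver(driver="devb-eide", run=subprocess.call, sync=os.sync):
    """Request a flush, then terminate the disk driver so it unmounts its filesystems."""
    sync()                        # advisory: no commitment about when writes complete
    return run(["slay", driver])  # the driver unmounts its filesystems as it exits
```

The return value is whatever the runner returns, so with the default subprocess.call it is the exit status of slay.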

jgarvey@qnx.com sed in <at96f4$3mq$1@nntp.qnx.com>:

> > I’ve realized that inserting sync() before killing devb-* will
> > greatly improve the problem. Without sync(), ntpd routinely leaved
> > logfiles corrupt. I guess stock /bin/shutdown doesn’t sync().
>
> No, it doesn’t, but unmounting all filesystems or slaying the disk driver
> is a much stronger guarantee. The semantics for sync() offer no commitment
> about when any actual writing to disk will complete. And these files are
> probably not “corrupt”, merely “potentially corrupt (busy)”; so the 6.2.1

Yes, they’re not corrupt, only untruncated, but ntpd coughs up badly
on the next boot, so there’s no big difference; you need chkfsys.

I know sync() isn’t guaranteeing anything (it does block for the flush, though),
but then I can’t explain why sync() cures things, unless devb-*
is forgetting something that ought to be done.

kabe