Persistent qnx6 filesystem corruption

I’ve been fighting this for weeks now, with no resolution. I am attempting to back up whole 6.5.0 systems using tar (pax won’t recreate hardlinks properly), and no matter how hard I try, chkqnx6fs ends up reporting a corrupted filesystem and the filesystem can no longer be mounted in rw mode.

I’ve verified that no filenames in the archives are longer than about 150 bytes, yet at the completion of tar extraction, chkqnx6fs reports many (sometimes thousands of) invalid filename lengths, invalid inode counts, etc.
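One way to double-check the filename-length claim is to measure the longest path component in the archive listing directly. The helper below is only a sketch and not something from the thread: the function name longest_component and the archive name backup.tgz are made up for illustration, and it assumes an awk whose length() counts bytes under LC_ALL=C.

```shell
# longest_component: read path names on stdin (one per line) and print the
# byte length of the longest single component, i.e. the longest piece
# between slashes. Hypothetical helper, for illustration only.
longest_component() {
    LC_ALL=C awk -F/ '
        { for (i = 1; i <= NF; i++) if (length($i) > max) max = length($i) }
        END { print max + 0 }
    '
}

# Usage against an archive (backup.tgz is a placeholder name):
#   tar -tzf backup.tgz | longest_component
```

If the number printed is well under the filesystem's filename limit, the lengths chkqnx6fs complains about are not coming from the archive contents.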

I changed the mount to commit=high, which seemed to clear up the vast majority of the errors, but some (like a few filename length errors) still persist. chkqnx6fs -f does absolutely nothing to help, and there doesn’t seem to be any way to fix the issues other than to re-format and try again.

What options are there for diagnosing this and getting to a state where I can properly restore an entire system drive? And how can the so-called Power Safe filesystem even get corrupted like this from extracting a tar file in the first place?

My backup command is:

tar -cpzf $backupfile / -X excludes.txt

Where excludes.txt is:

/proc
/mnt/*
/dev
/fs
/net
/tmp/ssh*
/media

Thanks,

XL600

xl600,

We probably need a bit more information to help you diagnose this problem.

For example, I see the command (tar -cpzf $backupfile / -X excludes.txt) which creates your archive. But you don’t say how you are extracting/testing this archive.

What command are you using to extract?
Where are you extracting to? Over top of the existing filesystem you created the tar from, or onto a brand new base QNX 6.5 install (i.e. whatever your base install is)?
How big is your tar archive and roughly how many files does it contain?
Have you ever tried extracting elsewhere (say Linux) to see if the archive extracts OK there?
Presumably you are creating and extracting as ‘root’?

We use tar here to create software update packages. Our packages are nowhere near the size of yours because they don’t contain thousands of files. That said, we’ve never had a problem restoring a tar archive. We have a much simpler command since everything resides under one directory (no need for excludes).

tar -rvf tarFileName Directory

I am curious about the -z option (filter through gzip), which I’ve never used. I wonder if that’s causing some kind of issue. Also curious why you are extracting the protection info (p option)?

Tim

P.S. The power safe filesystem is designed to prevent loss of data on power failure. It won’t prevent users or processes from mangling files in some manner.

Tim, of course users can mangle files in all sorts of ways but the QNX 6 file system is supposed to prevent file system corruption. If it can do that during a power failure, it should be able to do that without a power failure happening.

There is a requirement for the QNX 6 file system to do its job properly. The underlying hardware must report back correctly when the data has reached the hardware. There is hardware that does not do this. This does not explain xl600’s problem, as he is presumably not losing power.

If I were in this situation I would first try de-archiving onto a QNX 4 file system. If that works, it confirms that the tar file is not in doubt. If you have a contract with QNX, this would put you in a position to open a ticket. If not, you might try breaking the process up into separate archives for each root directory, and seeing how that works. It might narrow down where the problem is.
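Splitting the backup into one archive per root directory could be sketched roughly as follows. This is an illustration under assumptions rather than a tested QNX recipe: the function name archive_each_topdir is made up, it assumes a tar that accepts -z and -C (as GNU tar does), it does not apply the exclusions from excludes.txt, and the shell glob skips hidden top-level entries.

```shell
# archive_each_topdir SRC OUTDIR
# Write one gzip-compressed tar per top-level entry of SRC into OUTDIR.
# OUTDIR should live outside SRC so the archives do not end up inside
# each other. Hypothetical helper, for illustration only.
archive_each_topdir() {
    src=$1
    out=$2
    for entry in "$src"/*; do
        [ -e "$entry" ] || continue          # empty directory: glob did not match
        name=$(basename "$entry")
        # -C keeps the stored paths relative to SRC
        tar -czpf "$out/$name.tgz" -C "$src" "$name" || return 1
    done
}

# Example (paths are placeholders):
#   archive_each_topdir / /net/backuphost/backups/thispc
```

Restoring the pieces one at a time and checking the filesystem in between should show which directory, if any, triggers the errors.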

Sorry I didn’t reply earlier (Been working on other things).

Tim,

Some answers to your questions…

I’m using tar -xvpf $backupfile to extract.
Extracting to a mounted freshly formatted QNX6 partition on the recovery hard drive.
The archive contains the entire root file system of the original QNX PC hard drive, except for the excluded paths.
The tar archive is typically about 1.5GB compressed, about 12GB uncompressed.
I have never tried to extract elsewhere (I only have QNX boxes to work with).
Yes, extracting as root.
The -z option is to minimize space (I’m backing up 20 QNX PCs to a single QNX box using the QNXnet interface). But this problem also happens if I tar to a file in /tmp of the local drive.
The p option is used to retain the ownership and permissions of all files and paths.

The issue is not mangled files. The issue is that the file system itself gets corrupted. There are no power failures happening and there’s no other software running on the system other than QNX itself.

I did try to break the archive into smaller chunks, but found the same errors occur after extracting enough files. It’s like something in QNX is just getting full and suddenly becoming unable to reliably write files. I haven’t tried rebooting the PC between extracting the smaller tar files.
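The chunked test could be automated along these lines, so that the first archive after which the check fails is reported. This is a sketch under stated assumptions: first_failing_archive is a made-up name, the check is passed in as a command because the thread does not establish how chkqnx6fs signals errors (exit status or output only), and it assumes a tar that accepts -z and -C.

```shell
# first_failing_archive CHECK_CMD DEST ARCHIVE...
# Extract each archive into DEST in order and run "CHECK_CMD DEST" after
# each one. Print the first archive after which the check fails.
# Returns 0 if every check passed, 1 if a check failed, 2 if tar failed.
# CHECK_CMD is a placeholder for whatever filesystem check is trusted.
first_failing_archive() {
    check=$1
    dest=$2
    shift 2
    for archive in "$@"; do
        if ! tar -xzpf "$archive" -C "$dest"; then
            echo "extract failed: $archive" >&2
            return 2
        fi
        if ! $check "$dest"; then
            echo "$archive"
            return 1
        fi
    done
    return 0
}

# Example (my_fs_check is a hypothetical wrapper script):
#   first_failing_archive my_fs_check /mnt/restore etc.tgz usr.tgz home.tgz
```

Running the same sequence again with a reboot between archives would show whether the failure depends on how much has been written since boot.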

I also tried disabling snapshots during the extraction process, thinking that might have something to do with it, but ultimately that didn’t help.

QNX 6 originally came with the QNX 4 file system. Later they added the QNX 6 file system. If the QNX 4 file system was deprecated after QNX 6.5 I haven’t heard about it, though it is possible.

The program dinit initializes a QNX 4 file system. Once you mount it, it looks the same as the QNX 6 file system except for its inner workings and maximum file sizes.

xl600,

7Zip and WinRar both allow extraction of tar archives on Windows machines. You could try and extract an archive on Windows and see if it successfully extracts. Obviously the file permissions won’t work on Windows but you would be able to see if all the files and directories at least extract out of the archive.
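If no other machine is handy, the archive can also be sanity-checked in place without extracting anything. The sketch below is an illustration, not part of the original advice: check_archive is a made-up name, and it assumes gzip -t and a tar that accepts -z are available on the box.

```shell
# check_archive FILE
# Succeeds only if the gzip stream is intact and tar can walk every
# member header in the archive. Nothing is written to disk.
# Hypothetical helper, for illustration only.
check_archive() {
    gzip -t "$1" || return 1
    tar -tzf "$1" > /dev/null || return 1
}

# Example (backup.tgz is a placeholder name):
#   check_archive backup.tgz && echo "archive reads cleanly"
```

A clean pass only says the archive is readable end to end; it does not prove the extracted files would match the originals.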

The only other thing I can suggest is something you already tried. That is creating a local tar archive in the tmp folder. But in this case I also want you to change directories to the root folder and do a ‘find .’ command which will list all the directories/files in the filesystem (you have to scroll up in the shell output to see it all). You are looking for errors which say ‘file system forms an infinite recursive loop’. Those often occur in the /dev folder (they do in my QNX7 system when I run that command). I wonder if you accidentally have some softlinks that are creating an infinite loop in some manner.
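Rather than scrolling back through the find output, the symbolic links that point at directories (the only ones that could form a loop) can be listed directly. This is a sketch under assumptions: list_dir_symlinks is a made-up name, and it assumes find can run test as an external command.

```shell
# list_dir_symlinks ROOT
# Print every symbolic link under ROOT that resolves to a directory.
# test -d follows the link, so dangling links and links to plain files
# are not printed. Hypothetical helper, for illustration only.
list_dir_symlinks() {
    find "$1" -type l -exec test -d {} \; -print
}

# Example: list_dir_symlinks / > /tmp/dir_links.txt
# Any error messages from a plain "find ." can be kept in a file with:
#   find . > /dev/null 2> /tmp/find_errors.txt
```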

Tim

xl600,

One other thing here. You say you are extracting to a mounted, freshly formatted QNX6 partition when this problem occurs. Can you post the commands you use to create this freshly formatted QNX6 partition that you are extracting to (i.e. your format command, dinit, mount, etc.)? Just want to make sure we understand exactly how you are testing the extraction process when it fails.

The other thing to consider trying is if you have a smaller archive that exhibits the problem you could try extracting to a RAM drive (assuming you have 2 gigs of RAM and can get the problem with an archive that’s <2 gigs when extracted).

Tim