chkfsys link lost error

Hi,

I have an ongoing problem with a SanDisk plugged into a PC104
board running QNX 4.25. This system does not have the capability
of performing a graceful shutdown, so it loses power while there may
be files open and being written to by our software. We execute
“chkfsys -Pqrsu” in sysinit so the filesystem can be repaired at
each boot.

However, lately we have been experiencing some sort of corruption with
lost links. chkfsys reports the “link lost” error with Paused (Cannot
fix)…

This system does not have a keyboard, so we must power down and swap out
the disk. I have to use fdisk and dinit to restore the entire disk at
this point. I’ve noticed it is always one of our “log” directories that
it complains about. These are directories where we have tasks that
continually open, write, and close a log file. Other things I’ve noted:

  1. You can run “ls -al” in the directory; however, if you try to
    list just a file or group of files, ls complains that there is no such
    file or directory (and then lists the file name in parentheses).

  2. The ls command does NOT show the . and .. directory entries.

My guess so far is that the actual directory entry table was being
updated at the time of power loss and was corrupted. Is there any way
around this? Is there any way to configure Fsys differently to help
prevent this? Our Fsys command within our .boot file is:

Fsys -l 0 -c 100 -d 1 -r 32768 -v 0

Should our cache be set to 0, or to something smaller? Should our delay
be set to 0? Any ideas?

One other thing I don’t understand. Our logging tasks create the file
at startup and then perform an open, write, and close each time they
update it. I don’t understand why the directory entry gets corrupted.
It seems that this table should only be updated when a new file is
created or when the directory extent needs to grow (which, for all I
know, may be what is happening).

Thanks in advance.
Rob Davidson

“Rob Davidson” <rdavidson@SoftwareRemodeling.com> wrote in message
news:3B2E5B7A.271FCA73@SoftwareRemodeling.com

Should our cache be set to 0? or something smaller? Should our delay be
set to 0?

Yes, definitely. -d is the most important.

Any ideas?

Open the files with O_DSYNC.
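
For illustration, here is a minimal sketch of the synchronous-open suggestion, assuming the flag meant is the POSIX O_DSYNC open flag. The function name log_append_sync is hypothetical, and whether this is enough to protect the directory entry on a given Fsys setup has not been verified here.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback for platforms that only provide O_SYNC (an assumption made
 * for portability of this sketch). */
#ifndef O_DSYNC
#define O_DSYNC O_SYNC
#endif

/* Append one line to a log file that already exists, asking the
 * filesystem to complete the data write before write() returns.
 * Returns 0 on success, -1 on any error. */
int log_append_sync(const char *path, const char *line)
{
    size_t len = strlen(line);
    ssize_t written;
    int fd;

    /* No O_CREAT or O_TRUNC: the file is expected to have been
     * created once at startup, as described in the original post. */
    fd = open(path, O_WRONLY | O_APPEND | O_DSYNC);
    if (fd == -1)
        return -1;

    written = write(fd, line, len);

    if (close(fd) == -1)
        return -1;
    return (written == (ssize_t)len) ? 0 : -1;
}
```

A logging task would call log_append_sync() once per log line in place of its current open, write, close sequence.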


Welcome to my world! :)

I’ve had many different types of file system corruption (I wouldn’t doubt
that I’ve had them all). We have a similar problem where we cannot gracefully
shut down. Having small/zero cache sizes and small/zero delays helps a lot.
Lost links are fairly annoying though, as you’re pretty much in a hole for
whatever data is now corrupted.

Here are some practices I like to use. For file access, use the lowest-level
routines you can (open() if possible, otherwise fopen()) and avoid iostreams
if you can (I found that the corruption I got was MUCH worse when I had
iostreams doing the file writing, e.g. an ofstream). Also, if you can
(re)size the file before you write, that helps prevent corruption, as the
extents shouldn’t have to change if the file won’t be resized again.
Lastly, an fsync is good to help flush the data out to disk (unless your
delay and cache are 0, in which case I don’t believe an fsync would do
anything extra, but I’m not sure as I’ve never tried it).

-Ron
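
The practices above (low-level calls, sizing the file up front, fsync after writing) could be sketched roughly as follows. The function names open_preallocated and write_record are hypothetical, and the sketch assumes that filling the file with zeros once is an acceptable way to get its extents allocated before any real data is written.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or reopen) a file and zero-fill it up to `size` bytes so that
 * later writes inside that range do not have to grow the file.
 * Returns a descriptor positioned at offset 0, or -1 on error. */
int open_preallocated(const char *path, off_t size)
{
    char zeros[512];
    off_t done;
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd == -1)
        return -1;

    memset(zeros, 0, sizeof zeros);
    done = lseek(fd, 0, SEEK_END);          /* current length */
    if (done == -1) {
        close(fd);
        return -1;
    }
    while (done < size) {                   /* extend only if too short */
        size_t chunk = sizeof zeros;
        ssize_t n;

        if ((off_t)chunk > size - done)
            chunk = (size_t)(size - done);
        n = write(fd, zeros, chunk);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += n;
    }
    if (fsync(fd) == -1 || lseek(fd, 0, SEEK_SET) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Write one record at `offset` inside the preallocated area, then ask
 * for it to be flushed to disk. Returns 0 on success, -1 on error. */
int write_record(int fd, off_t offset, const void *buf, size_t len)
{
    if (lseek(fd, offset, SEEK_SET) == -1)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len)
        return -1;
    return fsync(fd);
}
```

The caller has to keep every record inside the preallocated size; a write past the end would grow the file again and lose the benefit.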



Mario and Ron,

Thanks for the quick response. Here is another related question.
I have a file that is re-written in its entirety every 5 seconds. It
is basically an array of unsigned integers (32 bits). I do an
fopen and then an fwrite( &data, 1, sizeof(data), fptr) and then an
fclose each time. Does the directory table get updated every time, or,
since the file already exists, does Fsys just open and update the file?

We’ve tried both “wb” and “wb+” flags, but I don’t know if it matters.
I thought that the “+” only meant open it for writing and reading, so we
dropped it since we only want to write. However, if the “+” doesn’t
cause a directory entry write, that would be better.

Thanks,
Rob



“Rob Davidson” <rdavidson@SoftwareRemodeling.com> wrote in message
news:3B2E6925.7F3840DA@SoftwareRemodeling.com

Mario and Ron,

Thanks for the quick response. Here is another related question.
I have a file that is re-written in its entirety every 5 seconds. It
is basically an array of unsigned integers (32 bits). I do an
fopen and then an fwrite( &data, 1, sizeof(data), fptr) and then an
fclose each time.

For speed and safety, use open/write/close.

Does the directory table get updated every time or
since the file already exists, does Fsys just open and update the file?


We’ve tried both “wb” and “wb+” flags, but I don’t know if it matters.
I thought that the “+” only meant open it for writing and reading, so we
dropped it since we only want to write. However, if the “+” doesn’t
cause a directory entry write, that would be better.

I’m not too familiar with that, as I use open. Note that b (binary)
doesn’t mean anything on QNX; everything is binary.
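
As a rough sketch of the open/write/close approach for the file that is rewritten every 5 seconds: fopen() with "wb" truncates the file to zero length on every open, whereas the version below overwrites the existing contents in place. The function name rewrite_in_place is hypothetical, and the sketch assumes the data is the same size on every rewrite. Whether this changes what Fsys writes to the directory entry is not answered in this thread.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite the start of `path` with `len` bytes from `data`.
 * The file is created if missing but never truncated, so a file that
 * always holds the same amount of data keeps its length across
 * rewrites. Returns 0 on success, -1 on error. */
int rewrite_in_place(const char *path, const void *data, size_t len)
{
    ssize_t written;
    int fd = open(path, O_WRONLY | O_CREAT, 0644);   /* no O_TRUNC */

    if (fd == -1)
        return -1;

    written = write(fd, data, len);   /* a fresh open starts at offset 0 */

    if (close(fd) == -1)
        return -1;
    return (written == (ssize_t)len) ? 0 : -1;
}
```

With the array from the question, the call would be rewrite_in_place(path, &data, sizeof(data)). If the data ever shrinks, stale bytes from the previous version remain at the end of the file, so this only fits fixed-size records.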
