Filesystem Corruption using Fsys.eide

Ryan_Baillargeon1 · October 5, 2004, 6:56pm

Hello Newsgroup Users. I’ve got a dilly of a pickle here for you.
Hopefully you can help me out.

Our system uses Compact Flash (IDE interface) running the Fsys.eide
driver. We have a logging system that, under normal conditions, writes to
a file that wraps on itself at a 2 MB limit. This is to prevent the file
from getting too large. We had one system whose logging was configured
without a limit. What we experienced is that the file continued to grow
until it filled the disk to capacity. Even after the disk was at 100%
capacity, the logging utility continued to write to the file and
corrupted the filesystem. The logging system performs an iterative
fopen(r+), fwrite(), fclose(). Nothing funky at all with the software
IMO. I would expect to get an error back from the fwrite(), but the
operation is successful. In fact, the system runs for days in this
corrupt state until a write operation fails.
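
For illustration only, the wrap-around described above could be computed roughly as follows. The helper name `next_log_offset`, the record-at-a-time model and the `LOG_LIMIT` constant are assumptions made for this sketch; the actual logging code is not shown in this thread.

```c
/* Assumed limit, taken from the description above: the log wraps at 2 MB. */
#define LOG_LIMIT (2L * 1024L * 1024L)

/*
 * Hypothetical helper: given the current write offset and the size of the
 * next record, return the offset at which that record should be written.
 * A record that would cross the limit is written at offset 0 instead, so
 * the file never grows beyond `limit` bytes.
 * Returns -1 if the arguments are invalid or the record can never fit.
 */
long next_log_offset(long cur, long record_len, long limit)
{
    if (cur < 0 || record_len <= 0 || limit <= 0 || record_len > limit)
        return -1;
    if (cur > limit - record_len)
        return 0;       /* wrap: start overwriting the oldest data */
    return cur;         /* the record still fits before the limit */
}
```

A system configured "without a limit" corresponds to never taking the wrap branch, which is how the file can grow until the disk is full.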

We have also experienced this failure scenario with syslog as the
culprit. A process (qpage) was writing its errors to syslog and over the
course of a year slowly filled the disk to capacity at which point the
whole filesystem became corrupt.

I have the following questions, and ask you for enlightenment:

1.) Why does the OS allow us to fill the flash disk to 100% and still
allow us to write to a file?

2.) Should we be expecting the fwrite() operation to fail, and if so,
why does it not fail?

3.) Has anyone else experienced this problem? Suggestions?

Thank you for your time,

Ryan B,
Student,
School of Rock.

Kevin_Miller1 · October 5, 2004, 7:08pm

This doesn’t answer your file corruption question, but a way to handle the
syslog file growth issue is to run a cron task periodically, such as at
midnight. On our system, this task does the following:

Creates a new log file named after today’s date (date.log).
Re-points a link at the new log file; the syslog configuration file
/etc/syslog.conf points at that link rather than at a fixed file.
Sends SIGHUP to syslogd so that it starts using the new log file.
Runs a find command to remove all log files older than 90 days.

The same script does something similar with the access log. This has worked
well for us.
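
The first two steps of such a rotation task might look something like this in C. This is only a sketch: the helper names `dated_log_path` and `repoint_link`, the `YYYY-MM-DD.log` name format and the use of a symbolic link are assumptions, since the actual script is not shown here. Signalling syslogd and pruning old files are left out.

```c
#define _POSIX_C_SOURCE 200112L  /* expose symlink() on strict C compilers */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Build "<dir>/YYYY-MM-DD.log" for the given day.
 * The date format is an assumption; the post only says the file is named
 * after today's date. Returns 0 on success, -1 if the buffer is too small.
 */
int dated_log_path(char *buf, size_t len, const char *dir, const struct tm *day)
{
    char name[32];

    if (strftime(name, sizeof name, "%Y-%m-%d.log", day) == 0)
        return -1;
    if (strlen(dir) + 1 + strlen(name) + 1 > len)
        return -1;
    sprintf(buf, "%s/%s", dir, name);   /* size was checked just above */
    return 0;
}

/*
 * Re-point the link that syslog.conf refers to at a new target.
 * Returns 0 on success, -1 with errno set on failure.
 */
int repoint_link(const char *link_path, const char *target)
{
    if (unlink(link_path) == -1 && errno != ENOENT)
        return -1;
    return symlink(target, link_path);
}
```

There is a short window between unlink() and symlink() in which the link does not exist, which is one reason to send SIGHUP only after the link has been re-created.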

John_Garvey1 · October 5, 2004, 8:13pm

Ryan Baillargeon wrote:

We have also experienced this failure scenario with syslog as the
culprit. A process (qpage) was writing its errors to syslog and over the
course of a year slowly filled the disk to capacity at which point the
whole filesystem became corrupt.

What is the “sin ver” for Fsys? What size was your disk? All versions
prior to 4.24V had a bug where they did not correctly enforce the 2GB
file limit, so continually writing to a file beyond this size could
result in such corruption. If you have a newer Fsys or your flash disk
is <2GB in size, then ignore this; otherwise consider upgrading.

1.) Why does the OS allow us to fill the flash disk to 100% and still
allow us to write to a file.
2.) Should we be expecting the fwrite() operation to fail, and if so,
why does it not fail.

It should not succeed; it should report an ENOSPC error. Quite often
software does not check the return code from write() (and this may
include the buffering fwrite() in libc itself). Can you try filling up
the disk again and doing some experiments using write() vs fwrite() to
see if these do report the expected ENOSPC error?

Pavol_Kycina1 · October 6, 2004, 6:35am

I have seen the same behaviour…

From my point of view write’s return value is irrelevant. What shouldn’t
happen is file system corruption…

PK

Ryan_Baillargeon1 · October 6, 2004, 5:33pm

Thanks Kevin, we did implement a crontab entry to delete the syslog file
after this error occurred.

Ryan B.

Ryan_Baillargeon1 · October 6, 2004, 7:58pm

John Garvey wrote:

Ryan Baillargeon wrote:

We have also experienced this failure scenario with syslog as the
culprit. A process (qpage) was writing its errors to syslog and over
the course of a year slowly filled the disk to capacity at which point
the whole filesystem became corrupt.

What is the “sin ver” for Fsys? What size was your disk? All versions
prior to 4.24V had a bug where they did not correctly enforce the 2GB
file limit, so continually writing to a file beyond this size could
result in such corruption. If you have a newer Fsys or your flash disk
is <2GB in size, then ignore this, otherwise considering upgrading.

Thank you for replying John. Our flash device is only 128MB and our
fsys.eide is ver 4.25A Feb 2000, so I don’t think this has to do with the
2GB limit problem, although that is nice to know for future reference.

1.) Why does the OS allow us to fill the flash disk to 100% and still
allow us to write to a file.
2.) Should we be expecting the fwrite() operation to fail, and if so,
why does it not fail.

It should not, it should report an ENOSPC error. Quite often software
does not check the return code from write (and this may include the
buffering fwrite libc itself). Can you try filling up the disk again
and doing some experiments using write() vs fwrite() and see if these
do report the expected ENOSPC error?

Upon inspecting the code first-hand, I realise that the actual operation
is as follows:

fopen(r+)
fseek( file_index )
fprintf()
fclose()

We do not check errno after the fprintf(), so I’m suspicious that there
could be a problem there.
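
A version of that sequence with every return value checked might look roughly like this. It is a sketch only: the function name `log_at`, its parameters and the added fflush() call are assumptions, not the code from the system under discussion.

```c
#include <errno.h>
#include <stdio.h>

/* Return the current errno, or EIO if the library left errno at 0. */
static int last_error(void)
{
    return errno ? errno : EIO;
}

/*
 * Hypothetical helper: write one message at `offset` in an existing file,
 * checking every stdio call. Returns 0 on success or an errno value.
 */
int log_at(const char *path, long offset, const char *msg)
{
    FILE *fp;
    int err = 0;

    errno = 0;
    fp = fopen(path, "r+");          /* "r+": the file must already exist */
    if (fp == NULL)
        return last_error();

    if (fseek(fp, offset, SEEK_SET) != 0)
        err = last_error();
    else if (fprintf(fp, "%s\n", msg) < 0)
        err = last_error();          /* may not fire: data is only buffered */
    else if (fflush(fp) != 0)
        err = last_error();          /* a full disk is likely to show up here */

    if (fclose(fp) != 0 && err == 0)
        err = last_error();          /* fclose() flushes too, so check it */

    return err;
}
```

Because stdio buffers the output, fprintf() alone can report success even when the data cannot be stored; the failure only becomes visible once the buffer is flushed.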

However, my question still remains: even if I’m not checking the error
codes on the return from fprintf(), the operation is successful. fprintf()
(at some point) performs a write() operation, even though the disk is
full. Now obviously, we can make this logging system more robust, but
the problem occurred with syslog as well, which means syslogd has the
same inherent issue (i.e. successful writes on a full disk leading to
widespread filesystem corruption).

I have a support ticket in with QNX about this problem, so I’ll report on
the findings as soon as I hear back from them (tomorrow, they promise,
as they are swamped).

Cheers
Ryan B.

John_Garvey1 · October 6, 2004, 9:02pm

Ryan Baillargeon wrote:

What is the “sin ver” for Fsys? What size was your disk? All versions
prior to 4.24V had a bug where they did not correctly enforce the 2GB

Thank you for replying John, our Flash Device is only 128MB and our
fsys.eide is ver 4.25A Feb 2000. So I dont think this has to do with teh

“Fsys”, not “Fsys.eide”.

Upon inspecting the code first-hand, I realise that the acutal operation
is as follows:
fopen(r+)
fseek( file_index )
fprintf()
fclose()
We do not check errno after the fprintf(), so I’m suspicious that there
could be a problem there.

Yes, you should check. However, it is possible that fprintf() itself
doesn’t check the result of its own write() (e.g. the QNX6 Dinkum libs have
this same bug, where the failure isn’t propagated); this is why I always
avoid the stdio library and go directly to write() when I actually care
about the data going to disk.
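
As a rough sketch of going directly to write(), something like the following could be used. The helper names `write_all` and `append_record` are made up for this example, and the use of fsync() assumes a POSIX-style system; whether it is available on a given QNX4 installation is not confirmed in this thread.

```c
#define _POSIX_C_SOURCE 200112L  /* expose fsync() on strict C compilers */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write all `len` bytes to `fd`, retrying on short writes and EINTR.
 * Returns 0 on success or the errno value of the failing write(),
 * which would be ENOSPC on a full disk.
 */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;           /* interrupted before writing: retry */
            return errno;
        }
        if (n == 0)
            return EIO;             /* no progress: avoid looping forever */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Append one record with write() and push it to the device with fsync(). */
int append_record(const char *path, const void *buf, size_t len)
{
    int fd, err;

    fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0666);
    if (fd == -1)
        return errno;
    err = write_all(fd, buf, len);
    if (err == 0 && fsync(fd) == -1)
        err = errno;
    if (close(fd) == -1 && err == 0)
        err = errno;
    return err;
}
```

With no stdio buffer in between, the caller sees the result of each write() directly and can stop logging as soon as ENOSPC is returned.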

However, my question still remains, even if I’m not checking the error
codes on the return from fprintf() the operation is successful. fprintf
(at some point) performs a write() operation, even though the disk is

That write() should fail with ENOSPC. This is why I suggested you set
up a full disk scenario and verify that write() does behave this way and
chkfsys does not show a resulting error.

full. Now obviously, we can make this logging system more robust, but
the problem occured with syslog as well, which means syslogd has the

Where would syslog log such a message? A quick look at the code shows
it catches the error and tries to log it … to the syslog file again!

same inherent issue (ie. successful writes on a full disk leading to
widespread filesystem corruption).

Apart from the known bug I mentioned earlier (2GB files), I am not aware
of any such “widespread filesystem corruption” due to full disk in a
recent Fsys. I don’t have a QNX4 system anymore, so I am glad you have
contacted Tech Support, hopefully they can try doing the above …

I have a support ticket into QNX about this problem, so I’ll report on
the findings as soon as I hear back from them (tommorow, they promise,
as they are swamped).

Mario_Charest1 · October 6, 2004, 10:24pm

Is it possible that the geometry of the compact flash as seen by Fsys.eide is
different from the real geometry?

Adam_Mallory1 · October 6, 2004, 11:21pm

Mario Charest wrote:

Is it possible the geometry of the compact flash as seen by Fsys.eide is
different then the real geometry.

Additionally, what is the grade of the compact flash? If it’s consumer
grade (i.e. you can buy it from Best Buy etc.), its expected lifetime
isn’t good, and high failure rates under sustained use are normal. If
it’s industrial grade (you’ll know because it costs much more), then
failures should be minimal.

Perhaps the part itself doesn’t do well when full. Bad blocks can
develop (rapidly under consumer grade) and it’s the hardware controller
on the part which is supposed to handle those situations. But if it’s
full and the chip doesn’t have any spare blocks, corruption could occur.

Any results from filling the file system to recreate the issue? If the
problem only starts occurring after a certain time period (ie. a year or
so) perhaps the issue is related to lifetime/flash. You are logging
data pretty constantly, and that will shorten the lifetime of the flash
significantly (in addition to the limitations consumer grade flash imposes).

–
Cheers,
Adam

QNX Software Systems Ltd.
[ amallory@qnx.com ]

With a PC, I always felt limited by the software available.
On Unix, I am limited only by my knowledge.
–Peter J. Schoenster <pschon@baste.magibox.net>

Eric_Norton1 · October 7, 2004, 3:26pm

Adam Mallory wrote:

Mario Charest wrote:

Is it possible the geometry of the compact flash as seen by Fsys.eide
is different then the real geometry.

This is a good point and we were wondering the same thing. Not sure how
we can go about finding out if this is the case.

Additionally, what is the grade of the compact flash? If it’s consumer
grade (ie. you can buy it from Best Buy etc), it’s expected lifetime
isn’t good, and high failure rates under sustained use is normal. If
it’s industrial grade (you’ll know because it costs much more), then
failures should be minimal.

I can’t imagine it being a consumer grade compact flash since we buy
them from Diamond Systems, who build industrial grade PC104
components, and they recommend this particular flash. And as you said,
we’ll know if it costs more, which it does.

Perhaps the part itself doesn’t do well when full. Bad blocks can
develop (rapidly under consumer grade) and it’s the hardware controller
on the part which is suppose to handle those situations. But if it’s
full - and the chip doesn’t have any spare blocks corruption could occur.

We’ve also had similar thoughts on this, and it comes down to the wear
leveling algorithms that the compact flash manufacturer uses. If you
have only 5% of the disk free, wear leveling on the remaining free space is
probably less effective; this is pretty much what you’re saying.

Any results from filling the file system to recreate the issue? If the
problem only starts occurring after a certain time period (ie. a year or
so) perhaps the issue is related to lifetime/flash. You are logging
data pretty constantly, and that will shorten the lifetime of the flash
significantly (in addition to the limitations consumer grade flash
imposes).

We’ve seen several failure cases over the past couple of years. The one
you describe over the long period of time, we’ve seen. This recent
failure case we feel is different. We had two particular flash disks
fail, one over the course of a year as mentioned by Ryan in the original
post with syslog, but the more recent one failed in only a month’s time.
It was a brand new flash disk. The common trait between these two
flash disks is that they were full.

Right now, we have a test in progress where we let our logging system
fill the disk. It hasn’t failed yet, but it appears to continue to
write after it reaches 100%. The IDE light is still flashing, and none
of the open, seek or write calls are failing. Now, since we’ve
discovered we are not error checking on the fprintf() call, and we’re also
not sure if fprintf() even checks for errors, we could
think we’re writing to the disk when really we are not. However, if
that were the case, then we would need to explain why an fopen() eventually
does fail and corruption happens on a disk that’s 100% full, where nothing
should have been writing to it.

I’m not sure what causes the IDE light to flash; I assume anything on
the IDE bus will light it. If that were the case, then you know that from
an OS point of view, it’s not stopping the writes. Correct me if I’m
wrong here; I’m not sure what the light proves.

Eric

Adam_Mallory1 · October 7, 2004, 4:26pm

Eric Norton wrote:

I can’t imagine it being a consumer grade compact flash since we buy
them from Diamond Systems who constructs industrial grade PC104
components and they recommend this particular flash. And as you said,
we’ll know if costs more which it does.

I have a PC104 form factor board from an industrial supplier. The board
came with consumer grade compact flash, so I would verify this for your
own sanity. The supplier might not know the context in which you’re
using the flash part, so his recommendation might not apply. In the
end, checking is well worth the effort.

Perhaps the part itself doesn’t do well when full. Bad blocks can
develop (rapidly under consumer grade) and it’s the hardware
controller on the part which is suppose to handle those situations.
But if it’s full - and the chip doesn’t have any spare blocks
corruption could occur.

We’ve also had similiar thoughts on this and it comes down to the wear
leveling algorithms that the compact flash manufacturer uses. If you
have %5 disk space, wear leveling on the remaining free space is
probably less effective, this is pretty much what your saying.

Not really - wear leveling is one aspect. Bad blocks can occur without
an erase cycle on the flash cell. The data literally just corrupts from
lack of charge (consumer grade parts do this). Constant cycling via
erase will ensure the cells keep their charge, but then you only have X
number of cycles per cell.

I’m not trying to paint the picture that the flash part will just up and
die. But the fact that you’re doing constant logging is going to
shorten the life of that part significantly. NAND flash has less
reliability than NOR, shortening that time span a little more. Bad blocks
on top of that as a ‘regular occurrence’ shorten the lifespan even more.

We’ve seen several failure cases over the past couple of years. The one
you describe over the long period of time, we’ve seen. This recent
failure case we feel is different. We had two particular flash disks
fail, one over the course of a year as mentioned by Ryan in the original
post with syslog, but the more recent one failed in only a months time.
It was a brand new flash disk. The common trait between these two
flash disks is that they were full.

But there isn’t a correlation from ‘disk full’ to corruption since you
have no idea when the corruption occurred. There seems to be a
relationship, but what exactly that is, is still unknown.

Right now, we have a test in progress where we let our logging system
fill the disk and it hasn’t failed yet but it appears to continue to
write after it reaches 100%. The IDE light is still flashing, and none
of the open, seek or write calls are failing. Now, since we’ve
discovered we are not error checking on the fprintf call and we’re also
not sure if the fprintf even checks for error, then from that we could
think we’re writing to the disk, when really we are not.

Ok.

However, if
that were the case then we would need to explain why an fopen eventually
does fail and corruption happens on a disk thats 100% full and there
shouldn’t have been anything writing to it.

I’m not sure what causes the IDE light to flash, I assume anything on
the IDE bus will light it. If that were the case then you know that from
an OS point of view, its not stopping the writes. Correct me if I’m
wrong here, I’m not sure what the light proves.

The light simply indicates activity - period. It says nothing about the
type of activity (reads/writes/commands etc.). I don’t think the light
proves anything. Especially given that the underlying media isn’t
actually an HD, it’s hard to form a relationship between what’s being put
on the IDE bus and what’s actually occurring in the flash part itself via
a blinking light.

–
Cheers,
Adam

Ryan_Baillargeon1 · October 7, 2004, 5:14pm

I just want to take a step back for a second. Everything aside, I have
the following question:

Please confirm for me, in regards to FSYS, when it gets a request to
write to the disk, and the disk is full, what does it do?

sin ver:
FSYS FSYS32 4.24Y April 23, 2002
FSYS.eide EIDE 4.25G April 15, 2002

Eric_Norton1 · October 7, 2004, 5:53pm

Not really - wear leveling is one aspect. Bad blocks can occur without
an erase cycle on the flash cell. The data literally just corrupts from
lack of charge (consumer grade parts do this). Constant cycling via
erase will ensure the cells keep their charge, but then you only have X
number of cycles per cell.

I was basically saying that there is most likely a connection between poor
wear leveling and bad blocks occurring when the disk has little
space left to wear level on.

I’m not trying to paint the picture that the flash part will just up an
die. But the fact that you’re doing constant logging is going to
shorten the life of that part significantly. NAND flash has less
reliability than NOR, shorting that time span a little more. Bad blocks
on top of that as a ‘regular occurance’ shortens the life space even more.

I agree here with you, but in this situation we need to find out why one
disk lasted a full year and another lasted only a month with the exact
same software and same constant logging. They should have roughly the
same lifespan.

However, if that were the case then we would need to explain why an
fopen eventually does fail and corruption happens on a disk thats 100%
full and there shouldn’t have been anything writing to it.

Can corruption still happen on a flash disk that is 100% full?

Eric

Adam_Mallory1 · October 7, 2004, 6:19pm

Eric Norton wrote:

I was basically saying that their is most likely connection between poor
wear leveling and bad blocks occuring when the disk that has little
space left to wear level on.

Wear leveling occurs regardless of empty or full blocks.

I agree here with you, but in this situation we need to find out why one
disk lasted a full year and another lasted only a month with the exact
same software and same constant logging. They should have roughly the
same lifespan.

Given that all other variables are the same (i.e. environment, use
scenarios such as power loss, etc.), I would agree.

Can corruption still happen on a flash disk that is 100% full?

Yes. As I said earlier, bad blocks can just sporadically occur in
consumer parts (outside of the purview of the controller). That would be
corruption without any interaction from the filesystem or the flash
controller.

Let us know what you find.

–
Cheers,
Adam

Eric_Norton1 · October 7, 2004, 6:34pm

Wear leveling occur regardless of empty or full blocks.

That’s interesting and good to know.

Yes. Like i said prior, bad blocks can just sporadically occur in
consumer parts (outside of the pervue of the controller). That would be
corruption without any interaction from the filesystem or the flash
controller.

Now, with an industrial grade flashdisk would you say that this will not
happen or is it just more unlikely to happen?

Let us know what you find.

Most certainly.

Eric

Adam_Mallory1 · October 7, 2004, 7:31pm

Eric Norton wrote:

Now, with an industrial grade flashdisk would you say that this will not
happen or is it just more unlikely to happen?

More unlikely.

Consumer grade flash is high yield, which means they use more of the
wafer even if some of the chips have defects. The criteria used to
select chips for industrial grade is much more stringent.

That said, from a reliability point of view NOR flash is more reliable
than NAND flash (which is what is used in flash disks, usb keys etc).
But I doubt NOR flash is feasible for you, and it won’t solve your issue
with existing machines in the field.

–
Cheers,
Adam

John_Garvey1 · October 8, 2004, 2:52am

Ryan Baillargeon wrote:

I just want to take a step back for a second. Everything aside, I have
the following question:
Please confirm for me, in regards to FSYS, when it gets a request to
write to the disk, and the disk is full, what does it do?

As I said, I don’t have QNX4. But I did install a version to verify
that it does behave as I claim (namely ENOSPC return and no corruption).

Check Fsys version

    sin ver

    /bin/Fsys Fsys32 4.24W Aug 02 2001

Fill up the disk (as you can see “dd” gets ENOSPC)

    dd if=/dev/hd0 of=/tmp/fillmeup bs=1k count=6320

    dd: /tmp/fillmeup: No space left on device
    6219+0 records in
    6218+0 records out

Check it really is full

    df -h /

    File system (kb)      Total     User     Used  Free  Used
    //1/dev/hd0t77      9237375  9234360  9234360     0  100%

Check for corruption (chkfsys runs and reports no errors)

    chkfsys -fu /

Run the attached test program on full disk to see how it behaves

    ./full

    write() = -1/28/No space left on device
    fwrite() = 1/0/No error
    fflush() = -1/28/No space left on device

Re-run chkfsys (no errors or corruption reported).

So, as I said, write() returns an ENOSPC immediately, fwrite() doesn’t
(since stdio is buffering; could change this with setvbuf(), otherwise
the error is reported when you force to disk with fflush()). chkfsys
reports no corruption, hence a full disk situation does not appear to
cause this. You can run the attached program yourself to verify similar
behaviour. My test was on a hard disk, so I would tend to suspect your
Compact Flash if you observe anything different to this behaviour …
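
The setvbuf() option mentioned above could be sketched as follows. With buffering disabled, fwrite() is expected to pass the data to write() straight away, so a short count should be visible in its return value. The helper name `append_unbuffered` is an assumption for this example, and the exact behaviour of a particular libc should be verified rather than taken from this sketch.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: append `len` bytes through stdio with buffering
 * turned off, so that a failed write is reported by fwrite() itself
 * instead of surfacing later in fflush() or fclose().
 * Returns 0 on success or an errno value.
 */
int append_unbuffered(const char *path, const void *data, size_t len)
{
    FILE *fp;
    int err = 0;

    errno = 0;
    fp = fopen(path, "a");
    if (fp == NULL)
        return errno ? errno : EIO;

    /* setvbuf() must be called before any other operation on the stream. */
    if (setvbuf(fp, NULL, _IONBF, 0) != 0)
        err = EINVAL;
    else if (fwrite(data, 1, len, fp) != len)
        err = errno ? errno : EIO;   /* short count: e.g. ENOSPC on a full disk */

    if (fclose(fp) != 0 && err == 0)
        err = errno ? errno : EIO;
    return err;
}
```

The trade-off is one write() per fwrite() call, which costs throughput but makes the point of failure much easier to identify.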

Mario_Charest1 · October 8, 2004, 4:53am

And what about running dcheck with the -rlV options? This checks the device
directly and bypasses any potential filesystem bug.

“John Garvey” <jgarvey@qnx.com> wrote in message
news:ck4v7a$hkm$1@inn.qnx.com…

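#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>   /* strerror() */
#include <unistd.h>   /* write(), close() */

int main(int argc, char *argv[])
{
    int fd, n;
    FILE *fp;

    /* Unbuffered path: write() goes straight to the filesystem, so a
       full disk is reported immediately. EOK is the QNX "no error" value. */
    fd = open("/tmp/full", O_RDWR | O_CREAT, 0666);
    errno = EOK, n = write(fd, "", 1);
    fprintf(stderr, "write() = %d/%d/%s\n", n, errno, strerror(errno));
    close(fd);

    /* Buffered path: fwrite() only fills the stdio buffer; the error
       shows up when fflush() forces the data to disk. */
    fp = fopen("/tmp/full", "r+");
    errno = EOK, n = fwrite("", 1, 1, fp);
    fprintf(stderr, "fwrite() = %d/%d/%s\n", n, errno, strerror(errno));
    errno = EOK, n = fflush(fp);
    fprintf(stderr, "fflush() = %d/%d/%s\n", n, errno, strerror(errno));
    fclose(fp);
    return(0);
}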
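
The setvbuf() remark in John's message can be sketched as follows: with the stream switched to unbuffered mode, fwrite() has to hand the data to the filesystem straight away, so it should be the call that reports a full disk. This is only an illustration under that assumption, not tested on QNX4; write_unbuffered() is a hypothetical name and does not come from the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: append len bytes from buf to path through an
 * unbuffered stdio stream. Returns 0 on success, -1 on failure. */
int write_unbuffered(const char *path, const void *buf, size_t len)
{
    FILE *fp;
    int saved;

    assert(path != NULL && buf != NULL);

    fp = fopen(path, "a");
    if (fp == NULL)
        return -1;

    /* setvbuf() must be called before any other operation on the stream.
     * _IONBF disables buffering, so each fwrite() is passed through. */
    if (setvbuf(fp, NULL, _IONBF, 0) != 0) {
        fclose(fp);
        return -1;
    }

    /* With no buffer in between, a short count here means the
     * underlying write failed (errno says why, e.g. ENOSPC). */
    if (fwrite(buf, 1, len, fp) != len) {
        saved = errno;      /* fclose() may overwrite errno */
        fclose(fp);
        errno = saved;
        return -1;
    }

    return (fclose(fp) == EOF) ? -1 : 0;
}
```

The trade-off is one system call per fwrite(), which is usually acceptable for a low-rate log but worth measuring on a Compact Flash device.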

Ryan_Baillargeon1 · October 8, 2004, 7:49pm

I just want to thank Mario and John for their effort and expertise in
this matter. I will be looking at your suggestions, and running that
code and some other tests when I return from Thanksgiving vacation.

Thanks again and have a good long weekend.

Cheers, Ryan B.

Ryan_Baillargeon1 · October 8, 2004, 7:50pm

Oh, I didn’t forget you, Adam… and everyone else…
Cheers.
