Eric Norton wrote:
I can’t imagine it being a consumer grade compact flash since we buy
them from Diamond Systems, who build industrial grade PC104
components, and they recommend this particular flash. And as you said,
we’ll know if it costs more, which it does.
I have a pc104 form factor board from an industrial supplier. The board
came with consumer grade compact flash - so I would verify this for your
own sanity. The supplier might not know the context in which you’re
using the flash part, so his recommendation might not apply. In the
end, checking is well worth the effort.
Perhaps the part itself doesn’t do well when full. Bad blocks can
develop (rapidly under consumer grade) and it’s the hardware
controller on the part which is supposed to handle those situations.
But if it’s full, and the chip doesn’t have any spare blocks,
corruption could occur.
We’ve also had similar thoughts on this, and it comes down to the wear
leveling algorithms that the compact flash manufacturer uses. If you
have 5% disk space free, wear leveling on the remaining free space is
probably less effective, which is pretty much what you’re saying.
Not really - wear leveling is one aspect. Bad blocks can occur without
an erase cycle on the flash cell. The data literally just corrupts from
lack of charge (consumer grade parts do this). Constant cycling via
erase will ensure the cells keep their charge, but then you only have X
number of cycles per cell.
I’m not trying to paint the picture that the flash part will just up and
die. But the fact that you’re doing constant logging is going to
shorten the life of that part significantly. NAND flash is less
reliable than NOR, shortening that time span a little more. Bad blocks
on top of that as a ‘regular occurrence’ shorten the life span even more.
We’ve seen several failure cases over the past couple of years. The one
you describe over the long period of time, we’ve seen. This recent
failure case we feel is different. We had two particular flash disks
fail, one over the course of a year as mentioned by Ryan in the original
post with syslog, but the more recent one failed in only a month’s time.
It was a brand new flash disk. The common trait between these two
flash disks is that they were full.
But you can’t conclude that ‘disk full’ causes the corruption, since you
have no idea when the corruption occurred. There seems to be a
relationship, but what exactly that is remains unknown.
Right now, we have a test in progress where we let our logging system
fill the disk. It hasn’t failed yet, but it appears to continue to
write after it reaches 100%. The IDE light is still flashing, and none
of the open, seek or write calls are failing. Now, we’ve
discovered that we are not checking the return value of the fprintf
call, and we’re not sure whether fprintf even reports errors, so we could
think we’re writing to the disk when really we are not.
Ok.
However, if
that were the case then we would need to explain why an fopen eventually
does fail and corruption happens on a disk that’s 100% full, when there
shouldn’t have been anything writing to it.
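On the unchecked fprintf point, here is a minimal sketch of what checking could look like. The helper name log_line_checked is made up for illustration and is not from the logging system discussed here. It assumes a standard C stdio implementation: fprintf returns a negative value on an output error, but on a buffered stream it may only copy the text into memory, so a full disk often does not show up until the buffer is flushed.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original logging code): write one
 * log line and report whether it was handed to the OS successfully.
 * Returns 0 on success, -1 on failure; on POSIX systems errno then
 * describes the cause (for example ENOSPC on a full disk). */
int log_line_checked(FILE *fp, const char *msg)
{
    /* A negative return means fprintf itself hit an output error.
     * A non-negative return only means the text was accepted, possibly
     * just into the stream's in-memory buffer. */
    if (fprintf(fp, "%s\n", msg) < 0) {
        return -1;
    }

    /* fflush forces the buffered data out through the underlying write
     * call; this is where a "no space left" failure usually appears. */
    if (fflush(fp) == EOF) {
        return -1;
    }

    return 0;
}
```

Even a successful fflush only means the OS accepted the data. Whether it has reached the flash part is a separate question; on POSIX systems fsync(fileno(fp)) is the usual way to ask for that, and the return value of fclose is worth checking as well.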
I’m not sure what causes the IDE light to flash; I assume anything on
the IDE bus will light it. If that is the case, then you know that, from
an OS point of view, it’s not stopping the writes. Correct me if I’m
wrong here, I’m not sure what the light proves.
The light simply indicates activity, period. It says nothing about the
type of activity (reads/writes/commands etc). I don’t think the light
proves anything. Given that the underlying media isn’t actually a HD,
it’s hard to use a blinking light to relate what’s being put on the
IDE bus to what’s actually occurring in the flash part itself.
–
Cheers,
Adam
With a PC, I always felt limited by the software available.
On Unix, I am limited only by my knowledge.
–Peter J. Schoenster <pschon@baste.magibox.net>