Flash and corruption; FUD?

I’m about to start investigating flash corruption on behalf of three
distinct customers; I don’t have very many details right at this point,
but the general consensus is that these devices “work just fine” under
Windows CE, DOS, and other OS’s, and “experience corruption” when used
with QNX 6. That’s all the “hard facts” I have at this point.

The purpose of this post is to solicit input from the field on flash
corruption – I’m looking for things like model numbers, flash technology
used, usage patterns when it failed, and whether this is a QNX 6-specific
problem or not (as far as you are able to tell). I’ll summarize the
results and analysis as much as I’m able to when given possible NDA
constraints etc.

Thanks in advance for your input!

Cheers,
-RK


[If replying via email, you’ll need to click on the URL that’s emailed to you
afterwards to forward the email to me – spam filters and all that]
Robert Krten, PDP minicomputer collector http://www.parse.com/~pdp8/

Mario Charest postmaster@127.0.0.1 wrote:

Maybe the first thing is to upgrade to the latest flash file system, which is
supposed to be more resilient.

Push the (perhaps repaired) car back up the hill and see if it does it again? :slight_smile:

I guess if QSS took the time to write/design a new flash file system, it is
because the old one wasn’t so resilient after all :wink:

I’m working the other side of the house, Mario. I’ve given up my “cowboy
programming” ways (for this contract). One of the customers is a medical
instrument supplier, and it’s simply not an option to “just upgrade” in
the (perhaps vain) hope that it will help – there really should be a
documented quality process which the OS vendor follows, listing all of the
bugs, which versions they were found in, which versions they were fixed in,
customer impacts, yaddy yaddy yaddy.
If QSSL had this (and I’m only guessing they don’t) then it would be a
simple matter to just jump in and say “Yes, the following flash corruption
issues were fixed in component X in version B”.

THEN AND ONLY THEN would it make sense to upgrade component X from version
A to version B, knowing that bugs J, K, and L were fixed in version B.
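
As an illustration of the kind of record being asked for here, the following is a minimal sketch in Python of a defect log that can answer "which bugs in component X were fixed between version A and version B?". The `Defect` fields, the `fixes_gained` helper, and the tuple version numbers are all hypothetical; nothing here reflects an actual QSSL process or database.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Version = Tuple[int, ...]  # hypothetical encoding, e.g. (6, 2, 1)


@dataclass(frozen=True)
class Defect:
    ident: str                    # tracking identifier, e.g. "J"
    component: str                # affected component, e.g. "X"
    found_in: Version             # first version known to show the bug
    fixed_in: Optional[Version]   # None while the bug is still open
    impact: str                   # short customer-impact note


def fixes_gained(defects: List[Defect], component: str,
                 current: Version, target: Version) -> List[Defect]:
    """Return defects in `component` fixed after `current`, up to `target`."""
    return [
        d for d in defects
        if d.component == component
        and d.fixed_in is not None
        and current < d.fixed_in <= target
    ]
```

With a log like this, the decision to upgrade can be tied to a named list of fixes rather than to hope.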

<vague, marginally relevant rant>
Working this side of the house, I really hate the “just try the latest”
approach – it really begs the questions of a) do you actually know what
bugs are present in the current product, and b) if not, why did you go
and make a new version? I see this when a vendor is presented with a crash
dump, a version number, and a program counter location from a customer,
and the vendor replies, “Oh, gosh, we don’t have that version around any more
so I can’t tell you why it died. Sorry. Hey, why don’t you try the latest
and see if it dies?” For “cowboy programming”, that’s just fine. For
highly-reliable systems, that’s just unacceptable.
</vague, marginally relevant rant>

Now, to be fair, Mario, at this point I’m just fishing, and I probably jumped
all over you for no reason :slight_smile:

When I have more concrete information, I’ll actually be in a position to say
“OS version X, flash filesystem type C, flash filesystem driver version W,
fails in such and such a manner – is there a fix?”

Cheers,
-RK

P.S. Sorry for the rant – I’m doing traceability matrices from functional
specifications to system architecture to detailed software design docs – you
just get into that mindset after a while. It’s not all bad, some of it is
sorely needed in other areas of software development. For the long answer
of “why?” see:

http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Therac_1.html

Scary stuff indeed, and I’m in the same building. :frowning:


Hi Robert,

Discussions like this come up periodically at qnx.org.ru:
http://qnx.org.ru/index.php?option=com_minibb&action=vthread&forum=5&topic=2345
[Somewhere at qor there was even a discussion of, and a link to, the Therac-25 story
:slight_smile:]
You will probably need an online translator to read it. Or I can translate
the key statements from that post:

QNX 6.2.0 system can’t boot from:
SanDisk CompactFlash 64Mb
SDCFB
PAT. 5070032, 5418752, 5602987
AB0311OAA CHINA

It is an ATA-compatible device, so it uses the devb-eide driver and a QNX partition
created on the flash by fdisk. When booting from the HDD with the SanDisk connected on
the secondary channel, /dev/hd1 appears only 1-2 minutes after booting.
MS-DOS boots and works just fine.

There is no problem to boot the system from
SanDisk CompactFlash 32Mb (Industrial Grade)
02/21/03
sdcfbi-32-101-80
440025I
PAT. 5070032, 5172338, 5418752, 5602987
AA0209QQ CHINA

But they need more than 32 MB of disk space. The problem was solved by replacing the CF
cards with the same model as the non-working 64 MB ones, but with the magic “Industrial Grade”
mark (and probably 1 additional patent :astonished:).

Was it useful?

Cheers,
Eduard.

ed1k <ed1k@fake.address> wrote:

Was it useful?

It’s a datapoint :slight_smile: Thanks,
-RK


“Mario Charest” postmaster@127.0.0.1 wrote in message
news:c5n3ln$51k$1@inn.qnx.com

Maybe the first thing is to upgrade to the latest flash file system, which is
supposed to be more resilient.

I guess if QSS took the time to write/design a new flash file system, it is
because the old one wasn’t so resilient after all :wink:

Indeed. But the new one has its own bugs. In certain cases stuff does not really get
updated when you overwrite files, you get inconsistent directory
listings, et cetera. Something funny happens with attributes once in a
while… Some of that was fixed in private updates that are not part of any
official release, though…

– igor

“Robert Krten” <rk@parse.com> wrote in message
news:c5n5ar$6fg$1@inn.qnx.com

THEN AND ONLY THEN would it make sense to upgrade component X from version
A to version B, knowing that bugs J, K, and L were fixed in version B.

Well now. You obviously haven’t been working that side of the house long
enough :wink:
As you progress toward SEI CMM5 and then the CMMI organisation model, you
eventually get to the point of really hating the stuff, because the process
engineers required to maintain certain illusions and appearances start to
outnumber the actual developers. They are going to be busy predicting how many
defects of each severity you will have at a certain time on your next project,
the one you haven’t come up with yet.

That said, a proper source control system integrated with defect tracking
and build labeling is a requirement for any respectable software company.
That means the capability to reproduce any officially released build at any
time. Beyond that, a company employing about 40 actual developers and just
over a hundred total employees can neither afford nor really needs all the
yaddy yaddy. It can do better with more agile software processes. That
should not cloud the fact that it might indeed benefit from having a process
at all :wink:

P.S. Sorry for the rant – I’m doing traceability matrices from functional
specifications to system architecture to detailed software design docs – you
just get into that mindset after a while. It’s not all bad, some of it is
sorely needed in other areas of software development. For the long answer
of “why?” see:

http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Therac_1.html

Scary stuff indeed, and I’m in the same building. :frowning:

There are also plenty of answers along the lines of “why not”, and no shortage of
interesting books on the subject.
Don’t get too carried away with the dark side of the Force :wink:

– igor

Hi RK…

I hope that all is well.

I will get back to you with more details at a later time. But for now,
here is a way in which I have been able to corrupt flash disk in a
consistent manner:

I. subject: SanDisk CompactFlash 64Mb and 120 Mb
OS: QNX 6.2.1-B PE
hardware: VMIC cPCI, 933 MHz

  1. can boot QNX ok,
  2. copy over the old binary without deleting it first
  3. flash memory gets corrupted
  4. must reformat the memory after a while
  5. mean time between failure: about a year or so, depending on how often
    you write to the flash
  6. time of last incident: 1 month ago

II. subject: DiskOnChip 2000
OS: QNX 6.2.0
hardware: Adastra EBX board (do not recall actual board number)

  1. can boot QNX ok,
  2. DiskOnChip gets corrupted at random, no pattern perceived…
  3. time of last incidence: 1 year ago
  4. NOTE: we changed to Prometheus PC104 from Diamond Systems because of
    this problem. Things are a little better, but…

III. subject: Prometheus Flash Disk Module
OS: QNX 6.2.1-B PE
hardware: Prometheus PC104

  1. can boot QNX ok,
  2. copy over the old binary without deleting it first
  3. flash memory gets filled and reports files that have been deleted
  4. must reformat the memory after a while
  5. mean time between failure: about a year or so, depending on how often
    you write to the flash
  6. time of last incident: 1 month ago

It seems that with QNX, when we write over old binaries, the flash
memory gets corrupted regardless of media and hardware. You can try
this for yourself and see if you get the same results. However,
note that if I delete the old binaries first, I seem to be able to
delay the onset of flash corruption (until I forget to do this and write
over old binaries anyway). Also, note that the flash gets corrupted when
I write to the flash disk over and over. Finally, the
DiskOnChip + Adastra board combination was a really bad match, but newer
Adastra boards may be OK. Also, I never use MS embedded products.
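
For anyone who wants to try the overwrite pattern described above, here is a minimal stress-loop sketch in Python. It assumes a Python interpreter is available on (or can mount) the target filesystem, which is not stated anywhere in this thread; the function name `overwrite_and_verify` and its parameters are made up for illustration. It overwrites one file repeatedly, optionally deleting it first, and counts read-back mismatches.

```python
import hashlib
import os


def overwrite_and_verify(path, rounds=100, size=64 * 1024, delete_first=False):
    """Overwrite `path` `rounds` times; return how many read-backs mismatched."""
    mismatches = 0
    for _ in range(rounds):
        payload = os.urandom(size)  # fresh random content each round
        if delete_first and os.path.exists(path):
            os.remove(path)         # the "delete the old binary first" variant
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())    # ask the OS to push the data to the device
        with open(path, "rb") as f:
            stored = f.read()
        if hashlib.sha256(stored).digest() != hashlib.sha256(payload).digest():
            mismatches += 1
    return mismatches
```

Note that an immediate read-back may be served from a cache rather than from the flash itself, so a zero result here proves little on its own. A more convincing run would remount or power-cycle the device between the write and the check, and compare the `delete_first=True` and `delete_first=False` variants over a long period.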

As time passes I will collect better data, and I will let you know
(provided that you still need the information).

Regards…

Miguel.



Miguel Simon <simon@ou.edu> wrote:

Hi RK…

I hope that all is well.

So far so good :slight_smile:

Thanks for the datapoint, Miguel!

Cheers,
-RK

“Miguel Simon” <simon@ou.edu> wrote in message
news:c5slto$ol1$1@inn.qnx.com

II. subject: DiskOnChip 2000
OS: QNX 6.2.0
hardware: Adastra EBX board (do not recall actual board number)

The DiskOnChip drivers are not made by QNX. Rob, are you talking about the QNX flash
file system or DOC?

Mario Charest postmaster@127.0.0.1 wrote:

The DiskOnChip drivers are not made by QNX. Rob, are you talking about the QNX flash
file system or DOC?

Any and all. Waiting for further input from the customers as to what they
are using. I’ve been invited to “investigate flash corruption” on their
behalf, without (yet) being given specific details about what they are
using (s/w and h/w) – however, I expect to have those details this week…
Also, we are looking at using a flash filesystem in the contract I’m currently
on, so this just kills so many birds with the one stone :slight_smile:

Cheers,
-RK


Hey Robert,

I sing the same tune as Miguel, with the exception that DiskOnChip has never
given me any problems, but IDE flash devices always have, and plain vanilla
flash devices always have as well.

Basically, if we overwrote anything on the flash disk and powered down
before giving it around 10 seconds, things would be corrupted. We also got
into the habit of using QNX 4’s sync command, which helped as well. I’ve
never really come up with a solution for QNX 6, except to use DiskOnChip
devices only.
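
One plausible reading of the 10-second window (an assumption, not something established in this thread) is that written data was still sitting in a write cache at power-down. The application-level counterpart of running sync is to flush each file explicitly after writing it. A minimal Python sketch follows; `write_durably` is a hypothetical helper name, and whether a given driver and device actually honour the flush is a separate question.

```python
import os


def write_durably(path, data):
    """Write `data` to `path` and request a flush before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's own buffer down to the OS
        os.fsync(f.fileno())  # ask the OS to push its cache to the device
```

This narrows the window in which a power cut can catch unwritten data, but it cannot help if the device itself reorders or defers writes internally.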

I tend to trust DOC much more than plain vanilla flash because of the
TrueFFS driver. They apparently have algorithms to help ensure data
integrity. Also, the DOC devices have functionality (or is it in the
TrueFFS driver? I don’t remember) that keeps the old data until the new data
has been fully committed. It’s a write-then-delete process instead of
delete-then-write. This makes the device tolerant of being powered down before
all the data has been committed to the device. I don’t remember the specifics,
but you might check it out.
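
The same write-then-delete idea can be applied at the file level when updating a binary. The sketch below is a file-level analogy in Python, not a description of how TrueFFS works internally; the `.new` suffix and the `replace_file` name are hypothetical. The new content is written and flushed under a temporary name, and only then swapped in over the old file.

```python
import os


def replace_file(path, data):
    """Write new content beside `path`, then swap it in over the old file."""
    tmp = path + ".new"       # hypothetical temporary name, same directory
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the new copy is written out first
    os.replace(tmp, path)     # the old file stays intact until this point
```

If power is lost before the final step, the old file is still complete and at worst a stray `.new` file is left behind. How atomic the final rename really is depends on the filesystem underneath.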

Hope it helps.

Kevin


Thanks Kevin, appreciate that!

I should be getting the hardware from the customer this week, as well as
some idea of what software they are using, so I should be in a good
position to proceed.

Cheers,
-RK

I’m running QNX4 on a network of 1996-vintage boards which were recently
upgraded to IDE flash modules (the ones that plug directly into the sockets
but look like IDE drives to the software). The original set of 18 was from
ICP, but I sent these back when we had three failures. They were replaced
by others, also from ICP, but these boot as Sandisk SDC1-32. They are the
44-pin version (i.e. 2mm pitch connector).

These ones have not failed, but they do suffer data corruption. It is
always related to power down, but I’m not certain that it necessarily
depends on whether the drive was written to since it was last powered up.

It is curious that we are using these modules, since we could not make
Compact Flash work, even though we were using boards that allegedly
supported CF as an IDE drive. We could not load a QNX disk image onto
these drives and have them last reliably. The boards we were using for
this were current: PCM5823, 3.5-inch form factor.

We don’t have a good record with flash at all. We originally used M-Systems
PC104 flash disks, and apart from a number of drive failures over the years
(probably 4 in 18 systems in 6 years) they did not corrupt data. This tends
to support the TFFS data integrity theory.


nntp.qnx.com wrote:

Hey Robert,

I sing the same tune as Miguel, with the exception that DiskOnChip has never
given me any problems, but IDE flash devices always have, and plain vanilla
flash devices always have as well.

Basically, if we overwrote anything on the flash disk and powered down
without giving it around 10 seconds first, things would be corrupted. We also
got into the habit of using QNX 4’s sync command, which helped as well. I’ve
never really come up with a solution with QNX6, except to use DiskOnChip
devices only.

I tend to trust DOC much more than plain vanilla flash because of the
TrueFFS driver. They apparently have algorithms to help ensure data
integrity. Also, the DOC devices have functionality (or is it in the
TrueFFS driver? I don’t remember) that keeps old data until the new data has
been fully committed. It’s a write-then-delete process instead of
delete-then-write. This makes the device tolerant of being powered down
before all the data has been committed to the device. I don’t remember the
specifics, but you might check it out.
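
The same write-then-delete idea can be approximated at the application level. The following is only a sketch under stated assumptions: it is not how TrueFFS works internally, and replace_file_safely is a made-up name. It writes the new contents to a temporary file in the same directory, asks the OS to flush it, and only then renames it over the old file. Whether the old or new contents actually survive a power cut still depends on the filesystem and on the flash controller honouring the flush.

```python
import os
import tempfile


def replace_file_safely(target, data):
    """Write data to a temporary file next to target, flush it, then
    rename it over target. The old file is left untouched until the
    rename happens."""
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())    # ask the OS to push the data to the device
        os.replace(tmp_path, target)  # rename over the old file (atomic on POSIX)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling replace_file_safely("app.bin", new_bytes) instead of copying straight over the old binary keeps the old copy complete until the new one is fully written.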

Hope it helps.

Kevin




Hi Donald,

Thanks for the datapoint! I will be working on this issue some more this
week / next week. I did receive input from another user that there is
definitely some corruption related to powerdown, and that the last write
needs to take place something like 30 to 120 seconds before the flash
is powered down – this is related to “automatic” block shuffling done
by the flash’s built-in controller.
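
If that 30 to 120 second figure holds, an application could at least refuse to signal "safe to power off" until the flash has been idle for long enough. Here is a minimal sketch of such a guard. The class name and the 120 second default are assumptions for illustration (the default is simply the upper end of the range quoted above), and the application has to call note_write() itself after each write.

```python
import time


class QuietPeriodGuard:
    """Track the last write and report when power-down is considered safe."""

    def __init__(self, quiet_seconds=120.0, clock=time.monotonic):
        self.quiet_seconds = quiet_seconds
        self._clock = clock
        self._last_write = None  # None means no write since start-up

    def note_write(self):
        """Record that the application has just written to the flash."""
        self._last_write = self._clock()

    def seconds_until_safe(self):
        """Return how much longer to wait before removing power."""
        if self._last_write is None:
            return 0.0
        elapsed = self._clock() - self._last_write
        return max(0.0, self.quiet_seconds - elapsed)

    def safe_to_power_down(self):
        return self.seconds_until_safe() == 0.0
```

This only helps with orderly shutdowns, of course; it does nothing for a box that simply loses power.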

More as I have it.

Cheers,
-RK


Robert,

Not sure if you are still interested in this issue - I hope so!

I have had big problems with flash corruption under QNX. We use 1 GB
CompactFlash cards with QNX 6.2.1A.

We seem to be severely exercising the problem now with a recurrent rebooting
problem. Because of a separate problem, our system reboots every hour. After
about 4 days the flash disk on which the OS resides becomes heavily
corrupted.

This has happened to at least 2 disks. The patterns of bad blocks are
interesting: in both cases, the bad blocks occupied exactly 3.2% of the disk
and were almost entirely contiguous. They were in different locations
though.

I have been racking my brain about what in the OS could possibly be writing
heavily to the disk during normal operation. There really should be very
little activity on the disk beyond reading it at bootup. However, your
suggestions that it is to do with unordered shutdown would make much sense
in my experience.

Flash Disk: Sandisk 1GB CompactFlash
I/F: IDE (treat as standard BIOS hard drive)
OS: QNX 6.2.1A

Regards,
Robert Muil.



Still looking into it, of course. I’ve raised the spectre of flash problems
with management and they said “That’s nice”. Of course, now that we’re getting
closer to production, we’re also finding corruption :-( The more things change,
the more they stay the same. We haven’t extensively analyzed the nature of
the corruption, but it’s nasty; we have “block already in use by another file”
type of stuff all over the disk. So I reformatted all the disks (32 MB compact flash)
and started again, this time hoping to keep a better eye on things, but of course
they take the boxes out to the field for testing and I don’t know what they’ve
done. Sigh.

I’ll let you know what happens as this progresses :-)

Cheers,
-RK


“Robert Muil” <r.muil@crcmining.com.au> wrote in message
news:cl7sad$ds5$1@inn.qnx.com

Robert,

Not sure if you are still interested in this issue - I hope so!

I have had big problems with flash corruption under QNX. We use
CompactFlash 1Gb cards, with QNX 6.2.1A.

We seem to be severely exercising the problem now with a recurrent
rebooting problem. Because of a separate problem, our system reboots every
hour. After about 4 days the flash disk on which the OS resides becomes
heavily corrupted.

This has happened to at least 2 disks. The patterns of bad blocks are
interesting: in both cases, the bad blocks occupied exactly 3.2% of the
disk and were almost entirely contiguous. They were in different locations
though.

Was the location reported by a QNX utility or by a flash utility (under a
different OS)? Compact flash presents virtual blocks to the OS. Hence the
blocks may appear to be in different locations but could physically be at the
same location, or at a totally different location altogether.
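
To make the virtual block point concrete, here is a toy model (an illustration only; real CompactFlash controllers are proprietary and far more involved, and the class name is made up). Every write of a logical block goes to a fresh physical block, so the same logical address ends up in a different physical place each time.

```python
class ToyFlashTranslationLayer:
    """Toy model: every write of a logical block goes to a fresh physical block."""

    def __init__(self, physical_blocks):
        self.free = list(range(physical_blocks))  # unused physical blocks
        self.mapping = {}                         # logical block -> physical block

    def write(self, logical):
        new_physical = self.free.pop(0)           # take the next free block
        old_physical = self.mapping.get(logical)
        self.mapping[logical] = new_physical
        if old_physical is not None:
            self.free.append(old_physical)        # recycle the stale block
        return new_physical


ftl = ToyFlashTranslationLayer(physical_blocks=4)
first = ftl.write(7)   # logical block 7 lands on physical block 0
second = ftl.write(7)  # same logical block, now on physical block 1
print(first, second)   # prints "0 1"
```

The OS only ever sees "block 7", which is why block numbers reported by an OS-level utility say little about where the damage physically sits.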

I have been racking my brain about what in the OS could possibly be
writing heavily to the disk during normal operation. There really should
be very little activity on the disk beyond reading it at bootup.

For every read, I believe the filesystem needs to be modified to update the
access time of the file. On flash it’s usually best to turn this off, I
think. On a hard disk this is usually not a problem, since the same block just
gets rewritten with the new data, but flash can’t do that…
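
One way to check whether reads really cause access-time updates on a given mount is sketched below. The function name is made up, and the script itself writes metadata (it sets the timestamps with os.utime), so it is for a test file only. It pushes the access time an hour into the past, reads one byte, and looks at whether the recorded access time changed. Many systems have a mount option to turn these updates off (noatime on Linux, for example); check the filesystem documentation for the QNX equivalent.

```python
import os
import tempfile


def read_updates_atime(path):
    """Return True if reading the file changed its recorded access time."""
    # Put the access time firmly in the past so that any update is visible.
    mtime = os.stat(path).st_mtime
    os.utime(path, (mtime - 3600, mtime))
    before = os.stat(path).st_atime
    with open(path, "rb") as f:
        f.read(1)
    after = os.stat(path).st_atime
    return after != before


if __name__ == "__main__":
    # Hypothetical demo on a scratch file; create the file on the flash mount instead.
    fd, scratch = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"probe")
    print("reads update atime:", read_updates_atime(scratch))
    os.remove(scratch)
```

A True result means every read of that file can turn into a small metadata write, which is exactly the hidden activity being discussed here.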

However, your suggestions that it is to do with unordered shutdown would
make much sense in my experience.

Flash Disk: Sandisk 1GB CompactFlash
I/F: IDE (treat as standard BIOS hard drive)
OS: QNX 6.2.1A

Regards,
Robert Muil.

“Robert Krten” <> rk@parse.com> > wrote in message
news:c7d9l7$o4u$> 1@inn.qnx.com> …
Donald backstrom <> donaldb@cstgroup.com.au> > wrote:
I’m running QNX4 on a network of 1996 vintage boards which were recently
upgraded to IDE flash modules (the ones that go into the sockets
directly,
but look to the software like IDE drives). The original set of 18 was
from
ICP but I sent these back when we had three failures. Theu were replaced
by others also from ICP, but they boot as Sandisk SDC1-32. They are the
44-pin version (ie 2mm pitch connector).
These ones have not failed, but they do suffer data corruption. It is
always related to power down, but I’m not certain that it relates
necessarily to whether the drive was written to since previously powered
up.
It is curious that we are using these modules since we could not make
Compact Flash work, even though we were using boards that allegedly
supported CF as an IDE drive. We could not load a QNX disk image onto
these drives and have them last reliably. The boards we were using for
this were current - PCM5823 3.5inch form factor.
We don’t have a good record with flash at all.
We originally use M-systems PC104 flash disks, and apart from a number
of
drive failures over the years (probably 4 in 18 systems in 6 years) they
did not corrupt data. This tends to support the TFFS data integrity
theory.

Hi Donald,

thanks for the datapoint! I will be working on this issue some more this
week / next week. I did receive input from another user that there is
definitely some corruption related to powerdown, and that the last write
needs to take place something like 30 to 120 seconds before the flash
is powered down – this is related to “automatic” block shuffling done
by the flash’s built in controller.

More as I have it.

Cheers,
-RK

nntp.qnx.com wrote:

Hey Robert,

I sing the same tune as Miguel, with the exception that DiskOnChip has
never
given me any problems, but IDE flash devices always have, and plain
vanilla
flash devices always have as well.

Basically, if we would overwrite anything on the flash disk and power
down
before giving it around 10 seconds, things would be corrupted. Also,
we got
into the habit of using QNX 4’s sync command, that helped as well.
I’ve
never really come up with a solution with QNX6, except to use
DiskOnChip
devices only.

I tend to trust DOC much more than just plain vanilla flash because of
the
TrueFFS driver. They have apparently have algorithms to help insure
data
integrity. Also, the DOC devices have functionality (or is it in the
TrueFFS driver? I don’t remember) that keep old data until the new
data has
been fully committed. Its a Write then Delete process instead of
delete
then write. This makes the device tolerant to being powered down
before all
the data has been committed to the device. Don’t remember the
specifics.
But you might check it out.

Hope it helps.

Kevin


“Miguel Simon” <> simon@ou.edu> > wrote in message
news:c5slto$ol1$> 1@inn.qnx.com> …
Hi RK…

I hope that all is well.

I will get back to you with more details at a later time. But for now,
here is a way in which I have been able to corrupt flash disks in a
consistent manner:

I. subject: SanDisk CompactFlash 64 MB and 120 MB
OS: QNX 6.2.1-B PE
hardware: VMIC cPCI, 933 MHz

  1. can boot QNX ok,
  2. copy over the old binary without deleting it first
  3. flash memory gets corrupted
  4. must reformat the memory after a while
  5. mean time between failures: about a year or so, depending on how
    often you write to the flash
  6. time of last incident: 1 month ago

II. subject: DiskOnChip 2000
OS: QNX 6.2.0
hardware: Adastra EBX board (do not recall actual board number)

  1. can boot QNX ok,
  2. DiskOnChip gets corrupted at random, no pattern perceived…
  3. time of last incident: 1 year ago
  4. NOTE: we changed to Prometheus PC104 from Diamond Systems because
    of this problem. Things are a little better, but…

III. subject: Prometheus Flash Disk Module
OS: QNX 6.2.1-B PE
hardware: Prometheus PC104

  1. can boot QNX ok,
  2. copy over the old binary without deleting it first
  3. flash memory fills up and still reports files that have been deleted
  4. must reformat the memory after a while
  5. mean time between failures: about a year or so, depending on how
    often you write to the flash
  6. time of last incident: 1 month ago

It seems that with the QNX OS, when we write over old binaries, the flash
memory gets corrupted regardless of media and hardware. You can try to
do this for yourself and see if you get the same results. However,
notice that if I delete the old binaries first, it seems that I can
delay the onset of flash corruption (until I forget to do this and write
over old binaries anyway). Also, notice that flash gets corrupted when
I write to the flash disk repeatedly, over and over. Finally, the
DiskOnChip + Adastra board were a really bad match, but newer Adastra
boards may be ok. Also, I never use MS embedded products.

As time passes I will collect better data, and I will let you know
(provided that you still need the information).

Regards…

Miguel.
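
For anyone who wants to try Miguel's "write over the old binary without deleting it first" scenario in a repeatable way, a small harness along these lines may help. It is a sketch only: the function name `overwrite_and_verify` and its parameters are hypothetical, and a clean result proves little, because the read-back can be served from the filesystem cache rather than from the flash itself. A remount or power cycle between the write and the check would be a more honest test.

```python
import hashlib
import os


def overwrite_and_verify(path, payload, rounds=100):
    """Overwrite path in place `rounds` times; return rounds that read back wrong."""
    expected = hashlib.sha256(payload).hexdigest()
    mismatches = []
    for i in range(rounds):
        # Opening with "wb" truncates and rewrites the existing file
        # without unlinking it first, as in the scenario described.
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        if actual != expected:
            mismatches.append(i)
    return mismatches
```

Running the same loop with an `os.unlink(path)` before each write would give the "delete first" variant for comparison.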




http://www.paschiche.com/forum/viewtopic.php?t=32

For me, it’s obviously the CF card which is DEAD! The constant 3.2% of
corruption, together with the contiguous clusters of bad blocks found in
different locations, indicates that a part of the memory DIE has… DIED!
It is as if several contiguous memory cells from contiguous lines (the
matrix of cells) are burnt. For me, a square of the memory die is dead,
or an imperfection was introduced in the silicon and shows up under some
conditions (temperature, voltage, …)

Kochise


Robert Muil wrote:

Robert,

Not sure if you are still interested in this issue - I hope so!

I have had big problems with flash corruption under QNX. We use 1 GB
CompactFlash cards with QNX 6.2.1A.

We seem to be severely exercising the problem now with a recurrent rebooting
problem. Because of a separate problem, our system reboots every hour. After
about 4 days the flash disk on which the OS resides becomes heavily
corrupted.

This has happened to at least 2 disks. The patterns of bad blocks are
interesting: in both cases, the bad blocks occupied exactly 3.2% of the disk
and were almost entirely contiguous. They were in different locations
though.

I have been racking my brain about what in the OS could possibly be writing
heavily to the disk during normal operation. There really should be very
little activity on the disk beyond reading it at bootup. However, your
suggestion that it has to do with unordered shutdown makes a lot of sense
in my experience.

Flash Disk: Sandisk 1GB CompactFlash
I/F: IDE (treat as standard BIOS hard drive)
OS: QNX 6.2.1A

Regards,
Robert Muil.
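
To compare cards on the "exactly 3.2%, almost entirely contiguous" observation, it helps to reduce each bad-block list to two numbers: the fraction of the disk that is bad and the length of the longest contiguous run. The helper below is a sketch with a hypothetical name, `summarize_bad_blocks`, and it assumes the bad blocks are available as a list of block numbers from whatever scanning tool is in use. For scale, taking a nominal 1 GB as 10^9 bytes, 3.2% is about 32 MB.

```python
def summarize_bad_blocks(bad_blocks, total_blocks):
    """Return (fraction_bad, longest_contiguous_run) for a list of block numbers."""
    blocks = sorted(set(bad_blocks))
    if not blocks:
        return 0.0, 0
    longest = run = 1
    for prev, cur in zip(blocks, blocks[1:]):
        # Extend the current run if the block follows directly, else restart.
        run = run + 1 if cur == prev + 1 else 1
        longest = max(longest, run)
    return len(blocks) / total_blocks, longest


# Example with made-up numbers: 32 consecutive bad blocks on a 1000-block disk.
fraction, longest = summarize_bad_blocks(range(100, 132), 1000)
print(fraction, longest)
```

If the fraction is the same on every failed card while the position of the run moves around, that would be worth reporting alongside the card model.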

“Robert Krten” <rk@parse.com> wrote in message
news:c7d9l7$o4u$1@inn.qnx.com…
Donald backstrom <donaldb@cstgroup.com.au> wrote:
I’m running QNX4 on a network of 1996 vintage boards which were recently
upgraded to IDE flash modules (the ones that go into the sockets directly,
but look to the software like IDE drives). The original set of 18 was from
ICP but I sent these back when we had three failures. They were replaced
by others also from ICP, but they boot as Sandisk SDC1-32. They are the
44-pin version (i.e. 2mm pitch connector).
These ones have not failed, but they do suffer data corruption. It is
always related to power down, but I’m not certain that it relates
necessarily to whether the drive was written to since previously powered
up.
It is curious that we are using these modules since we could not make
Compact Flash work, even though we were using boards that allegedly
supported CF as an IDE drive. We could not load a QNX disk image onto
these drives and have them last reliably. The boards we were using for
this were current - PCM5823 3.5inch form factor.
We don’t have a good record with flash at all.
We originally used M-systems PC104 flash disks, and apart from a number of
drive failures over the years (probably 4 in 18 systems in 6 years) they
did not corrupt data. This tends to support the TFFS data integrity
theory.


Kochise,

That was exactly my assumption, but it is strange that it seems to be
reproducible with all CFs, and on cards that previously tested fine. Also,
the cluster of bad blocks is not always in the same location on disk.

Robert.

