How do I do a memory test in QNX?

One of the requirements for a project I am working on is to do a RAM and ROM
check during the self-test portion of the code. The “ROM” test will be a
checksum of the program file on the flash disk, but how do I do a RAM test
in QNX?

I am assuming I would need to:

map to sections of physical memory, test them (non-destructively), release
them, and move on.

Special case when it comes to the RAM the program is running in, or where
its stack is:
detect somehow that we are about to test a section of memory we are using,
run a second test routine (located in another section of memory), then
continue.

Is this even possible in QNX?

How do I tell where my code and data are?

Somebody has to have done this before…
It looks like it would be simpler without an operating system or memory
management to worry about. Is it possible to get to that state using an
interrupt handler, then make sure everything goes back as it was before the
return?

Thanks,

John Eddy

You could do this in the startup or the IPL before you give control over
to procnto. Then you can do anything. :-)

chris

Chris McKillop <cdm@qnx.com>
Software Engineer, QSSL
“The faster I go, the behinder I get.” -- Lewis Carroll
http://qnx.wox.org/

Sorry, but I need to be able to run a self test on demand from a menu that I
am controlling over a serial port using QNX and C code.

Silly requirement, because if there were a RAM problem, the system would
already be corrupted at this point, but I have to do it anyway.

Thanks

John Eddy <john.h.eddy@lmco.com> wrote in message
news:bb32v7$7tp$1@inn.qnx.com

Sorry, but I need to be able to run a self test on demand from a menu that
I am controlling over a serial port using QNX and C code.

Silly requirement, because if there were a RAM problem, the system would
already be corrupted at this point, but I have to do it anyway.

More to the point, what does ‘failing RAM’ look like?

-Adam

I was just planning to:
read a block and store it,
write a pattern to that block,
read back the pattern to verify it wrote OK,
write a second pattern to the block,
read it back and verify,
write the original data back, and move on to the next block.
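
For reference, the sequence described above might look roughly like this in C. It is only a sketch on an ordinary buffer: the name test_block, the caller-supplied save buffer, and the two patterns (0xAAAAAAAA and 0x55555555) are assumptions for illustration, and it says nothing about how to reach physical memory safely under QNX.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: test one block in place and restore its contents.
 * 'save' must have room for 'words' entries and must not overlap 'block'.
 * Returns the number of word comparisons that failed. */
size_t test_block(volatile uint32_t *block, uint32_t *save, size_t words)
{
    static const uint32_t patterns[2] = { 0xAAAAAAAAu, 0x55555555u };
    size_t failures = 0;
    size_t i, p;

    for (i = 0; i < words; i++)          /* read the block and store it */
        save[i] = block[i];

    for (p = 0; p < 2; p++) {
        for (i = 0; i < words; i++)      /* write the pattern */
            block[i] = patterns[p];
        for (i = 0; i < words; i++)      /* read it back and verify */
            if (block[i] != patterns[p])
                failures++;
    }

    for (i = 0; i < words; i++)          /* write the original data back */
        block[i] = save[i];

    return failures;
}
```

Note that with caching enabled, the read-back may be satisfied from the CPU cache rather than from the RAM itself, so a real test would have to deal with that as well.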

“John Eddy” <john.h.eddy@lmco.com> wrote in message
news:bb3cnj$ii4$1@inn.qnx.com

I was just planning to:
read a block, store it,
write a pattern to that block.
read back the pattern to verify it wrote ok.
write a second pattern to the block.
read it back and verify.
write the original data back, and move on to next block.

If they want you to run a memory test (live), I would hope they at least ask
for a somewhat more sophisticated test. Detecting bad RAM isn’t
that simple. RAM will often fail only when specific patterns are written. Doing
a RAM test the way you want to do it would not give me any sort of
confidence in the status of the RAM.
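
As an illustration of what “more sophisticated” patterns can mean, here is a sketch of two commonly used ones: a walking-ones test on a single word (aimed at stuck or shorted data lines) and an own-address test over a region (aimed at address lines that alias). The function names are made up for this example, both tests are destructive, and this is not necessarily what any poster here had in mind.

```c
#include <stddef.h>
#include <stdint.h>

/* Walking ones: shift a single 1 bit through every position of one word.
 * Returns a mask of the bit positions that read back wrong (0 = pass). */
uint32_t walking_ones(volatile uint32_t *cell)
{
    uint32_t bad = 0;
    uint32_t bit;

    for (bit = 1; bit != 0; bit <<= 1) {
        *cell = bit;
        bad |= (*cell ^ bit);
    }
    return bad;
}

/* Own-address test: fill the whole region with each word's index, then
 * read everything back; repeat with the inverted index. A write that
 * lands on the wrong word shows up as a mismatch elsewhere.
 * Returns the index of the first failing word, or 'words' on success. */
size_t own_address(volatile uint32_t *base, size_t words)
{
    size_t i;

    for (i = 0; i < words; i++)
        base[i] = (uint32_t)i;
    for (i = 0; i < words; i++)
        if (base[i] != (uint32_t)i)
            return i;

    for (i = 0; i < words; i++)
        base[i] = ~(uint32_t)i;
    for (i = 0; i < words; i++)
        if (base[i] != ~(uint32_t)i)
            return i;

    return words;
}
```

Since both routines overwrite what they test, the caller would have to save and restore the region if its contents matter.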

If something like that were asked of me, I would use ECC memory, make sure
the motherboard signals ECC memory errors via NMI, and set up a special ISR
handler for that.

Doing a memory test under QNX (or any multitasking OS) is more complex than
you expect. Aside from not overwriting your own program space, you have
to make sure you don’t write into any of the x86 special tables that are used
to define/control virtual addressing.

One way would probably be to kill all the processes you can, then allocate
all the memory you can (in small 1K blocks) and test that. It wouldn’t test
all RAM, but you could test most of it and still be OS friendly.
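
A rough sketch of that allocate-and-test idea in plain C follows. The function name, the max_blocks cap, and the patterns are assumptions for illustration; the first few bytes of each block hold a link pointer and are not pattern-tested, and nothing here controls which physical pages the allocator hands out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_BYTES 1024u

struct node { struct node *next; };   /* link kept at the start of each block */

/* Allocate up to max_blocks 1K blocks (stopping early if malloc fails),
 * pattern-test the payload of each, then free everything.
 * Stores the number of blocks obtained in *tested and returns the number
 * of blocks in which a mismatch was seen. */
size_t test_free_memory(size_t max_blocks, size_t *tested)
{
    struct node *head = NULL;
    size_t count = 0, failures = 0;

    while (count < max_blocks) {
        struct node *n = malloc(BLOCK_BYTES);
        if (n == NULL)
            break;                    /* allocator has nothing left to give */
        n->next = head;
        head = n;
        count++;
    }

    for (struct node *n = head; n != NULL; n = n->next) {
        volatile uint8_t *p = (volatile uint8_t *)(n + 1);
        size_t len = BLOCK_BYTES - sizeof *n;
        int bad = 0;

        for (int pass = 0; pass < 2; pass++) {
            uint8_t pat = pass ? 0x55 : 0xAA;
            for (size_t i = 0; i < len; i++)
                p[i] = pat;
            for (size_t i = 0; i < len; i++)
                if (p[i] != pat)
                    bad = 1;
        }
        failures += (size_t)bad;
    }

    while (head != NULL) {            /* give the memory back */
        struct node *next = head->next;
        free(head);
        head = next;
    }

    *tested = count;
    return failures;
}
```

The cap is there so the sketch can be tried without starving the rest of the system; on the target it would presumably be set high enough that malloc failing is what ends the loop.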

Mario Charest postmaster@127.0.0.1 wrote in message
news:bb3mdn$scl$1@inn.qnx.com

If something like that were asked of me, I would use ECC memory, make sure
the motherboard signals ECC memory errors via NMI, and set up a special ISR
handler for that.

I agree. Some type of hardware sanity check would be much more reliable
(and meaningful).

Doing a memory test under QNX (or any multitasking OS) is more complex than
you expect. Aside from not overwriting your own program space, you have
to make sure you don’t write into any of the x86 special tables that are used
to define/control virtual addressing.

Not to mention, the proposed test is destructive, regardless of whether you
restore what was there. Writing a ‘pattern’ into an area of code is going
to cause havoc for any process (proc/kernel or not).

One way would probably be to kill all the processes you can, then allocate
all the memory you can (in small 1K blocks) and test that. It wouldn’t test
all RAM, but you could test most of it and still be OS friendly.

The other issue is that this check scales linearly with the amount of RAM on
board. It’s not going to be very efficient to go through 256-512 MB (minus
some) reading/writing/reading/writing each section (read old/write
new/verify new/write back old).

I think it would be wise to reevaluate the requirement and get a better
understanding of what you’re trying to accomplish - there might be a better
way to skin the cat.

-Adam

Not to mention, the proposed test is destructive, regardless of whether you
restore what was there. Writing a ‘pattern’ into an area of code is going
to cause havoc for any process (proc/kernel or not).

Not if interrupts are disabled during testing. This isn’t real-time friendly
;-)

Mario Charest postmaster@127.0.0.1 wrote in message
news:bb54he$jsg$1@inn.qnx.com

Not to mention, the proposed test is destructive, regardless of whether you
restore what was there. Writing a ‘pattern’ into an area of code is going
to cause havoc for any process (proc/kernel or not).

Not if interrupts are disabled during testing. This isn’t real-time friendly
;-)

That would be quite difficult to do, since you can’t map much with
interrupts off. I suppose you could map everything first, then turn off
interrupts and go ‘test’, but that will be a very long time to have a
completely non-responsive computer. I doubt that would be any more
acceptable. And as an added bonus, you might not be able to tell whether you
tripped over bad RAM and froze, had a bug and spent your time spinning
somewhere with interrupts off, or are just taking a lot of time hitting all
the mapped RAM.

-Adam

Adam and Mario,

1. The box only has 32 MB of RAM, so it won’t take forever.
2. I have as much as 5 min to run the self tests. This will be the longest
part of them.
3. The box is expected to be non-responsive during this test. I already
figured interrupts would need to be off.
4. Map, test, map is what I had envisioned, but mapping larger chunks would
shorten the time.

5. The idea of killing everything possible off, and then allocating memory
and testing it, sounds interesting, but we would have to ensure that the
program we wanted to run would run in the area we had just tested. Possibly
a pair of these routines could test most of RAM, but what about kernel
memory, shared objects, drivers… that can’t be stopped?

Is there a way to shut down these kernel- and driver-related memory users,
relocate them into memory that has been tested, then test the memory
they were using?

We are also exploring the possibility of using the BIOS RAM check, but we
would have to reboot to use that.

Thanks for the suggestions,

John Eddy


“John Eddy” <john.h.eddy@lmco.com> wrote in message
news:bb5pip$d8h$1@inn.qnx.com

Adam and Mario,

1. The box only has 32 MB of RAM, so it won’t take forever.
2. I have as much as 5 min to run the self tests. This will be the longest
part of them.
3. The box is expected to be non-responsive during this test. I already
figured interrupts would need to be off.
4. Map, test, map is what I had envisioned, but mapping larger chunks would
shorten the time.

5. The idea of killing everything possible off, and then allocating memory
and testing it, sounds interesting, but we would have to ensure that the
program we wanted to run would run in the area we had just tested. Possibly
a pair of these routines could test most of RAM, but what about kernel
memory, shared objects, drivers… that can’t be stopped?

You can stop most of them, except the flash driver and procnto ;-)

Is there a way to shut down these kernel- and driver-related memory users,
relocate them into memory that has been tested, then test the memory
they were using?

You could kill everything but procnto and the flash driver, which don’t use
much RAM. And since this is code that is run quite often, if the RAM were bad
at the kernel location, I think you wouldn’t be able to get to the RAM test
;-)

We are also exploring the possibility of using the BIOS RAM check, but we
would have to reboot to use that.

Usually the BIOS RAM check sucks; I’ve never seen a motherboard that detects
bad RAM through the BIOS. Testing RAM takes a long time, and BIOSes usually
aren’t doing a REAL RAM test.

If you are prepared to reboot, Adam’s suggestion to do the test in the IPL
would give you lots of flexibility (although you would be running in 16-bit
real mode). Maybe there is a way to do it in the startup.

Or create a second partition that you could boot from and install MemTest86
(doesn’t require an OS)! http://www.memtest86.com/

John Eddy <john.h.eddy@lmco.com> wrote in message
news:bb5pip$d8h$1@inn.qnx.com

1. The box only has 32 MB of RAM, so it won’t take forever.
2. I have as much as 5 min to run the self tests. This will be the longest
part of them.
3. The box is expected to be non-responsive during this test. I already
figured interrupts would need to be off.
4. Map, test, map is what I had envisioned, but mapping larger chunks would
shorten the time.

5. The idea of killing everything possible off, and then allocating memory
and testing it, sounds interesting, but we would have to ensure that the
program we wanted to run would run in the area we had just tested. Possibly
a pair of these routines could test most of RAM, but what about kernel
memory, shared objects, drivers… that can’t be stopped?

Is there a way to shut down these kernel- and driver-related memory users,
relocate them into memory that has been tested, then test the memory
they were using?

Relocating the kernel at run time isn’t possible.

Be very careful how you write your testing program. Make sure that when you
write your pattern, nothing you reference (instructions, variables, string
constants, etc.) is outside of the 4K page you’re executing in.
Otherwise the access could be subject to translation, which could involve
a page table, which, according to Murphy, will keep the relevant entry in the
4K page you just wrote into.

We are also exploring the possibility of using the BIOS RAM check, but we
would have to reboot to use that.

Well, if you’re going to do that, you might as well write the RAM checker into
the startup code - that way it’s exactly what you want; either way it’s a
reboot.

-Adam

Is this possible? (Assume the 31M-32M region is free for use.)

mmap(1M - 32M) physical memory
interrupt disable

move 1M-2M to 31M
read/write/test 1M-2M region
restore 1M-2M
loop the above 3 steps until all memory is scanned.

interrupt enable

With interrupts disabled, nothing else could happen,
could it?

-xtang

Is this possible? (Assume the 31M-32M region is free for use.)

mmap(1M - 32M) physical memory
interrupt disable

move 1M-2M to 31M
read/write/test 1M-2M region
restore 1M-2M
loop the above 3 steps until all memory is scanned.

interrupt enable

With interrupts disabled, nothing else could happen,
could it?

You could be writing into a CPU translation table.

“Mario Charest” postmaster@127.0.0.1 wrote in message
news:bb7e81$9nh$1@inn.qnx.com

You could be writing into a CPU translation table.

Is there a way to know where all these “off limits” areas are?

John Eddy <john.h.eddy@lmco.com> wrote in message
news:bb7h1h$cis$1@inn.qnx.com

Is there a way to know where all these “off limits” areas are?

Not really, and nothing stops them from being in a different location later.

And I spoke too soon: the idea about running within a page isn’t viable.
Technically, you’ll need a TLB mapping for the page of code/data you’re
executing and for the target page you’re ‘studying’ - but there are still
windows of opportunity, between touching the pages to get TLB entries and
turning interrupts off, in which the TLB could get filled with more entries.

I think putting the check in startup or IPL is your only real viable
option - plus you get the added benefit that if you find any bad areas, you
can change the RAM layout to exclude those ranges and hopefully still have a
functional system.

-Adam

Adam,

What is “TLB” mapping?

Just how would I go about changing the RAM layout if I found bad memory (this
sounds interesting)?

If it were possible to pre-test memory on bootup (reasonably quickly), and
re-map to exclude bad areas, why not do this on every boot?

Thanks,

John Eddy



John Eddy <john.h.eddy@lmco.com> wrote in message
news:bbffti$ddh$1@inn.qnx.com

What is “TLB” mapping?

TLB = translation lookaside buffer

The x86 (and other architectures) uses this cache of recently used
translations rather than going to the page tables for each virtual address.
If there were a way to guarantee that the translations for the pages you are
going to use/target will always be in the TLB, then you could scribble over
page tables without much concern (since the TLB entry will save the hardware
from going to the table and extracting erroneous data).

Just how would I go about changing the RAM layout if I found bad memory
(this sounds interesting)?

The asinfo section of the system page describes the memory the OS can
allocate from, named ‘sysram’. If you found ‘bad RAM’, you could modify the
ranges for this sysram to exclude the area.
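
The range bookkeeping itself is simple; a sketch of carving a bad region out of a list of usable ranges could look like this. The struct and function here are hypothetical stand-ins with inclusive bounds, not the real asinfo entry layout or any startup library call, so treat it only as an illustration of the splitting logic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one usable RAM range, bounds inclusive. */
struct ram_range {
    uint64_t start;
    uint64_t end;
};

/* Copy in[0..n) to out[], removing [bad_start, bad_end] from every range
 * it overlaps. A range that strictly contains the bad region is split in
 * two, so out[] needs room for up to 2 * n entries.
 * Returns the number of entries written to out[]. */
size_t exclude_range(const struct ram_range *in, size_t n,
                     uint64_t bad_start, uint64_t bad_end,
                     struct ram_range *out)
{
    size_t m = 0;

    for (size_t i = 0; i < n; i++) {
        if (bad_end < in[i].start || bad_start > in[i].end) {
            out[m++] = in[i];                 /* no overlap: keep as is */
            continue;
        }
        if (in[i].start < bad_start) {        /* usable part below the bad region */
            out[m].start = in[i].start;
            out[m].end = bad_start - 1;
            m++;
        }
        if (in[i].end > bad_end) {            /* usable part above the bad region */
            out[m].start = bad_end + 1;
            out[m].end = in[i].end;
            m++;
        }
    }
    return m;
}
```

For example, removing one bad 4K page at 8M from a single 1M-32M range leaves two ranges, one on either side of the page.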

If it were possible to pre-test memory on bootup (reasonably quickly), and
re-map to exclude bad areas, why not do this on every boot?

Like I said before, what exactly does ‘bad RAM’ look like? The cost of
scanning RAM at startup isn’t cheap, and the benefit of the scan is usually
not worth the cost. That said, the whole reason why startup code is given
is so customers can customize to their heart’s content.

-Adam

“Adam Mallory” <amallory@qnx.com> wrote in message
news:bbfml7$3bs$1@nntp.qnx.com

If it were possible to pre-test memory on bootup (reasonably quickly), and
re-map to exclude bad areas, why not do this on every boot?

For the same reason BIOSes aren’t doing real memory tests ;-)

Mario Charest postmaster@127.0.0.1 wrote:

MC > Or create a second partition that you could boot from and install MemTest86
MC > (doesn’t require an OS)! http://www.memtest86.com/

If you can create another partition, MicroScope 2000 makes a good PC
diagnostic. See http://www.micro2000.com/