What's going on with an unplugged board?

Hi everybody.
I have a driver which supports a cPCI board. The board has shared memory
which contains buffers, queues, etc. Sometimes the firmware on this board
hangs without any symptoms. So we decided to power it off through our
hotswap controller, power it back on, initialize the PCI configuration
and then start using it again. (I don’t know why, but sometimes “reset”
does not work. It looks like the firmware dies completely and does not
handle anything.)

The driver has two threads: one handles the “client requests” through the
dispatch library, and the second handles events from the interrupt
handler[s]. (I’m not using an INTR event because the driver supports many
boards, and I’m using the event to specify which board.)

But I have a strange situation here. When I powered the board back on, I
saw that the “interrupt handling” thread (not the handler itself =) )
can’t get events: it gets the “No such channel” error code (errno), even
though the channel id value is correct. Sometimes the
“communication/dispatch” thread starts handling the same event it got
last time, forever, which looks almost as if the dispatch library does
not detect a “wrong channel number”. (Just for experts: I’m using the
resmgr_block function and checking for NULL, but it isn’t NULL.)

Maybe I have a bug in the interrupt handler: I check whether the
interrupt source was mine (IN THAT SHARED MEMORY), and if it was mine, I
clear it. So maybe this “clearing” is my problem. If the device is
unplugged (or unpowered), this physical memory is not available to the
CPU, and the interrupt handler does not check any boundaries.
THE QUESTION: is it possible for the interrupt handler, when it clears
the interrupt in shared memory that is no longer available, to cause this
kind of damage? If yes, why does it break only my “channel info” and not
other processes? =))) How does QNX/NTO handle dynamic physical memory?
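
One defensive pattern for the handler described above, as a minimal sketch and not the driver's actual code: a read from a PCI device that does not respond typically completes with all bits set, so an all-ones status word can be treated as "board absent" instead of "my interrupt". The register layout, the `IRQ_PENDING` bit, and the `board_t` fields are hypothetical, and some platforms report such a read as a bus error instead.

```c
#include <stdint.h>

#define IRQ_PENDING 0x00000001u   /* hypothetical "interrupt is mine" bit */
#define ALL_ONES    0xFFFFFFFFu   /* typical result of reading an absent PCI device */

typedef struct {
    volatile uint32_t *irq_status; /* hypothetical status word in the board's shared memory */
    volatile int       present;    /* cleared by the driver before the board is powered off */
} board_t;

/* Returns 1 if the interrupt was ours and was cleared, 0 otherwise. */
int board_irq_check_and_clear(board_t *b)
{
    uint32_t status;

    if (!b->present || !b->irq_status)
        return 0;                  /* never touch the window while the board is off */

    status = *b->irq_status;
    if (status == ALL_ONES)
        return 0;                  /* board did not answer: treat as absent, do not write */

    if (!(status & IRQ_PENDING))
        return 0;                  /* not our interrupt */

    *b->irq_status = status & ~IRQ_PENDING;  /* clear only our bit */
    return 1;
}
```

The `present` flag only helps if the driver clears it before asking the hotswap controller to cut power; it does not protect against a surprise removal, which is what the all-ones check is for.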

Thanks for any comments.


Andrey

“Andrey Andreev” <andreev3@home.com> wrote in message
news:3C04895A.D61A7D0@home.com

THE QUESTION: is it possible for the interrupt handler, when it clears
the interrupt in shared memory that is no longer available, to cause this
kind of damage? If yes, why does it break only my “channel info” and not
other processes? =))) How does QNX/NTO handle dynamic physical memory?

I don’t think it does. I’d try to invalidate the mapped memory using msync()
after the board is powered back up. Just a wild guess, but the memory might be
cached, so you could try setting PROT_NOCACHE in mmap() and see what happens.
I suppose you do munmap() when powering the board down, and mmap() when it
comes back, don’t you?

  • igor