io-net crashes on shutdown of an up filter

Shaun_Jackman · June 5, 2002, 11:27pm

When user-space sends me a devctl message to shut down the filter, I oblige
by calling ion->dereg().

devctl()
    calls ion->dereg()
shutdown1()
    does nothing, returns EOK
shutdown2()
    kills the worker thread (signals it to exit, then calls pthread_join())
    calls ion->free() for the recycled packet (the packet pool holds a single packet)
shutdown()
    does nothing
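
For reference, the "signal the worker to exit, then pthread_join()" step in shutdown2() could look roughly like the sketch below. It uses only standard POSIX thread calls; the worker_t structure and the worker_start()/worker_stop() names are hypothetical and are not part of the io-net API. The point is that the worker must have fully exited before the module's code can be unloaded.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-filter worker state; not an io-net type. */
typedef struct {
    pthread_t       tid;
    pthread_mutex_t lock;
    pthread_cond_t  wake;
    bool            stop;     /* set by worker_stop() to request exit */
    bool            running;  /* true while a worker thread exists */
} worker_t;

static void *worker_main(void *arg)
{
    worker_t *w = arg;

    pthread_mutex_lock(&w->lock);
    while (!w->stop) {
        /* A real filter would process queued packets here. */
        pthread_cond_wait(&w->wake, &w->lock);
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Start the worker; returns 0 on success, an errno value otherwise. */
static int worker_start(worker_t *w)
{
    int rc;

    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->wake, NULL);
    w->stop = false;
    rc = pthread_create(&w->tid, NULL, worker_main, w);
    w->running = (rc == 0);
    if (rc != 0) {
        pthread_cond_destroy(&w->wake);
        pthread_mutex_destroy(&w->lock);
    }
    return rc;
}

/* Ask the worker to exit and wait until it has; safe to call twice. */
static int worker_stop(worker_t *w)
{
    int rc;

    if (!w->running)
        return 0;

    pthread_mutex_lock(&w->lock);
    w->stop = true;
    pthread_cond_signal(&w->wake);
    pthread_mutex_unlock(&w->lock);

    rc = pthread_join(w->tid, NULL);
    if (rc == 0) {
        w->running = false;
        pthread_cond_destroy(&w->wake);
        pthread_mutex_destroy(&w->lock);
    }
    return rc;
}
```

Setting the flag under the mutex before signalling avoids a lost wakeup if the worker is between its check of stop and its call to pthread_cond_wait().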

All these functions get called in turn, and nothing in the process generates
an error. Everything seems to be successful and happy. Nonetheless, io-net
crashes. Here’s the backtrace:
(gdb) bt
#0 0x19 in ?? ()
#1 0xb821cbc6 in ?? ()
#2 0xb032a753 in _dlclose () from /lib/libc.so.2
#3 0xb032a7d4 in dclose () from /lib/libc.so.2
#4 0x8051433 in main ()
#5 0x804aaf6 in _btext ()
#6 0x8049d0f in _btext ()
#7 0xb0321cfb in _message_handler () from /lib/libc.so.2
#8 0xb0320e45 in dispatch_handler () from /lib/libc.so.2
#9 0xb03208e2 in _thread_pool_thread () from /lib/libc.so.2
(gdb) x/i $pc
0x19: Cannot access memory at address 0x19

Is there any other cleanup work to be done?

Thanks,
Shaun

Sean_Boudreau1 · June 6, 2002, 12:54pm

This is fixed in 6.2.

-seanb


Shaun_Jackman · June 6, 2002, 4:17pm

Is there a bug report you can point me to? I’d like to know what’s causing
it. Is there a workaround? Is a 6.2 patch available to owners of 6.1?

Thanks,
Shaun


Sean_Boudreau1 · June 6, 2002, 4:29pm

It’s caused by the fact that you’ve initiated the shutdown yourself
by calling ion->dereg(). io-net does the dlclose(), then the stack unwinds
back to your code which has been unmapped as part of the dlclose().

The workaround is to use ‘umount’ to initiate the shutdown, rather
than a devctl you’ve defined:

umount /dev/io-net/ip_ip0 <-(I’m guessing this is an ip filter.)
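
The difference between the two shutdown paths can be modelled with a small simulation. This is only a sketch of the call ordering described above: fake_module_t, host_dereg(), and host_umount() are hypothetical stand-ins, and a boolean flag takes the place of the real dlclose() unmapping. In the self-initiated case the module's handler is still on the call stack when the unload happens, so it goes on executing "after" its code is gone; in the umount case it is not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a loaded module; not an io-net type. */
typedef struct {
    bool mapped;           /* true while the module's code is loaded */
    bool ran_after_unmap;  /* set if module code runs after the unload */
} fake_module_t;

/* Stand-in for the host side of ion->dereg(): unloads right away,
 * the way io-net 6.1 is described as calling dlclose(). */
static void host_dereg(fake_module_t *m)
{
    m->mapped = false;
}

/* Module devctl handler that triggers its own unload and then has to
 * return through its own (now unmapped) code. */
static int module_devctl_self_unload(fake_module_t *m)
{
    host_dereg(m);
    /* On a real system execution would resume here, in unmapped memory. */
    if (!m->mapped)
        m->ran_after_unmap = true;
    return 0;
}

/* Host-initiated unload, as with umount: the module's shutdown
 * callbacks have already returned, so none of its code is on the
 * stack when the unload happens. */
static void host_umount(fake_module_t *m)
{
    m->mapped = false;
}
```

In this model, only module_devctl_self_unload() ever sets ran_after_unmap, which corresponds to the jump to a bogus address seen in the backtrace.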

There is a non-commercial download of 6.2 (Momentics) on our home page;
your sales rep can help you with other options.

-seanb


Shaun_Jackman · June 6, 2002, 10:05pm

I found that if I didn’t call dereg_byte_pattern() before shutting down,
io-net would crash. Can someone confirm this? What cleanup must be done
before shutting down a module?
    malloc()'d memory? Will this clean itself up?
    file descriptors?
    kill worker threads?
    ion->dereg_byte_pattern()?
    ion->free() for all allocated packets?
If this were a typical process, the majority of this stuff would clean
itself up upon shutdown. I don’t know how life inside a dynamically loaded
shared lib affects this paradigm. In particular, when my module is loaded
for a second time, will the data segment be preserved from the last run,
since io-net hasn’t been killed in between? Or will it be freshly loaded
from disk?

I ask because I’m still getting crashes when I umount my filter. Oddly
enough though, when I’m attached to io-net with gdb, it doesn’t crash when I
umount the filter the first time. I then remount my filter (which works),
and umount it again. This time it crashes! When I’m not watching it with
gdb, it crashes the first time.
The backtrace looks roughly like:
strcmp () from /libc/libc.so.2
??
??
_resmgr_mount_handler () from /libc/libc.so.2
_resmgr_connect_handler () from /libc/libc.so.2
_resmgr_msg_handler () from /libc/libc.so.2
_message_handler () from /libc/libc.so.2
dispatch_handler () from /libc/libc.so.2
_thread_pool_thread () from /libc/libc.so.2

Thanks,
Shaun

Sean_Boudreau1 · June 7, 2002, 1:29pm

I don’t see anything obvious that would explain this. The only
optional one in your list is ion->dereg_byte_pattern(): io-net
will clean these up for you but if you do it yourself, no harm
done. If I had to guess, I’d say heap corruption since
the byte_pat entries in io-net are malloc()'d.

-seanb

Shaun Jackman <sjackman@nospam.vortek.com> wrote:
: I found that if I didn’t call dereg_byte_pattern() before shutting down,
: io-net would crash. Can someone confirm this? What cleanup functions must be
: called before shutting down a module?
: malloc() memory? will this clean itself up?
: file descriptors?
: kill worker threads?
: ion->dereg_byte_pattern()?
: ion->free() for all allocated packets?
: If this were a typical process, the majority of this stuff would clean
: itself up upon shutdown. I don’t know how life inside a dynamically loaded
: shared lib affects this paradigm. In particular, when my module is loaded
: for a second time, will the data segment be preserved from the last run,
: since io-net hasn’t been killed in between? Or will it be freshly loaded
: from disk?

io-net’s state is maintained, your dll is freshly initialized, but its
heap is io-net’s.
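
Since the module's allocations come out of io-net's heap, a reasonable reading is that anything the module malloc()s stays allocated after the module is unloaded unless the module frees it first. A minimal sketch of pairing the allocation with a cleanup routine follows; filter_state_t, filter_init(), and filter_cleanup() are hypothetical names, not io-net entry points, and where filter_cleanup() would be called from (for example a shutdown callback) is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical module state; not an io-net type. */
typedef struct {
    char   *scratch;      /* buffer taken from the host process's heap */
    size_t  scratch_len;
} filter_state_t;

/* Allocate the module's buffer; returns 0 on success, -1 on failure. */
static int filter_init(filter_state_t *st, size_t len)
{
    memset(st, 0, sizeof *st);
    st->scratch = malloc(len);
    if (st->scratch == NULL)
        return -1;
    st->scratch_len = len;
    return 0;
}

/* Release everything filter_init() allocated. Clearing the pointer
 * makes a second call harmless, because free(NULL) is a no-op. */
static void filter_cleanup(filter_state_t *st)
{
    free(st->scratch);
    st->scratch = NULL;
    st->scratch_len = 0;
}
```

Without the cleanup call, each mount/umount cycle would leave another buffer behind in the io-net process.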


Shaun_Jackman · June 7, 2002, 11:15pm

I’ve managed to stop the crashes on umount of my module, although I’m still
not certain of the exact source.

shutdown1(), shutdown2(), and shutdown() were all being called and returning
successfully. My crash was occurring some time after shutdown() returned. I
noticed that when I loaded the io-net core file in gdb, my module
nfm-upfilter.so was not resident, but my utility library libutil.so was.
This leads me to think the crash is caused by the utility library, and not
by my filter itself. I stripped out all calls to the utility library so that
the filter no longer depends on it. I can now umount and restart my filter
at will! The io-net related code is in C, but it calls some helper functions
in the utility library that in turn call some C++ code.

A couple of questions now:
Are there any gotchas associated with an io-net module depending on a second
shared lib?
Are there any gotchas associated with using C++ in io-net filters?

Now a new problem. My filter is an upfilter (en_en). When I start my filter,
ping to the box still works, as I hoped (big success! I didn’t break anything
yet!). Once I umount my filter, though, the box stops being pingable. In
addition, ifconfig fails:
$ ifconfig en0
ifconfig: SIOCGIFFLAGS en0: No such device or address
However, all the expected entries in /dev/io-net are still there:
en0 ip0 ip_en qnet0 qnet_en

Have I somehow broken everything above me in shutting down my filter?

Thanks again. This newsgroup has been an amazing help (Sean in particular!)
Shaun

Shaun_Jackman · June 10, 2002, 10:18pm

I’ve solved this one!
The problem was in my call to ion->reg(). The registrant pointer I was
passing pointed to an io_net_registrant_t structure that was allocated on the
stack. This caused me no trouble at all until I attempted to shut down. It
seems io-net requires that the structure still exist at that time. The API
documentation doesn’t point this out:
    reg()
        registrant
            A pointer to an io_net_registrant_t structure that describes what
            your driver is registering as.
Can the API documentation be amended to include the expected lifetime of
passed pointers? In general, I assume that any passed pointer only has to
exist for the lifetime of that one function call, unless the documentation
for that call says otherwise. If this isn’t the case and the function
expects the pointer to last longer, that should be mentioned in the API
documentation.
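
To illustrate the lifetime issue, here is a small self-contained model. It does not use the real io-net headers: fake_registrant_t, host_reg(), and host_name_at_shutdown() are hypothetical stand-ins for io_net_registrant_t, ion->reg(), and whatever io-net does with the registrant at shutdown. The assumption, based only on the behaviour described above, is that the host keeps the pointer rather than copying the structure, so the structure needs static (or otherwise long-lived) storage.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for io_net_registrant_t. */
typedef struct {
    const char *name;
} fake_registrant_t;

/* The host keeps the pointer it was given, not a copy (assumption). */
static const fake_registrant_t *host_saved;

/* Stand-in for ion->reg(). */
static int host_reg(const fake_registrant_t *r)
{
    host_saved = r;
    return 0;
}

/* Stand-in for the host reading the registrant again at shutdown. */
static const char *host_name_at_shutdown(void)
{
    return host_saved ? host_saved->name : NULL;
}

/* Static storage: valid for as long as the module stays loaded. */
static fake_registrant_t module_reg = { "upfilter" };

static int module_init(void)
{
    /* BAD: a local fake_registrant_t declared here and passed to
     * host_reg() would dangle as soon as module_init() returns. */
    return host_reg(&module_reg);
}
```

With the static structure, the host can still read the registrant's fields long after module_init() has returned.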

Cheers,
Shaun
