Robert Muil <r.muil@crcmining.com.au> wrote:
Thank you both. I have reconsidered my design based on this.
But any design that cancels or aborts threads is going to
have some nasty cleanup code to handle. Why do you need to do this?
Because of bad design on my part. I don’t really need to anymore, but I am
interested in it.
Surely an intuitive and useful solution in the OS would be to have mutexes
transfer ownership to the parent of a thread if it aborts, or even just
become uninitialised? There is no point at all in having a locked
initialised mutex owned by no thread.
As Igor has pointed out, there is no parent-child relationship for
threads. (Well, at the moment of creation there is, for inheritance
of attributes; beyond that point threads are peers, not
parent-child as with processes. For example, any thread in a process
can pthread_join() any other thread to wait for its death.)
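To make the peer relationship concrete, here is a minimal sketch (worker_a, worker_b and peer_join_demo are hypothetical names, for illustration only). One thread creates two workers, and the second worker, not the creator, reaps the first with pthread_join():

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical worker: returns a distinctive value so the joiner can check it. */
static void *worker_a(void *arg)
{
    (void)arg;
    return (void *)(intptr_t)42;
}

/* Hypothetical second worker: joins worker_a even though it did not create it. */
static void *worker_b(void *arg)
{
    pthread_t a = *(pthread_t *)arg;
    void *result = NULL;
    int rc = pthread_join(a, &result);   /* any thread may join a joinable peer */
    assert(rc == 0);
    return result;                       /* hand worker_a's exit value upward */
}

/* Creates A and B from the same thread, then lets B reap A.
   Returns the value that B obtained from A. */
static intptr_t peer_join_demo(void)
{
    pthread_t a, b;
    void *via_b = NULL;
    int rc = pthread_create(&a, NULL, worker_a, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker_b, &a);  /* 'a' outlives B: we join B below */
    assert(rc == 0);
    rc = pthread_join(b, &via_b);                 /* the creator only joins B */
    assert(rc == 0);
    return (intptr_t)via_b;
}
```

Calling peer_join_demo() (build with -pthread) should return 42, the value worker_a returned, as collected by worker_b. Note that only one thread should join a given target: POSIX leaves multiple simultaneous joins of the same thread undefined.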
As to uninitializing the mutex: much of the mutex code I’ve seen doesn’t
check the return value of pthread_mutex_lock(); it just assumes that when it
returns it is safe to enter the critical section. For an uninitialized
mutex, this can result in an error return (not checked) and then any
number of threads entering that critical section. (Yes, that is bad
code… but it creates a non-obvious problem for the bad code, whereas the mutex
staying locked on thread death at least creates a fairly obvious and
detectable bad situation.)
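A minimal sketch of actually checking that return value. It assumes an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) purely so there is a well-defined error to observe, since an uninitialized mutex cannot be demonstrated portably; counter, increment_checked and relock_demo are hypothetical names:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static int counter = 0;   /* hypothetical shared data */

/* Enters the critical section only if the lock really succeeded.
   Returns 0 on success, otherwise the error code from pthread_mutex_lock(). */
static int increment_checked(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);
    if (rc != 0) {
        fprintf(stderr, "pthread_mutex_lock: %s\n", strerror(rc));
        return rc;               /* do NOT touch counter: we don't own the lock */
    }
    counter++;
    rc = pthread_mutex_unlock(m);
    assert(rc == 0);
    return 0;
}

/* An error-checking mutex reports EDEADLK when its owner locks it again,
   giving a detectable error instead of silent entry or a hang. */
static int relock_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    rc = increment_checked(&m);  /* normal case: succeeds */
    assert(rc == 0);

    pthread_mutex_lock(&m);      /* hold the lock ... */
    rc = increment_checked(&m);  /* ... so this relock is refused with EDEADLK */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

relock_demo() should return EDEADLK and leave counter at 1: the first call enters the critical section, the second is refused and skips it rather than running unprotected.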
In real life threads sometimes die when you’re least expecting it. For
example if you got a SIGSEGV, it may be desirable to let just the
offending thread die, instead of killing the entire process. If the
thread was one of the ‘pool threads’ (i.e., transient) handling a user
request, then killing the thread will affect only one user request,
rather than all of them. If the condition causing the SIGSEGV does
not happen often, then such a handling policy can save you a lot of
trouble by limiting the damage in the field while you’re
investigating.
I’ve dealt with designs where this might be needed, and also with the need
to cancel some ongoing activity, where cancelling the thread was
the best way to go about it.
I didn’t totally reject cleaning up mutexes; I suggested a couple of
ways to catch/clean up (cancellation, the thread-local-storage
cleanup trick).
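The cancellation route can be sketched with a cleanup handler that unlocks the mutex if the thread is cancelled while holding it. This is a minimal sketch assuming deferred cancellation (the default); lock, ready, never, worker_started, unlock_on_cancel, worker and cancel_demo are hypothetical names, and it covers only the cancellation case, not the thread-local-storage trick. The worker blocks in pthread_cond_wait(), which is a cancellation point and re-acquires the mutex before cleanup handlers run, so the handler has to release it:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;  /* worker -> creator handshake */
static pthread_cond_t  never = PTHREAD_COND_INITIALIZER;  /* hypothetical: never signalled */
static int worker_started = 0;

/* Cleanup handler: runs on cancellation between push and pop,
   releasing the mutex the dying thread holds. */
static void unlock_on_cancel(void *m)
{
    pthread_mutex_unlock((pthread_mutex_t *)m);
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    pthread_cleanup_push(unlock_on_cancel, &lock);
    worker_started = 1;
    pthread_cond_signal(&ready);
    for (;;)                               /* cancellation point; on cancel the */
        pthread_cond_wait(&never, &lock);  /* mutex is re-acquired, then the handler runs */
    pthread_cleanup_pop(1);                /* not reached, but must pair with push */
    return NULL;
}

/* Returns pthread_mutex_trylock()'s result after the worker was cancelled:
   0 means the cleanup handler released the lock. */
static int cancel_demo(void)
{
    pthread_t t;
    void *res = NULL;
    int rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);

    pthread_mutex_lock(&lock);
    while (!worker_started)
        pthread_cond_wait(&ready, &lock);
    pthread_mutex_unlock(&lock);     /* worker is now blocked in pthread_cond_wait() */

    pthread_cancel(t);
    pthread_join(t, &res);
    assert(res == PTHREAD_CANCELED);

    rc = pthread_mutex_trylock(&lock);
    if (rc == 0)
        pthread_mutex_unlock(&lock);
    return rc;
}
```

cancel_demo() should return 0: the mutex is free once the cancelled thread has been joined. Without the push/pop pair the dead thread would still own the lock and pthread_mutex_trylock() would report EBUSY. The handler only fixes the lock, not whatever state the protected data was left in.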
But it is messy. In fact, the state of the mutex is sometimes far
less of a problem than verifying/checking the state of the data that
is being protected by the mutex (e.g. a half-completed insertion into
or deletion from a linked list).
FYI, Solaris has pthread_mutexattr_setrobust_np() and
pthread_mutex_consistent_np() to deal with this issue (the ‘np’
designating non-portable). Instead of thread death notifications they
have an extra attribute, and pthread_mutex_lock() has additional logic. I
would say their approach is more generic and will probably make its way
into POSIX some day.
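For comparison, a minimal sketch of the robust-mutex approach. It uses the names later standardized in POSIX.1-2008 and provided by glibc (pthread_mutexattr_setrobust(), pthread_mutex_consistent(), the EOWNERDEAD error) rather than the Solaris _np calls. rlock, list_len, dies_holding_lock and robust_demo are hypothetical, and the "repair" step is a placeholder for whatever application-specific consistency check the protected data needs:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t rlock;
static int list_len = 0;   /* hypothetical protected data */

/* Simulates an unexpected death: exits while still holding the mutex. */
static void *dies_holding_lock(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&rlock);
    list_len++;            /* pretend an update was left half-done */
    return NULL;           /* exits WITHOUT unlocking */
}

/* Returns the code seen by the next locker (expected: EOWNERDEAD),
   after repairing the data and marking the mutex consistent. */
static int robust_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_t t;
    int rc, seen;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&rlock, &attr);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_create(&t, NULL, dies_holding_lock, NULL);
    assert(rc == 0);
    pthread_join(t, NULL);

    seen = pthread_mutex_lock(&rlock);
    if (seen == EOWNERDEAD) {
        /* We own the lock now, but the data may be inconsistent.
           Placeholder repair (hypothetical): reset the counter. */
        list_len = 0;
        rc = pthread_mutex_consistent(&rlock);
        assert(rc == 0);
    }
    pthread_mutex_unlock(&rlock);

    /* Once marked consistent, the mutex behaves normally again. */
    rc = pthread_mutex_lock(&rlock);
    assert(rc == 0);
    pthread_mutex_unlock(&rlock);
    pthread_mutex_destroy(&rlock);
    return seen;
}
```

robust_demo() should return EOWNERDEAD: the next locker is told the owner died, gets ownership anyway, and is expected to fix the data before calling pthread_mutex_consistent(). If it unlocks without doing so, the mutex becomes permanently unusable and later locks fail with ENOTRECOVERABLE.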
One way to handle it. Might make it into POSIX.
-David
QNX Training Services
http://www.qnx.com/services/training/
Please followup in this newsgroup if you have further questions.