Reclaiming a locked mutex

Robert_Muil · December 8, 2004, 12:37am

Hello,

Is there any way of releasing/reclaiming a mutex that has been locked by an
aborted thread?

pthread_mutex_unlock() will not work outside the locking thread.
pthread_mutex_destroy() will not destroy a locked mutex. pthread_mutex_init()
will not reinitialise a mutex that has not been destroyed.

Therefore, it seems to me that if a thread uses a mutex (which is very
common), it must never be aborted, or the mutex is forever locked. This
seems silly.

Robert.

Igor_Kovalenko2 · December 8, 2004, 8:18am

The SyncMutexEvent() and SyncMutexRevive() kernel calls can do it.
But it will be ugly, since they need an object of type sync_t rather than
pthread_mutex_t. You have to peek into the headers, and it won’t be portable
(it may not even be compatible between QNX versions).

– igor


Peter_Weber1 · December 8, 2004, 12:33pm

You may be able to use pthread_cleanup_push()/pthread_cleanup_pop() to leave
the mutex in a defined state in case the thread gets killed. That may be a
cleaner solution.

cheers,
Peter


Kevin_N1 · December 8, 2004, 2:28pm

I’ve successfully used SyncMutexEvent() & SyncMutexRevive() to recover locked
mutexes. You can use them directly on objects of type pthread_mutex_t. Just
don’t forget to unlock the mutex after reviving it.

From /usr/include/pthread.h:

typedef sync_t pthread_mutex_t;

K.


David_Gibbs1 · December 13, 2004, 4:00pm

Peter Weber <pweber@qnx.com> wrote:

You may be able to use pthread_cleanup_push()/pthread_cleanup_pop() to leave
the mutex in a defined state in case the thread gets killed. That may be a
cleaner solution.

These only work for a canceled thread (pthread_cancel()) rather than an
aborted thread.

-David


–
QNX Training Services
http://www.qnx.com/services/training/
Please followup in this newsgroup if you have further questions.

David_Gibbs1 · December 13, 2004, 4:05pm

Robert Muil <r.muil@crcmining.com.au> wrote:

Hello,

Is there any way of releasing/reclaiming a mutex that has been locked by an
aborted thread?

In general, no. As noted elsewhere, the SyncMutexRevive() kernel call
may work.

pthread_mutex_unlock() will not work outside the locking thread.
pthread_mutex_destroy() will not destroy a locked mutex. pthread_mutex_init()
will not reinitialise a mutex that has not been destroyed.

Therefore, it seems to me that if a thread uses a mutex (which is very
common), it must never be aborted, or the mutex is forever locked. This
seems silly.

Yup, you are not supposed to abort a thread where it could cause such
damage. In fact, the documentation for pthread_abort() explicitly states
that it won’t do so.

The proper way to handle this is to use pthread_cancel() to terminate
threads, and for the threads being cancelled to use pthread_cleanup_push()
and pthread_cleanup_pop() to install cancellation cleanup handlers that
will unlock mutexes.
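
A minimal sketch of that cancellation pattern, using only standard POSIX calls. The names worker, unlock_handler and demo_cancel_cleanup are made up for illustration, and the worker simply waits on a condition variable so that it has a cancellation point while it holds the mutex:

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wake = PTHREAD_COND_INITIALIZER;
static int stop = 0;            /* never set in this sketch */

/* Cleanup handler: runs in the cancelled thread, which still owns the mutex. */
static void unlock_handler(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&lock);
    pthread_cleanup_push(unlock_handler, &lock);
    while (!stop)
        pthread_cond_wait(&wake, &lock);   /* cancellation point; the mutex is
                                              re-acquired before handlers run */
    pthread_cleanup_pop(1);                /* normal exit path: unlock as well */
    return NULL;
}

/* Returns 0 if the mutex is free again after the worker was cancelled. */
int demo_cancel_cleanup(void)
{
    pthread_t tid;
    void *result;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    pthread_cancel(tid);                   /* deferred until the cond wait */
    pthread_join(tid, &result);
    if (result != PTHREAD_CANCELED)
        return -1;

    int rc = pthread_mutex_trylock(&lock); /* 0 means nobody holds it */
    if (rc == 0)
        pthread_mutex_unlock(&lock);
    return rc;
}
```

Note that this covers pthread_cancel() only; nothing here helps if the thread is aborted.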

Another thing that MIGHT work would be to do a pthread_key_create()
with a destructor (cleanup handler). I think that will get executed by
the thread on thread death even if it is aborted; then the thread could
unlock any mutexes it has locked. Of course, the problem is knowing
which mutexes are locked, and if you unlock them, you have to leave the
data protected by the mutex consistent.
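
A rough sketch of the thread-specific-data idea, assuming a single mutex per thread for simplicity. The names held_key, release_held and demo_tls_cleanup are hypothetical. It only shows the mechanism on a normal pthread_exit(); whether the destructor also runs after pthread_abort() is the guess made above and is not verified here:

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_key_t held_key;   /* per-thread pointer to the mutex it holds */

/* Destructor: called in the exiting thread when its value for held_key
 * is non-NULL, so the thread still owns the mutex it recorded. */
static void release_held(void *value)
{
    pthread_mutex_unlock((pthread_mutex_t *)value);
}

static void *worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&data_lock);
    pthread_setspecific(held_key, &data_lock);  /* record what we hold */
    pthread_exit(NULL);                         /* dies without unlocking */
}

/* Returns 0 if the mutex was released by the destructor. */
int demo_tls_cleanup(void)
{
    pthread_t tid;

    if (pthread_key_create(&held_key, release_held) != 0)
        return -1;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    pthread_join(tid, NULL);

    int rc = pthread_mutex_trylock(&data_lock); /* 0 means it is free */
    if (rc == 0)
        pthread_mutex_unlock(&data_lock);
    return rc;
}
```

In real code the normal unlock path would also have to clear the key with pthread_setspecific(held_key, NULL), otherwise the destructor would unlock a mutex the thread no longer owns.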

But, any design with cancelling, or aborting, threads is going to have
some nasty code/cleanup to handle. Why do you need to do this?

-David


Igor_Kovalenko2 · December 13, 2004, 11:58pm

“David Gibbs” <dagibbs@qnx.com> wrote in message
news:cpkekn$t4d$2@nntp.qnx.com…

But, any design with cancelling, or aborting, threads is going to have
some nasty code/cleanup to handle. Why do you need to do this?

In real life threads sometimes die when you’re least expecting it. For
example, if you get a SIGSEGV, it may be desirable to let just the
offending thread die, instead of killing the entire process. If the thread
was one of the ‘pool threads’ (i.e., transient) handling a user request, then
killing the thread will affect only one user request, rather than all of
them. If the condition causing the SIGSEGV does not happen often, then such a
handling policy can save you a lot of trouble by limiting the damage in the
field while you’re investigating.

Btw, I am not speaking hypothetically. This has been my experience with
systems deployed in faraway places, for which a loss of service, even a
temporary one (reboot or software restart), outside of a designated
maintenance window is a major accident.

FYI, Solaris has pthread_mutexattr_setrobust_np() and
pthread_mutex_consistent_np() to deal with this issue (the ‘np’ designating
non-portable). Instead of thread death notifications they have an extra
attribute, and pthread_mutex_lock() has additional logic. I would say their
approach is more generic and will probably make its way into POSIX some day.

– igor

Robert_Muil · December 14, 2004, 12:59am

Thank you both. I have reconsidered my design based on this.

But, any design with cancelling, or aborting, threads is going to
have some nasty code/cleanup to handle. Why do you need to do this?

Because of bad design on my part. I don’t really need to anymore, but I am
interested in it.

Surely an intuitive and useful solution in the OS would be to have mutexes
transfer ownership to the parent of a thread if it aborts, or even just
become uninitialised? There is no point at all in having a locked
initialised mutex owned by no thread.


Igor_Kovalenko2 · December 14, 2004, 7:49am

“Robert Muil” <r.muil@crcmining.com.au> wrote in message
news:cpldcb$83q$1@inn.qnx.com…


Surely an intuitive and useful solution in the OS would be to have mutexes
transfer ownership to the parent of a thread if it aborts, or even just
become uninitialised? There is no point at all in having a locked
initialised mutex owned by no thread.

POSIX threads do not have a parent-child relationship in the same sense as
processes do. You can create all threads from main, or have each thread
create the next one; it does not matter. They are nothing more than threads
of execution within one process. Besides, a ‘parent’ thread might not be
prepared to deal with the mutex at all.

Who is best prepared to deal with it is a big question. Dealing with it
implies the capability to make the locked data consistent. The QNX solution
prompts you to put that intelligence into a ‘manager’ (or ‘dispatcher’)
thread that sits waiting for events (and probably dispatches ‘worker’
threads). The Solaris solution suggests it should be in the ‘workers’ (a
mutex with the ‘robust’ attribute can be pthread_mutex_lock()-ed even if the
current owner has died, but you get a specific return code that must always
be checked; then, if you can make the data consistent, you mark the mutex as
consistent).
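
The ‘robust’ approach can be sketched as follows. This uses the names that were later standardized in POSIX.1-2008, pthread_mutexattr_setrobust() and pthread_mutex_consistent(), rather than the Solaris _np variants, and assumes a platform that provides them. The names dying_worker, shared_value and demo_robust are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t robust_lock;
static int shared_value;         /* data protected by robust_lock */

static void *dying_worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&robust_lock);
    shared_value = -1;           /* pretend we died in mid-update */
    return NULL;                 /* thread ends while still holding the mutex */
}

/* Returns 1 if the dead owner was detected and the mutex recovered,
 * 0 if the lock was taken normally, -1 on any other error. */
int demo_robust(void)
{
    pthread_mutexattr_t attr;
    pthread_t tid;
    int recovered = 0;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&robust_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    if (pthread_create(&tid, NULL, dying_worker, NULL) != 0)
        return -1;
    pthread_join(tid, NULL);

    int rc = pthread_mutex_lock(&robust_lock);
    if (rc == EOWNERDEAD) {
        /* We own the mutex, but the data may be half-updated: repair it,
         * then declare the mutex usable again. */
        shared_value = 0;
        pthread_mutex_consistent(&robust_lock);
        recovered = 1;
    } else if (rc != 0) {
        return -1;
    }
    pthread_mutex_unlock(&robust_lock);
    pthread_mutex_destroy(&robust_lock);
    return recovered;
}
```

The repair step is the hard part in practice; the sketch just resets one integer.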

I guess the reason why it is not in POSIX is that there’s not much
mileage on any of the solutions. Not a lot of Unix applications actually use
threads heavily.

– igor

David_Gibbs1 · December 14, 2004, 5:02pm

Robert Muil <r.muil@crcmining.com.au> wrote:


Surely an intuitive and useful solution in the OS would be to have mutexes
transfer ownership to the parent of a thread if it aborts, or even just
become uninitialised? There is no point at all in having a locked
initialised mutex owned by no thread.

As Igor has pointed out, there is no parent-child relationship for
threads. (Well, at the moment of creation there is, for inheritance
of attributes; beyond that point threads are peers, not parent and
child as with processes. For example, any thread in a process
can pthread_join() any other thread to wait on its death.)

As to uninitializing the mutex: much of the mutex code I’ve seen doesn’t
check the return from pthread_mutex_lock(); it just assumes that when it
returns it is safe to enter the critical section. For an uninitialized
mutex, this will result in an error return (not checked) and then any
number of threads entering that critical section. (Yes, that is bad
code… but it creates a non-obvious problem for the bad code, whereas the
mutex staying locked on thread death at least creates a fairly obvious and
detectable bad situation.)
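
To illustrate the point about unchecked returns, here is a small sketch in which the critical section is entered only when pthread_mutex_lock() reports success. Locking an uninitialized mutex is not something that can be shown portably, so the error is provoked instead with an error-checking mutex that the caller already holds (which yields EDEADLK). The names guarded_increment and demo_checked_lock are made up for this example:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static int counter;              /* data protected by the mutex */

/* Enter the critical section only if the lock was really acquired. */
int guarded_increment(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);
    if (rc != 0) {
        fprintf(stderr, "pthread_mutex_lock: %s\n", strerror(rc));
        return rc;               /* do NOT touch the protected data */
    }
    counter++;
    return pthread_mutex_unlock(m);
}

/* Returns 1 if the failed lock was detected and the data left untouched. */
int demo_checked_lock(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    int first = guarded_increment(&m);    /* succeeds, counter becomes 1 */

    pthread_mutex_lock(&m);               /* hold the mutex ourselves */
    int second = guarded_increment(&m);   /* relock fails with EDEADLK */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);

    return first == 0 && second == EDEADLK && counter == 1;
}
```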

In real life threads sometimes die when you’re least expecting it. For
example, if you get a SIGSEGV, it may be desirable to let just the
offending thread die, instead of killing the entire process. If the
thread was one of the ‘pool threads’ (i.e., transient) handling a user
request, then killing the thread will affect only one user request,
rather than all of them. If the condition causing the SIGSEGV does
not happen often, then such a handling policy can save you a lot of
trouble by limiting the damage in the field while you’re
investigating.

I’ve dealt with designs where this might be needed. Also the need
to cancel some ongoing activity, where cancelling the thread was
the best way to go about it.

I didn’t totally reject cleaning up mutexes; I suggested a couple of
ways to catch/clean up (cancellation, the thread-local-storage
cleanup trick).

But, it is messy. In fact, the state of the mutex is sometimes far
less of a problem than verifying/checking the state of the data that
is being protected by the mutex. (e.g. half-completed insertion or
deletion from a linked list.)

FYI, Solaris has pthread_mutexattr_setrobust_np() and
pthread_mutex_consistent_np() to deal with this issue (the ‘np’
designating non-portable). Instead of thread death notifications they
have an extra attribute, and pthread_mutex_lock() has additional logic. I
would say their approach is more generic and will probably make its way
into POSIX some day.

One way to handle it. Might make it into POSIX.

-David


Robert_Muil · December 16, 2004, 9:01am

David,

I agree that it is a difficult situation to deal with effectively. Even the
Solaris solution strikes me as unwieldy.

Complicating my design is that I often need to immediately cancel a
thread, and cannot afford any blocking or waiting around for indeterminate
cleanups. These situations come up when I have a realtime thread needing to
abort a worker thread to be restarted later.

This is made difficult at the high level by the complexities at the low
level.

Robert.


David_Gibbs1 · December 16, 2004, 6:18pm

Robert Muil <r.muil@crcmining.com.au> wrote:

David,

I agree that it is a difficult situation to deal with effectively. Even the
Solaris solution strikes me as unwieldy.

Complicating my design is that I often need to immediately cancel a
thread, and cannot afford any blocking or waiting around for indeterminate
cleanups. These situations come up when I have a realtime thread needing to
abort a worker thread to be restarted later.

Does the specification say that you must cancel the THREAD immediately,
or that you must cancel the OPERATION the thread is doing immediately?
How immediate is immediately? What does the thread you are cancelling
actually do? (For example, does it ever write() to a file or device?
Cancelling such an operation waits for an indeterminate cleanup.)

If the operation is halted, the thread confirms that the operation is
halted in some way (posting a semaphore?) and then cleans up mutexes,
data structures, etc – would this work?

e.g.

cancel_operation()
{
    pthread_cancel( pid );
    sem_wait( &cancel_sem );
}

cancel_handler()    // in cancelled thread
{
    abort_operation();          // in case something needs to be done - hardware state?
    sem_post( &cancel_sem );
    clean_up();                 // mutexes, data structures, etc
    pthread_exit();
}
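
A compilable sketch of that handshake, under a few assumptions: the handler is installed with pthread_cleanup_push(), the worker blocks in sem_wait() (a cancellation point) while holding one mutex, and pthread_cancel() is given a thread ID. The names never_sem, data_lock and demo_cancel_handshake are hypothetical, and the abort_operation() hook is left as a comment:

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t cancel_sem;    /* posted once the operation has been halted */
static sem_t never_sem;     /* never posted: stands in for blocking work */

static void cancel_handler(void *unused)   /* runs in the cancelled thread */
{
    (void)unused;
    /* abort_operation() would go here (hypothetical hook). */
    sem_post(&cancel_sem);                 /* confirm the halt first ... */
    pthread_mutex_unlock(&data_lock);      /* ... then do the slower cleanup */
}

static void *worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&data_lock);
    pthread_cleanup_push(cancel_handler, NULL);
    sem_wait(&never_sem);                  /* cancellation point */
    pthread_cleanup_pop(1);
    return NULL;
}

static void cancel_operation(pthread_t tid)
{
    pthread_cancel(tid);
    while (sem_wait(&cancel_sem) == -1 && errno == EINTR)
        ;                                  /* retry if interrupted */
}

/* Returns 0 if the mutex is free after the cancelled worker has gone. */
int demo_cancel_handshake(void)
{
    pthread_t tid;

    sem_init(&cancel_sem, 0, 0);
    sem_init(&never_sem, 0, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    cancel_operation(tid);                 /* returns once the halt is confirmed */
    pthread_join(tid, NULL);               /* cleanup has finished by now */

    int rc = pthread_mutex_trylock(&data_lock);
    if (rc == 0)
        pthread_mutex_unlock(&data_lock);
    return rc;
}
```

The sketch omits an explicit pthread_exit() in the handler, since the thread terminates by itself once its cleanup handlers have run.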

This is made difficult at the high level by the complexities at the low
level.

Yup.

-David


David_Gibbs1 · December 16, 2004, 6:43pm

David Gibbs <dagibbs@qnx.com> wrote:

e.g.
cancel_operation() {
pthread_cancel( pid );

doh!

pthread_cancel(tid);

-David


Robert_Muil · January 11, 2005, 9:01am

The specification is my own, based on the overall software requirement.
Immediate cancellation is required to deal with interrupts, so I would
prefer to avoid even a context switch.

Your suggestion does look interesting though. I will investigate it when I
get back on that project.

Thanks David.

Robert.
