Thread Programming

How can a thread suspend and resume another thread created
with the ThreadCreate() function?

bruno.suarez@scola.ac-paris.fr wrote:

How can a thread suspend and resume another thread created
with the ThreadCreate() function?

You can use ThreadCancel() or ThreadDestroy(),
BUT I think it’s really much better to use the POSIX equivalents,
pthread_create()/pthread_cancel().

Alain.

Danger Will Robinson.

pthread_create creates a thread.
pthread_cancel kills your thread.

He was asking for suspend/resume.

There aren’t any POSIX thread functions to do this (to my knowledge).
I think the best way is to use a mutex at the top of the thread loop
and check that (don’t worry, an uncontested mutex check is cheap), and
then lock that mutex from the controlling thread when you want to suspend.
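
A minimal sketch of the mutex check described above, in C. The names (`gate`, `worker`, `suspend_worker`, `resume_worker`, `gate_demo`) and the 1 ms sleep standing in for real work are illustrative assumptions, not taken from this thread. A plain POSIX mutex has to be unlocked by the thread that locked it, so the same controlling thread must call both `suspend_worker()` and `resume_worker()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* All names are hypothetical; the "work" is just a 1 ms sleep. */
static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;
static atomic_long iterations;      /* completed units of work */
static atomic_int  stop_requested;

static void nap_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_requested)) {
        /* Top-of-loop check: cheap when uncontested, blocks while suspended. */
        pthread_mutex_lock(&gate);
        pthread_mutex_unlock(&gate);

        nap_ms(1);                          /* stand-in for real work */
        atomic_fetch_add(&iterations, 1);
    }
    return NULL;
}

/* Both must be called from the same controlling thread. */
static void suspend_worker(void) { pthread_mutex_lock(&gate); }
static void resume_worker(void)  { pthread_mutex_unlock(&gate); }

/* Returns 0 if the worker stalls while suspended and runs again afterwards. */
static int gate_demo(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    suspend_worker();
    nap_ms(50);                             /* let an in-flight iteration finish */
    long before = atomic_load(&iterations);
    nap_ms(50);
    long after = atomic_load(&iterations);  /* expected to be unchanged */

    resume_worker();
    int resumed = 0;
    for (int i = 0; i < 1000 && !resumed; i++) {
        nap_ms(1);
        resumed = atomic_load(&iterations) > after;
    }

    atomic_store(&stop_requested, 1);
    pthread_join(tid, NULL);
    return (before == after && resumed) ? 0 : -1;
}
```

The worker only stops at the top of its loop, so the suspend latency is one loop iteration; that is the price of keeping the suspension point somewhere known to be safe.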

cburgess@qnx.com

How about sending SIGSTOP/SIGCONT to a particular thread?

That will stop your entire process.

Igor Kovalenko <Igor.Kovalenko@motorola.com> wrote:

How about sending SIGSTOP/SIGCONT to a particular thread?

cburgess@qnx.com

bruno.suarez@scola.ac-paris.fr wrote:
: How can a thread suspend and resume another thread created
: with the ThreadCreate() function?

There is no suspend and resume in POSIX threads, but
you can implement them on top of a condvar/mutex. David B.’s
Pthread book comes with an implementation. There is also
a rationale for why this was omitted in the POSIX document, I think.
I read an article once about the danger of resume/cancel and why
you should use a condvar instead, but could not find the URL, sorry.
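
A rough sketch of what suspend/resume on top of a condvar/mutex could look like, in C. This is not the implementation from the book mentioned above; `suspend_ctl_t`, `job_t`, `suspend_point()`, `set_suspended()` and the other names are hypothetical. The worker cooperates by calling `suspend_point()` at places where it holds no other locks.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical control block; none of these names come from the thread. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cv;
    bool suspended;      /* set by the controller */
    bool parked;         /* set by the worker while it is actually waiting */
} suspend_ctl_t;

typedef struct {
    suspend_ctl_t ctl;
    atomic_long   ticks; /* the worker's "progress" */
    atomic_bool   stop;
} job_t;

/* The worker calls this wherever it is safe to stop. */
static void suspend_point(suspend_ctl_t *c)
{
    pthread_mutex_lock(&c->mutex);
    while (c->suspended) {
        c->parked = true;
        pthread_cond_wait(&c->cv, &c->mutex);
    }
    c->parked = false;
    pthread_mutex_unlock(&c->mutex);
}

static void set_suspended(suspend_ctl_t *c, bool on)
{
    pthread_mutex_lock(&c->mutex);
    c->suspended = on;
    if (!on)
        pthread_cond_broadcast(&c->cv);   /* wake the parked worker */
    pthread_mutex_unlock(&c->mutex);
}

static bool is_parked(suspend_ctl_t *c)
{
    pthread_mutex_lock(&c->mutex);
    bool p = c->parked;
    pthread_mutex_unlock(&c->mutex);
    return p;
}

static void nap_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

static void *worker(void *arg)
{
    job_t *job = arg;
    while (!atomic_load(&job->stop)) {
        suspend_point(&job->ctl);
        atomic_fetch_add(&job->ticks, 1);  /* stand-in for real work */
    }
    return NULL;
}

/* Returns 0 if ticks freeze while suspended and advance again after resume. */
static int condvar_demo(void)
{
    job_t job;
    pthread_t tid;
    pthread_mutex_init(&job.ctl.mutex, NULL);
    pthread_cond_init(&job.ctl.cv, NULL);
    job.ctl.suspended = job.ctl.parked = false;
    atomic_init(&job.ticks, 0);
    atomic_init(&job.stop, false);
    if (pthread_create(&tid, NULL, worker, &job) != 0)
        return -1;

    set_suspended(&job.ctl, true);
    while (!is_parked(&job.ctl))
        nap_ms(1);
    long before = atomic_load(&job.ticks);
    nap_ms(20);
    long after = atomic_load(&job.ticks);

    set_suspended(&job.ctl, false);
    while (atomic_load(&job.ticks) == after)
        nap_ms(1);
    atomic_store(&job.stop, true);
    pthread_join(tid, NULL);
    return before == after ? 0 : -1;
}
```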


Goodbye, alain

However high up one may be seated, one is still only ever sitting on one’s own arse!!!

In article <8t75ig$c5l$1@nntp.qnx.com>, Alain Magloire <alain@qnx.com> wrote:

bruno.suarez@scola.ac-paris.fr wrote:
: How can a thread suspend and resume another thread created
: with the ThreadCreate() function?

There is no suspend and resume in POSIX threads, but
you can implement them on top of a condvar/mutex. David B.’s
Pthread book comes with an implementation. There is also
a rationale for why this was omitted in the POSIX document, I think.
I read an article once about the danger of resume/cancel and why
you should use a condvar instead, but could not find the URL, sorry.

To summarize – although I don’t think I’ve seen that particular
example – here’s what you do: reserve a signal (e.g. SIGUSR1, SIGRTMIN)
for suspending threads, keep thread state in a – globally accessible –
thread-specific data structure and add a mutex and condition variable for
protecting that structure.

To suspend the thread:

  1. Drop the signal on it
  2. The signal handler for the suspend signal should do:

    thread_state_t *state = (thread_state_t *)
        pthread_getspecific(thread_state_key);
    pthread_mutex_lock(&state->mutex);
    while (state->suspended) {
        pthread_cond_wait(&state->cv, &state->mutex);
    }
    pthread_mutex_unlock(&state->mutex);

To resume a thread:

    thread_state_t *state = find_thread_state(tid);
    pthread_mutex_lock(&state->mutex);
    state->suspended = 0;
    pthread_cond_signal(&state->cv);
    pthread_mutex_unlock(&state->mutex);
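
Put together, the scheme might look like the following C sketch. Several things here are assumptions rather than part of the post: the suspender sets `suspended = 1` before dropping the signal (the post does not show who sets it), the state is passed around directly instead of through a `find_thread_state()` lookup, and the `ready`, `parked`, `ticks` and `stop` fields exist only so the demo can observe the worker. Note also that `pthread_getspecific`, `pthread_mutex_lock` and `pthread_cond_wait` are not on POSIX's list of async-signal-safe functions, so this illustrates the idea with all the dangers discussed below; it is not a recommendation.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

#define SUSPEND_SIG SIGUSR1              /* the signal reserved for suspension */

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cv;
    int             suspended;           /* protected by mutex */
    pthread_t       tid;
    atomic_int      ready, parked, stop; /* demo bookkeeping (hypothetical) */
    atomic_long     ticks;               /* the worker's "progress" */
} thread_state_t;

static pthread_key_t thread_state_key;

/* Runs in the target thread. CAUTION: these pthread calls are not
 * async-signal-safe; this mirrors the scheme in the post, hazards included. */
static void suspend_handler(int sig)
{
    (void)sig;
    thread_state_t *state = pthread_getspecific(thread_state_key);
    if (state == NULL)
        return;
    pthread_mutex_lock(&state->mutex);
    while (state->suspended) {
        atomic_store(&state->parked, 1);
        pthread_cond_wait(&state->cv, &state->mutex);
    }
    atomic_store(&state->parked, 0);
    pthread_mutex_unlock(&state->mutex);
}

/* Assumption: the suspender sets the flag, then drops the signal. */
static int suspend_thread(thread_state_t *state)
{
    pthread_mutex_lock(&state->mutex);
    state->suspended = 1;
    pthread_mutex_unlock(&state->mutex);
    return pthread_kill(state->tid, SUSPEND_SIG);
}

static void resume_thread(thread_state_t *state)
{
    pthread_mutex_lock(&state->mutex);
    state->suspended = 0;
    pthread_cond_signal(&state->cv);
    pthread_mutex_unlock(&state->mutex);
}

static void *worker(void *arg)
{
    thread_state_t *state = arg;
    pthread_setspecific(thread_state_key, state);
    atomic_store(&state->ready, 1);            /* safe to signal from now on */
    while (!atomic_load(&state->stop))
        atomic_fetch_add(&state->ticks, 1);    /* busy work, holds no locks */
    return NULL;
}

static void nap_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Returns 0 if ticks freeze while suspended and advance again after resume. */
static int signal_suspend_demo(void)
{
    static thread_state_t st = { .mutex = PTHREAD_MUTEX_INITIALIZER,
                                 .cv = PTHREAD_COND_INITIALIZER };
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = suspend_handler;
    sigemptyset(&sa.sa_mask);
    if (pthread_key_create(&thread_state_key, NULL) != 0 ||
        sigaction(SUSPEND_SIG, &sa, NULL) != 0 ||
        pthread_create(&st.tid, NULL, worker, &st) != 0)
        return -1;
    while (!atomic_load(&st.ready))
        nap_ms(1);

    if (suspend_thread(&st) != 0)
        return -1;
    while (!atomic_load(&st.parked))
        nap_ms(1);
    long before = atomic_load(&st.ticks);
    nap_ms(20);
    long after = atomic_load(&st.ticks);

    resume_thread(&st);
    while (atomic_load(&st.ticks) == after)
        nap_ms(1);
    atomic_store(&st.stop, 1);
    pthread_join(st.tid, NULL);
    return before == after ? 0 : -1;
}
```

The demo worker never takes a lock of its own, which is exactly the case where this approach behaves; the trouble starts when the signal lands on a thread that is inside a condition wait or a blocking call.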


Now for the trouble:

First order effect: deadlock
Reason: It is unsafe to drop a signal on a thread in this manner, since
it may be in COND_WAIT state. If it is, it will have to re-acquire the
mutex prior to running the signal handler. This is most noticeable
if you try to suspend a number of threads at the same time.

You can get around this by wrapping pthread_cond_wait and
pthread_cond_timedwait (beyond the scope of this post, but you
dlopen libc to get the addresses of the real functions and override
them in your own library). Before you wait, you mask out the suspend
signal. After returning, you restore the signal mask.
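
The mask, wait, restore sequence can be sketched as an explicit wrapper in C, leaving aside the dlopen-based override. `cond_wait_nosuspend()`, `SUSPEND_SIG` and the demo around it are hypothetical names; the demo handler only counts deliveries. One thing the sketch makes visible: a suspend signal that arrives during the wait stays pending and is delivered when the mask is restored, at which point the caller holds the mutex again.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

#define SUSPEND_SIG SIGUSR1   /* assumption: the reserved suspend signal */

/* Hypothetical wrapper: pthread_cond_wait() with SUSPEND_SIG masked. */
static int cond_wait_nosuspend(pthread_cond_t *cv, pthread_mutex_t *m)
{
    sigset_t block, saved;
    sigemptyset(&block);
    sigaddset(&block, SUSPEND_SIG);
    pthread_sigmask(SIG_BLOCK, &block, &saved);   /* mask before waiting */
    int rc = pthread_cond_wait(cv, m);
    pthread_sigmask(SIG_SETMASK, &saved, NULL);   /* pending signal fires here */
    return rc;
}

/* ---- small demo: the handler only counts deliveries ---- */
static atomic_int deliveries;
static void count_handler(int sig) { (void)sig; atomic_fetch_add(&deliveries, 1); }

static pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
static int waiting, go;                           /* both protected by m */

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);
    waiting = 1;
    while (!go)
        cond_wait_nosuspend(&cv, &m);
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Returns 0 if the signal was held back during the wait and delivered after. */
static int mask_demo(void)
{
    struct timespec ts = { 0, 20 * 1000000L };
    struct sigaction sa;
    pthread_t tid;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SUSPEND_SIG, &sa, NULL) != 0 ||
        pthread_create(&tid, NULL, waiter, NULL) != 0)
        return -1;

    for (int seen = 0; !seen; ) {      /* once seen, the waiter is in the wait */
        pthread_mutex_lock(&m);
        seen = waiting;
        pthread_mutex_unlock(&m);
    }
    pthread_kill(tid, SUSPEND_SIG);    /* stays pending: the signal is masked */
    nanosleep(&ts, NULL);
    int during = atomic_load(&deliveries);

    pthread_mutex_lock(&m);
    go = 1;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&m);
    pthread_join(tid, NULL);
    return (during == 0 && atomic_load(&deliveries) == 1) ? 0 : -1;
}
```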

Second order effect: lots of interrupted system calls (or I/O requests)
Reason: I/O requests in particular can’t behave as though they
were restartable system calls, so when the suspend signal is dropped
on a thread that is in SEND or REPLY state waiting for a resource
manager, it may return EINTR.

This may be okay if your application is tolerant of EINTR replies,
but in my experience far too many libraries don’t cope well with
this – or they assume you can set SA_RESTART on all the signal
handlers – so it only works if you wrote or are fully aware
of all the code in the application.

As with the first case, you can avoid this by wrapping all I/O
functions and selected system call functions, and masking
the signal prior to making the call.
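
For a single I/O call, such a wrapper might look like this in C. `read_nosuspend()` and `SUSPEND_SIG` are hypothetical names, and the demo uses a no-op handler installed without SA_RESTART, so a bare `read()` interrupted by the signal could fail with EINTR. With the signal masked around the call, the read completes normally.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define SUSPEND_SIG SIGUSR1   /* assumption: the reserved suspend signal */

/* Hypothetical wrapper: read() with the suspend signal masked. */
static ssize_t read_nosuspend(int fd, void *buf, size_t len)
{
    sigset_t block, saved;
    sigemptyset(&block);
    sigaddset(&block, SUSPEND_SIG);
    pthread_sigmask(SIG_BLOCK, &block, &saved);
    ssize_t n = read(fd, buf, len);
    int err = errno;                 /* restoring the mask may run a handler */
    pthread_sigmask(SIG_SETMASK, &saved, NULL);
    errno = err;
    return n;
}

static void noop_handler(int sig) { (void)sig; }

static void *reader(void *arg)
{
    char c;
    ssize_t n = read_nosuspend(*(int *)arg, &c, 1);
    return (void *)(intptr_t)n;
}

/* Returns 0 if the blocked read survives the signal and returns 1 byte. */
static int io_demo(void)
{
    struct timespec ts = { 0, 20 * 1000000L };
    struct sigaction sa;
    pthread_t tid;
    void *ret;
    int fds[2];

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;    /* deliberately no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    if (pipe(fds) != 0 ||
        sigaction(SUSPEND_SIG, &sa, NULL) != 0 ||
        pthread_create(&tid, NULL, reader, &fds[0]) != 0)
        return -1;

    nanosleep(&ts, NULL);            /* give the reader time to block */
    pthread_kill(tid, SUSPEND_SIG);  /* would interrupt an unprotected read() */
    nanosleep(&ts, NULL);
    if (write(fds[1], "x", 1) != 1)
        return -1;
    pthread_join(tid, &ret);
    close(fds[0]);
    close(fds[1]);
    return (intptr_t)ret == 1 ? 0 : -1;
}
```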


As you can see it gets complicated really quickly. Sun’s
mark-and-sweep collector for Java requires this, and I can
tell you I hate it. There are more complications involved there.

I would strongly recommend that any design involving the suspension
or resumption of threads be reconsidered. At first glance what
may not seem amenable to a condition variable can be restructured
slightly to allow for one. In general, anything that doesn’t
require asynchronous suspension of the thread, can be re-worked
into a condition on the thread.


Steve Furr email: furr@qnx.com
QNX Software Systems, Ltd.

Steve Furr wrote:

[snip]

Now for the trouble:

First order effect: deadlock
Reason: It is unsafe to drop a signal on a thread in this manner, since
it may be in COND_WAIT state. If it is, it will have to re-acquire the
mutex prior to running the signal handler. This is most noticeable
if you try to suspend a number of threads at the same time.

You can get around this by wrapping pthread_cond_wait and
pthread_cond_timedwait (beyond the scope of this post, but you
dlopen libc to get the addresses of the real functions and override
them in your own library). Before you wait, you mask out the suspend
signal. After returning, you restore the signal mask.

Since he’s implementing suspend/resume, why can’t he just mask/unmask
signals explicitly? I mean, the trick with overloading libc is only
required if you don’t have control over source code, but you do, if
you’re writing that state/condvar stuff.

Second order effect: lots of interrupted system calls (or I/O requests)
Reason: I/O requests in particular can’t behave as though they
were restartable system calls, so when the suspend signal is dropped
on a thread that is in SEND or REPLY state waiting for a resource
manager, it may return EINTR.

This may be okay if your application is tolerant of EINTR replies,
but in my experience far too many libraries don’t cope well with
this – or they assume you can set SA_RESTART on all the signal
handlers – so it only works if you wrote or are fully aware
of all the code in the application.

One might wonder which libraries you mean. If those are libraries
coming from QSSL, that’s who should fix the problem. If not, well then
it is the responsibility of the developer to use proper libraries.

As with the first case, you can avoid this by wrapping all I/O
functions and selected system call functions, and masking
the signal prior to making the call.

As you can see it gets complicated really quickly. Sun’s
mark-and-sweep collector for Java requires this, and I can
tell you I hate it. There are more complications involved there.

I would strongly recommend that any design involving the suspension
or resumption of threads be reconsidered. At first glance what
may not seem amenable to a condition variable can be restructured
slightly to allow for one. In general, anything that doesn’t
require asynchronous suspension of the thread, can be re-worked
into a condition on the thread.

And what if asynchronous thread control is required? And it is often
required (or desirable anyway) for reliable user interface
implementations of mission-critical systems. The problem with the synchronous
approach is that the thread which is supposed to wait for events might be
dead/screwed and not in a position to accept your input. So, if you
have a system failure and want to initiate an emergency operation it
might, or might not work, which is unacceptable for such systems.

Much of the trouble would be removed if QNX implemented SA_RESTART. Maybe
you guys could concentrate on that, instead of hating asynchronous
operations. Hate is unproductive anyway.

  • igor

Hi Igor,

Igor Kovalenko <Igor.Kovalenko@motorola.com> wrote in message
news:39F86E9F.FB26F1B5@motorola.com

Much of the trouble would be removed if QNX implemented SA_RESTART. Maybe
you guys could concentrate on that, instead of hating asynchronous
operations. Hate is unproductive anyway.

Hey, you are the guy using the word hate! :slight_smile:
All I heard them say was that there might be problems doing it this way!

:sunglasses:

  • igor

“Steve Munnings, Corman Technologies” wrote:

Hi Igor,

Igor Kovalenko <Igor.Kovalenko@motorola.com> wrote in message
news:39F86E9F.FB26F1B5@motorola.com…
snip

Much of the trouble would be removed if QNX implemented SA_RESTART. Maybe
you guys could concentrate on that, instead of hating asynchronous
operations. Hate is unproductive anyway.

Hey, you are the guy using the word hate! :slight_smile:
All I heard them say was that there might be problems doing it this way!

You must have been reading only even lines or only odd :wink:
Steve used the word…

  • igor

Igor Kovalenko <Igor.Kovalenko@motorola.com> wrote in message
news:39F88803.E4237C4@motorola.com

You must have been reading only even lines or only odd :wink:
Steve used the word…

I had to use the “find” feature to find it. Ahh - see it…
I thought he was “hating” “Sun’s mark-and-sweep collector for Java”

O.K. I stand corrected! :slight_smile:

  • igor

In article <39F86E9F.FB26F1B5@motorola.com>,
Igor Kovalenko <Igor.Kovalenko@motorola.com> wrote:

Steve Furr wrote:

[snip]

Now for the trouble:

First order effect: deadlock
Reason: It is unsafe to drop a signal on a thread in this manner, since
it may be in COND_WAIT state. If it is, it will have to re-acquire the
mutex prior to running the signal handler. This is most noticeable
if you try to suspend a number of threads at the same time.

You can get around this by wrapping pthread_cond_wait and
pthread_cond_timedwait (beyond the scope of this post, but you
dlopen libc to get the addresses of the real functions and override
them in your own library). Before you wait, you mask out the suspend
signal. After returning, you restore the signal mask.


Since he’s implementing suspend/resume, why can’t he just mask/unmask
signals explicitly? I mean, the trick with overloading libc is only
required if you don’t have control over source code, but you do, if
you’re writing that state/condvar stuff.

Of course he can, but if there are hundreds of calls to I/O routines
or messaging, would you rather change it in one place, or many times?
Of course, if the source code for the entire application is available,
you could also use a macro.


Second order effect: lots of interrupted system calls (or I/O requests)
Reason: I/O requests in particular can’t behave as though they
were restartable system calls, so when the suspend signal is dropped
on a thread that is in SEND or REPLY state waiting for a resource
manager, it may return EINTR.

This may be okay if your application is tolerant of EINTR replies,
but in my experience far too many libraries don’t cope well with
this – or they assume you can set SA_RESTART on all the signal
handlers – so it only works if you wrote or are fully aware
of all the code in the application.


One might wonder which libraries you mean. If those are libraries
coming from QSSL, that’s who should fix the problem. If not, well then
it is the responsibility of the developer to use proper libraries.

All well and good, Igor, but I was taking into account that a great
many applications simply don’t come from just QNX libraries and
the application libraries. There is often a lot of third party code.

As with the first case, you can avoid this by wrapping all I/O
functions and selected system call functions, and masking
the signal prior to making the call.

As you can see it gets complicated really quickly. Sun’s
mark-and-sweep collector for Java requires this, and I can
tell you I hate it. There are more complications involved there.

I would strongly recommend that any design involving the suspension
or resumption of threads be reconsidered. At first glance what
may not seem amenable to a condition variable can be restructured
slightly to allow for one. In general, anything that doesn’t
require asynchronous suspension of the thread, can be re-worked
into a condition on the thread.

And what if asynchronous thread control is required? And it is often
required (or desirable anyway) for reliable user interface
implementations of mission-critical systems. The problem with the synchronous
approach is that the thread which is supposed to wait for events might be
dead/screwed and not in a position to accept your input. So, if you
have a system failure and want to initiate an emergency operation it
might, or might not work, which is unacceptable for such systems.

Much of the trouble would be removed if QNX implemented SA_RESTART. Maybe
you guys could concentrate on that, instead of hating asynchronous
operations. Hate is unproductive anyway.

Excuse me, but (a) only a minor part of the trouble is removed by SA_RESTART,
and (b) as was pointed out, I only hate arbitrary asynchronous operations
as a means of doing garbage collection. Apparently Sun recognized that
it was a problem, because none of the garbage collectors in any of
their upcoming products use this strategy any more.

Implementing SA_RESTART is another issue entirely. It involves the
messaging in the kernel, Proc, infrastructure within the resource manager
framework, and coding in each of the resource managers. If I had my
druthers, they would have worked it into the original design, considering
that it was previously recognized as a weakness in QNX4.

Asynchronous thread control is a nasty area – independent of restartable
system calls – and it is prone to deadlock. As a matter of design
robustness I would usually recommend introducing safe blocking points
where resource contention is known not to be a problem over attempting it.
The penalty is then in the latency. Otherwise, thread suspension really
belongs in the kernel which can deal with issues of deadlock – by releasing
all mutexes, for example, and re-acquiring them upon resumption. Hopefully,
this will happen sooner rather than later. It has been discussed.


Steve Furr email: furr@qnx.com
QNX Software Systems, Ltd.