fastest communication method???

Nnamdi_Kohn1 · October 8, 2004, 2:21pm

Hello,

I’m trying to develop an application that (as fast as possible) switches
between different threads. Like a scheduler that controls the invocation of
real-time threads. So I’m looking for the FASTEST communication method QNX
is able to provide to perform this thread-switching. There are, as far as I
know, the following possibilities:

1. normal, blocking message passing (send, receive and reply)
2. non-blocking pulses
3. condition variables with mutexes
4. signals

Method 1 copies some data area from the sender (scheduler) to the receiver
(real-time thread). This does not seem to be the fastest method to me.
Method 2 is non-blocking, so I would have to artificially prevent the
scheduler from running. Method 3 uses synchronisation points that consume
time depending on the number of threads currently “listening”
(COND_BROADCAST), or it requires one condition variable per real-time thread
(COND_SIGNAL). Even with cond_signal, a complete switch to and from the
thread consumes 17 microseconds on my 3GHz machine. That’s too much for my
application. Method 4 might be another possibility, but it seems to be
non-blocking as well. Is there something like a “best” method that provides
the fastest switching of threads?
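
For reference, the condition-variable handoff described as method 3 (one condition variable per direction, pthread_cond_signal() only) might look roughly like the sketch below. It is a minimal illustration with a single worker, not the code that produced the 17 microsecond figure; the names `handoff`, `cv_worker` and `condvar_round_trips` are made up for this example.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical handoff state shared by the scheduler and one worker. */
struct handoff {
    pthread_mutex_t lock;
    pthread_cond_t  go;     /* scheduler -> worker: "run now" */
    pthread_cond_t  done;   /* worker -> scheduler: "finished" */
    int run;                /* 1 while the worker has been dispatched */
    int rounds;             /* number of dispatches the worker serves */
    int served;             /* number of dispatches actually served */
};

static void *cv_worker(void *arg)
{
    struct handoff *h = arg;

    pthread_mutex_lock(&h->lock);
    for (int i = 0; i < h->rounds; i++) {
        while (!h->run)                       /* guard against spurious wakeups */
            pthread_cond_wait(&h->go, &h->lock);
        h->served++;                          /* real-time work would go here */
        h->run = 0;
        pthread_cond_signal(&h->done);        /* hand control back */
    }
    pthread_mutex_unlock(&h->lock);
    return NULL;
}

/* Dispatches the worker `rounds` times; returns the rounds served, or -1. */
int condvar_round_trips(int rounds)
{
    struct handoff h;
    pthread_t tid;

    assert(rounds >= 0);
    pthread_mutex_init(&h.lock, NULL);
    pthread_cond_init(&h.go, NULL);
    pthread_cond_init(&h.done, NULL);
    h.run = 0;
    h.rounds = rounds;
    h.served = 0;

    if (pthread_create(&tid, NULL, cv_worker, &h) != 0)
        return -1;

    pthread_mutex_lock(&h.lock);
    for (int i = 0; i < rounds; i++) {
        h.run = 1;
        pthread_cond_signal(&h.go);           /* dispatch the worker */
        while (h.run)                         /* block until it reports back */
            pthread_cond_wait(&h.done, &h.lock);
    }
    pthread_mutex_unlock(&h.lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&h.done);
    pthread_cond_destroy(&h.go);
    pthread_mutex_destroy(&h.lock);
    return h.served;
}
```

Calling `condvar_round_trips(1000)` performs 1000 complete switches to the worker and back; timing that call and dividing by the round count gives a per-exchange figure comparable to the one quoted above.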

Regards.

Nnamdi

David_Gibbs1 · October 8, 2004, 5:23pm

Nnamdi Kohn <nnamdi.kohn@tu-bs.de> wrote:

Hello,

I’m trying to develop an application that (as fast as possible) switches
between different threads. Like a scheduler that controls the invocation of
real-time threads. So I’m looking for the FASTEST communication method QNX
is able to provide to perform this thread-switching. There are, as far as I
know, the following possibilities:

1. normal, blocking message passing (send, receive and reply)
2. non-blocking pulses
3. condition variables with mutexes
4. signals
5. semaphores

Method 1 copies some data area from the sender (scheduler) to the receiver
(real-time thread). This does not seem to be the fastest method to me.
Method 2 is non-blocking, so I would have to artificially prevent the
scheduler from running. Method 3 uses synchronisation points that consume
time depending on the number of threads currently “listening”
(COND_BROADCAST), or it requires one condition variable per real-time thread
(COND_SIGNAL). Even with cond_signal, a complete switch to and from the
thread consumes 17 microseconds on my 3GHz machine. That’s too much for my
application. Method 4 might be another possibility, but it seems to be
non-blocking as well. Is there something like a “best” method that provides
the fastest switching of threads?

From what I read here, you want to:

1. dispatch the worker thread
2. block the dispatching thread until the worker thread has completed

The problem with condvars, semaphores, signals or pulses is that all of them
are non-blocking on the dispatch side.

I think your two best choices would be:

sem_post/sem_wait, where you have one semaphore per real-time thread. The
scheduler will sem_post() the appropriate semaphore and then sem_wait()
on it. The worker thread will be blocked in sem_wait() on its semaphore until
posted, then will unblock, do its work, and sem_post() followed by sem_wait()
when done. Now, non-contested semaphores won’t result in a kernel call, but
these are all contested (each is a change of thread state or active thread),
so you’ll have 4 kernel calls per exchange. [Plus context switch times.]
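
A minimal sketch of the semaphore exchange follows. Note one deliberate deviation, which is an assumption of this example and not part of the description above: it uses a pair of semaphores per worker (`go` and `done`) so that the scheduler cannot accidentally consume its own post. The names `slot`, `rt_worker` and `sem_round_trips` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical per-worker slot: one semaphore for each direction. */
struct slot {
    sem_t go;       /* posted by the scheduler to dispatch the worker */
    sem_t done;     /* posted by the worker when its work is finished */
    int rounds;     /* number of dispatches the worker serves */
    int served;     /* number of dispatches actually served */
};

static void wait_sem(sem_t *s)
{
    while (sem_wait(s) == -1 && errno == EINTR)
        ;           /* retry if a signal interrupted the wait */
}

static void *rt_worker(void *arg)
{
    struct slot *s = arg;

    for (int i = 0; i < s->rounds; i++) {
        wait_sem(&s->go);       /* block until dispatched */
        s->served++;            /* real-time work would go here */
        sem_post(&s->done);     /* wake the scheduler */
    }
    return NULL;
}

/* Dispatches the worker `rounds` times; returns the rounds served, or -1. */
int sem_round_trips(int rounds)
{
    struct slot s;
    pthread_t tid;

    assert(rounds >= 0);
    s.rounds = rounds;
    s.served = 0;
    if (sem_init(&s.go, 0, 0) != 0)
        return -1;
    if (sem_init(&s.done, 0, 0) != 0) {
        sem_destroy(&s.go);
        return -1;
    }
    if (pthread_create(&tid, NULL, rt_worker, &s) != 0) {
        sem_destroy(&s.done);
        sem_destroy(&s.go);
        return -1;
    }

    for (int i = 0; i < rounds; i++) {
        sem_post(&s.go);        /* dispatch */
        wait_sem(&s.done);      /* block until the worker has completed */
    }

    pthread_join(tid, NULL);
    sem_destroy(&s.done);
    sem_destroy(&s.go);
    return s.served;
}
```

Each round is two posts and two waits, matching the four calls per exchange counted above.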

For the send/receive/reply situation, I would suggest that you use zero-length
data buffers, and make sure the worker thread passes NULL for the info
parameter on the MsgReceive(), giving no data copy and the most efficient
path for the S/R/R, where you’re effectively just using it for scheduling.
This will give 3 kernel calls per exchange [MsgSend(), MsgReceive(), and
MsgReply()], which is one fewer than in the semaphore case.
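
The zero-length S/R/R exchange could be sketched as below, assuming the scheduler and the worker are threads of the same process and the worker owns one channel. This is an untested illustration; `srr`, `srr_worker` and `srr_round_trips` are hypothetical names, and on anything other than QNX Neutrino the function is only a stub that returns -1.

```c
#include <assert.h>

#ifdef __QNXNTO__
#include <pthread.h>
#include <sys/neutrino.h>
#include <sys/netmgr.h>

struct srr {
    int chid;       /* channel the worker receives on */
    int rounds;     /* number of dispatches the worker serves */
};

static void *srr_worker(void *arg)
{
    struct srr *s = arg;

    for (int i = 0; i < s->rounds; i++) {
        /* Zero-length buffer and NULL info: nothing is copied. */
        int rcvid = MsgReceive(s->chid, NULL, 0, NULL);
        if (rcvid <= 0)
            break;                      /* error or pulse: stop in this sketch */
        /* real-time work would go here */
        MsgReply(rcvid, 0, NULL, 0);    /* unblocks the scheduler */
    }
    return NULL;
}

/* Dispatches the worker `rounds` times; returns the rounds served, or -1. */
int srr_round_trips(int rounds)
{
    struct srr s;
    pthread_t tid;
    int coid, served = 0;

    assert(rounds >= 0);
    s.rounds = rounds;
    s.chid = ChannelCreate(0);
    if (s.chid == -1)
        return -1;
    coid = ConnectAttach(ND_LOCAL_NODE, 0, s.chid, _NTO_SIDE_CHANNEL, 0);
    if (coid == -1) {
        ChannelDestroy(s.chid);
        return -1;
    }
    if (pthread_create(&tid, NULL, srr_worker, &s) != 0) {
        ConnectDetach(coid);
        ChannelDestroy(s.chid);
        return -1;
    }

    for (int i = 0; i < rounds; i++) {
        /* Blocks until the worker replies: dispatch and wait in one call. */
        if (MsgSend(coid, NULL, 0, NULL, 0) == -1)
            break;
        served++;
    }

    if (served < rounds)
        pthread_cancel(tid);            /* worker may still be in MsgReceive() */
    pthread_join(tid, NULL);
    ConnectDetach(coid);
    ChannelDestroy(s.chid);
    return served;
}
#else
/* MsgSend()/MsgReceive()/MsgReply() exist only on QNX Neutrino. */
int srr_round_trips(int rounds)
{
    assert(rounds >= 0);
    return -1;
}
#endif
```

The scheduler side needs no separate wait step, because MsgSend() keeps it blocked until the worker calls MsgReply().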

But, it is probably worth benchmarking one vs the other.
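
A simple way to compare the two is to time a large number of exchanges with a monotonic clock and average. The helper below is a minimal sketch under that assumption; `avg_ns_per_exchange`, `exchange_fn` and `count_exchange` are hypothetical names, and the callback is expected to perform exactly one complete exchange (dispatch plus return) per call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* One call of this function must perform one complete exchange. */
typedef void (*exchange_fn)(void *ctx);

/* Returns the average wall-clock time per exchange in nanoseconds,
 * or -1.0 if the clock could not be read. */
double avg_ns_per_exchange(exchange_fn fn, void *ctx, long iters)
{
    struct timespec t0, t1;
    double ns;

    assert(fn != NULL && iters > 0);
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1.0;
    for (long i = 0; i < iters; i++)
        fn(ctx);
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1.0;

    ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
       + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}

/* Trivial stand-in exchange, used only to show the calling convention. */
void count_exchange(void *ctx)
{
    ++*(long *)ctx;
}
```

In a real comparison, `count_exchange` would be replaced by one callback that does a sem_post()/sem_wait() pair against a running worker and another that does a single MsgSend(). Running both with the same iteration count, the same thread priorities and an otherwise idle machine keeps the numbers comparable.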

-David

David Gibbs
QNX Training Services
dagibbs@qnx.com

John_Nagle1 · October 10, 2004, 5:36am

It would be good to see the numbers on this.

One of the benefits of MsgSend/MsgReceive is supposed to
be that in the normal case, where the receiving thread is already
blocked in MsgReceive, control transfers immediately, without going
through the find-next-ready-task part of the dispatcher.

If you benchmark this, it would be interesting to try it
with multiple programs all doing this. My guess is that
as the number of active processes increases, MsgSend will
win out over mutexes.

John Nagle


Armin_Steinhoff1 · October 11, 2004, 12:16pm

Nnamdi Kohn wrote:

Hello,

I’m trying to develop an application that (as fast as possible) switches
between different threads.

A port of ‘portos’ to QNX6 could be a solution → http://www.portos.org

Regards

Armin Steinhoff
