Help with deadlock problem

I think I have a classic priority inversion problem in a test app,
but can’t seem to puzzle through it. It runs fine for a while, then four
of its threads hang, apparently waiting for a message from devc-pty(?)
They never wake up again (pidin output appended below).

I don’t understand how devc-pty can show up as RECEIVE-blocked (with
priority 10) when those four threads are REPLY-blocked on it, but that’s
probably because I don’t quite ‘get’ QNX IPC.

Can anyone shed light on what’s happening here?

TIA,

-Dave


    BEFORE LOCK-UP:
    pid       tid  name              prio  STATE      Blocked
    45067     1    sbin/devc-pty     20o   RECEIVE    1
    .
    .
    23605283  1    Test/Unix/sntest  63f   RECEIVE    1
    23605283  2    Test/Unix/sntest  62f   CONDVAR    8092cc0
    23605283  3    Test/Unix/sntest  11f   NANOSLEEP
    23605283  4    Test/Unix/sntest  11f   CONDVAR    8092bf0
    23605283  5    Test/Unix/sntest  11f   NANOSLEEP
    23605283  6    Test/Unix/sntest  11f   CONDVAR    8092d90
    23605283  7    Test/Unix/sntest  10f   CONDVAR    8092d28
    23605283  8    Test/Unix/sntest  10f   CONDVAR    8092df8

    AFTER LOCK-UP:
    45067     1    sbin/devc-pty     10o   RECEIVE    1
    .
    .
    23605283  1    Test/Unix/sntest  63f   RECEIVE    1
    23605283  2    Test/Unix/sntest  62f   REPLY      45067
    23605283  3    Test/Unix/sntest  11f   REPLY      45067
    23605283  4    Test/Unix/sntest  11f   REPLY      45067
    23605283  5    Test/Unix/sntest  11f   REPLY      45067
    23605283  6    Test/Unix/sntest  11f   CONDVAR    8092d90
    23605283  7    Test/Unix/sntest  10f   CONDVAR    8092d28
    23605283  8    Test/Unix/sntest  10f   CONDVAR    8092df8

David Wolfe <da5id@luvspamwolfe.name> wrote:

I think I have a classic priority inversion problem in a test app,
but can’t seem to puzzle through it. It runs fine for a while, then four
of its threads hang, apparently waiting for a message from devc-pty(?)
They never wake up again (pidin output appended below).

I don’t understand how devc-pty can show up as RECEIVE-blocked (with
priority 10) when those four threads are REPLY-blocked on it, but that’s
probably because I don’t quite ‘get’ QNX IPC.

Can anyone shed light on what’s happening here?

That’s a reasonably normal state for somebody who is waiting for input,
or waiting for output to complete.

For example, if you try to read a console (or virtual console, such as
a pseudo-tty) for typed input, and nobody types anything, then you will
be REPLY-blocked on that server waiting for your read() to be fulfilled.

The server, of course, may be dealing with other clients, waiting for
hardware, or whatever – so it goes back to MsgReceive() and ends up
RECEIVE blocked – if it gets something for you, it will then MsgReply()
you, using the stored away rcvid.

This is normal behaviour.
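To make the "reply later" pattern concrete, here is a toy model in portable C. It is not QNX code: `struct pty_model`, `model_read_request()` and `model_input()` are hypothetical names, and plain integers stand in for the rcvids a real resource manager gets back from MsgReceive() and later hands to MsgReply().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a single-threaded server that cannot answer a read
 * until input arrives. Plain ints stand in for QNX rcvids. */
#define MAX_PENDING 16

struct pty_model {
    int pending[MAX_PENDING]; /* rcvids of clients left REPLY-blocked */
    size_t npending;
    int last_replied;         /* rcvid most recently "replied" to */
};

/* A client's read() arrives. With no input buffered, the server stores
 * the rcvid and goes back to receiving instead of replying, so the
 * client stays REPLY-blocked while the server shows RECEIVE-blocked. */
bool model_read_request(struct pty_model *s, int rcvid, bool input_ready)
{
    if (input_ready) {
        s->last_replied = rcvid;   /* would be an immediate MsgReply() */
        return true;
    }
    assert(s->npending < MAX_PENDING);
    s->pending[s->npending++] = rcvid;
    return false;
}

/* Input finally arrives (a key is typed): reply to the oldest waiter. */
bool model_input(struct pty_model *s)
{
    if (s->npending == 0)
        return false;
    s->last_replied = s->pending[0];
    for (size_t i = 1; i < s->npending; i++)
        s->pending[i - 1] = s->pending[i];
    s->npending--;
    return true;
}
```

A client whose request lands in `pending` corresponds to a REPLY-blocked thread in pidin; the server is free to block in its receive call until input arrives or another request comes in.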

So, this means that likely either:

– the programs are waiting for user input through a pseudo-tty
– the programs are waiting for output to complete, but it hasn’t,
possibly due to someone using Ctrl-S or something similar to flow-control
the output, or whoever is on the other end of the pseudo-tty not emptying
the buffers and the buffers have filled up.
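The "buffers have filled up" case can be reproduced with an ordinary pipe standing in for the pseudo-tty (an assumption; devc-pty's buffering is not necessarily the same). With O_NONBLOCK set, write() fails with EAGAIN at exactly the point where a blocking write() would instead wait for the reader to drain the buffer; on QNX, a client stuck in that wait shows up as REPLY-blocked on the server that owns the buffer. `fill_until_full()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Writes into fd until the buffer behind it is full. Returns the number
 * of bytes accepted, or -1 on an unexpected error. The caller must have
 * set O_NONBLOCK on fd; on a blocking fd this loop would simply hang
 * once nobody drains the other end. */
long fill_until_full(int fd)
{
    char chunk[512] = {0};
    long total = 0;

    for (;;) {
        ssize_t n = write(fd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;   /* full: the reader is not keeping up */
        return -1;
    }
}
```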

Hope that helps clarify a bit.

-David


QNX Training Services
http://www.qnx.com/support/training/
Please followup in this newsgroup if you have further questions.

I don’t understand how devc-pty can show up as RECEIVE-blocked (with
priority 10) when those four threads are REPLY-blocked on it…

This is normal behaviour…

… this means that likely either:

– the programs are waiting for user input through a pseudo-tty
– the programs are waiting for output to complete, but it hasn’t
possibly due to someone using Ctrl-S or something similar to
flow-control the output, or whoever is on the other end of the
pseudo-tty not emptying the buffers and the buffers have filled up.

Hope that helps clarify a bit.

Well, yes and no. :-S I guess what’s confusing me is that I thought
devc-pty would automatically get ‘promoted’ to the priority of the
highest-priority thread who is blocked waiting for a reply from him.
(I’m not sure where I got this impression–somewhere in the System
Architecture guide, I believe. Perhaps I’m mistaken?) So this:

AFTER LOCK-UP:
45067 1 sbin/devc-pty 10o RECEIVE 1
.
.
23605283 2 Test/Unix/sntest 62f REPLY 45067
23605283 3 Test/Unix/sntest 11f REPLY 45067
23605283 4 Test/Unix/sntest 11f REPLY 45067
23605283 5 Test/Unix/sntest 11f REPLY 45067

seems weird to me because devc-pty’s priority is 10 and not 62, even
though a priority 62 thread is REPLY-blocked on him. (Isn’t that what
the ‘45067’ in the last column means?) Why doesn’t devc-pty answer,
like, immediately? (This is where it starts to feel like some
priority inversion exam question…)

Still a bit befuddled,

-Dave

David Wolfe <da5id@luvspamwolfe.name> wrote:


DW > Well, yes and no. :-S I guess what’s confusing me is that I thought
DW > devc-pty would automatically get ‘promoted’ to the priority of the
DW > highest-priority thread who is blocked waiting for a reply from him.
DW > (I’m not sure where I got this impression–somewhere in the System
DW > Architecture guide, I believe. Perhaps I’m mistaken?) So this:

Hi Dave,

This is confusing at first but here is an easy explanation (I think).

If a thread is BLOCKED (i.e. not running) then it doesn’t matter what
priority it isn’t running at! As soon as something causes it to come
out of the BLOCKED state (i.e. running again), whatever unblocked
it determines what priority it starts up again at (often, but not
always).

Also, you have to realize that devc-pty probably has many clients,
not just your thread. So whichever thread last woke it up
determines what priority it was at when it last went to sleep.
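Bill's point can be modelled in a few lines. This is a deliberately simplified toy, not the kernel's actual algorithm: it assumes the server takes on the priority of each message it receives (unless inheritance is switched off) and keeps that value while it sits RECEIVE-blocked afterwards. `struct server_model` and `deliver()` are hypothetical names.

```c
#include <assert.h>

/* Toy model of a server thread with priority inheritance on receive.
 * While blocked it is not running, so the priority pidin displays is
 * just whatever the last message it handled left behind. */
struct server_model {
    int base_prio;   /* priority the server was started at */
    int cur_prio;    /* priority a tool like pidin would display */
};

/* Handle one message: run at the sender's priority when inheritance is
 * enabled, then go back to waiting without resetting the value. */
void deliver(struct server_model *s, int sender_prio, int inherit)
{
    s->cur_prio = inherit ? sender_prio : s->base_prio;
}
```

In this model, a priority-62 client that is still waiting for its reply does not keep the server at 62 once a priority-10 client has sent the next message.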


Bill Caroselli – Q-TPS Consulting
1-(626) 824-7983
qtps@earthlink.net

DW>I don’t understand how devc-pty can show up as RECEIVE-blocked
DW>(with priority 10) when those four threads are REPLY-blocked on
DW> it…
DW> … I guess what’s confusing me is that I thought
DW> devc-pty would automatically get ‘promoted’ to the priority of the
DW> highest-priority thread who is blocked waiting for a reply from him.
DW> (I’m not sure where I got this impression–somewhere in the System
DW> Architecture guide, I believe. Perhaps I’m mistaken?)


This is confusing at first but here is an easy explanation (I think).

If a thread is BLOCKED (i.e. not running) then it doesn’t matter what
priority it isn’t running at! As soon as something causes it to come
out of the BLOCKED state (i.e. running again), whatever unblocked
it determines what priority it starts up again at (often, but not
always).

Thanks to you and David for your kind replies. :-) I think I’m
starting to see what may be going on here. But I’m still not quite
clear on what the solution ought to be. Given the following threads:

45067 1 sbin/devc-pty 10o RECEIVE 1
.
.
23605283 1 Test/Unix/sntest 63f RECEIVE 1
23605283 2 Test/Unix/sntest 62f REPLY 45067
23605283 3 Test/Unix/sntest 11f REPLY 45067
23605283 4 Test/Unix/sntest 11f REPLY 45067
23605283 5 Test/Unix/sntest 11f REPLY 45067
23605283 6 Test/Unix/sntest 11f CONDVAR 8092d90
23605283 7 Test/Unix/sntest 10f CONDVAR 8092d28
23605283 8 Test/Unix/sntest 10f CONDVAR 8092df8

how can I figure out who is waiting for what? It sounds like you guys
are saying that devc-pty is having a back-and-forth conversation with
some priority 10 thread, and he can’t wake up until that priority 10
thread has a chance to answer? And my priority 11 thread… hrm–darn
it, I just confused myself again. None of my threads are READY. So how
can they be holding up the show? Isn’t the only way to have a total
deadlock for some ‘greedy’ 10+ priority FIFO thread to be
completely hogging the CPU?

What could be preventing the messages that my threads have (apparently?)
sent to devc-pty from being processed indefinitely? It can’t be one of
my threads, can it, since they are all blocked? How can I find out
who the ‘guilty party’ in this scenario is?

Post the whole pidin output. There might be more to this than meets the eye.
And the code for sntest would not hurt, if it is small enough.

– igor

“David Wolfe” <da5id@LUVSPAMwolfe.name> wrote in message
news:b8srvk$oum$1@inn.qnx.com

How can I find out who the ‘guilty party’ in this scenario is?

“Igor Kovalenko” <kovalenko@attbi.com> wrote:

Post the whole pidin output. There might be more to this than meets the
eye. And the code for sntest would not hurt, if it is small enough.

Output from pidin appended below. The sntest code is too long to post,
but I’ll try to describe what it does. The priority 63 thread is a
‘dispatch thread’ that sits in a loop waiting for a timer pulse to be
delivered:

while( exec->Is_Running() ) {

    rcvid = MsgReceive( chid, &msg, sizeof( msg ), NULL );

    if( rcvid != 0 ) /* NOT a pulse from the timer - ignore */
        continue;

    /* Call the Exec dispatcher: */
    exec->Dispatch();
}

The Dispatch() function kicks each thread when it is supposed to run
by signalling a condvar. One other thing it does is call recvfrom() on
a non-blocking UDP socket to see if any packets from other machines have
arrived. (Note: for the tests I’ve been doing, there is no ethernet
traffic, so recvfrom() always returns -1/EWOULDBLOCK.)
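The dispatcher/worker hand-off described above can be sketched with POSIX threads. This is an assumption about the structure, since the sntest code was not posted: `struct task`, `task_init()`, `task_kick()`, `task_stop()` and `task_body()` are hypothetical names. Each worker waits on its own condvar (on QNX, pidin reports such a thread as CONDVAR-blocked), and the dispatcher bumps a counter and signals it once per tick.

```c
#include <assert.h>
#include <pthread.h>

/* One worker "task" released by the dispatcher through a condvar. */
struct task {
    pthread_mutex_t lock;
    pthread_cond_t  go;
    int pending;   /* kicks not yet consumed */
    int runs;      /* times the task body has executed */
    int stop;      /* set to ask the worker to exit */
};

void task_init(struct task *t)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->go, NULL);
    t->pending = t->runs = t->stop = 0;
}

/* Called by the dispatcher on each timer tick. */
void task_kick(struct task *t)
{
    pthread_mutex_lock(&t->lock);
    t->pending++;
    pthread_cond_signal(&t->go);
    pthread_mutex_unlock(&t->lock);
}

void task_stop(struct task *t)
{
    pthread_mutex_lock(&t->lock);
    t->stop = 1;
    pthread_cond_signal(&t->go);
    pthread_mutex_unlock(&t->lock);
}

/* Worker thread: waits on the condvar until kicked, runs once per kick,
 * and exits after a stop request once no kicks are left. */
void *task_body(void *arg)
{
    struct task *t = arg;

    pthread_mutex_lock(&t->lock);
    for (;;) {
        while (t->pending == 0 && !t->stop)
            pthread_cond_wait(&t->go, &t->lock);
        if (t->pending == 0)   /* stop requested, no kicks left */
            break;
        t->pending--;
        pthread_mutex_unlock(&t->lock);
        /* The task's real work would run here, outside the lock. */
        pthread_mutex_lock(&t->lock);
        t->runs++;
    }
    pthread_mutex_unlock(&t->lock);
    return NULL;
}
```

Counting kicks rather than relying on a bare signal means a tick that arrives while the worker is still busy is not lost.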


pid tid name prio STATE Blocked
1 1 6/boot/sys/procnto 0f READY
1 2 6/boot/sys/procnto 10r RECEIVE 1
1 3 6/boot/sys/procnto 10r RECEIVE 1
1 4 6/boot/sys/procnto 10r RUNNING
1 5 6/boot/sys/procnto 63r RECEIVE 1
1 6 6/boot/sys/procnto 10r RECEIVE 1
1 7 6/boot/sys/procnto 10r RECEIVE 1
1 8 6/boot/sys/procnto 10r RECEIVE 1
1 9 6/boot/sys/procnto 6r NANOSLEEP
1 10 6/boot/sys/procnto 12r RECEIVE 1
1 11 6/boot/sys/procnto 10r RECEIVE 1
1 12 6/boot/sys/procnto 63r RECEIVE 1
1 13 6/boot/sys/procnto 10r RECEIVE 1
1 14 6/boot/sys/procnto 10r RECEIVE 1
2 1 sbin/tinit 10o REPLY 1
3 1 proc/boot/slogger 10o RECEIVE 1
12292 1 sbin/mqueue 10o RECEIVE 1
5 1 proc/boot/pci-bios 12o RECEIVE 1
6 1 roc/boot/devb-eide 10o SIGWAITINFO
6 2 roc/boot/devb-eide 21r RECEIVE 1
6 3 roc/boot/devb-eide 21r RECEIVE 4
6 4 roc/boot/devb-eide 10o RECEIVE 10
6 5 roc/boot/devb-eide 10r CONDVAR b822ac20
6 6 roc/boot/devb-eide 63o RECEIVE 7
6 7 roc/boot/devb-eide 63o RECEIVE 7
6 8 roc/boot/devb-eide 10o RECEIVE 7
7 1 /x86/sbin/devc-con 15o RECEIVE 1
8 1 .3/x86/sbin/fs-pkg 10o RECEIVE 1
8 2 .3/x86/sbin/fs-pkg 10o SIGWAITINFO
8 3 .3/x86/sbin/fs-pkg 10o RECEIVE 1
8 4 .3/x86/sbin/fs-pkg 10o RECEIVE 1
8 5 .3/x86/sbin/fs-pkg 10o RECEIVE 1
8 6 .3/x86/sbin/fs-pkg 10o RECEIVE 1
4105 1 sbin/pipe 10o RECEIVE 1
4105 2 sbin/pipe 10o RECEIVE 1
4105 3 sbin/pipe 10o RECEIVE 1
880650 1 r/photon/bin/pterm 10o RECEIVE 1
45067 1 sbin/devc-pty 10o RECEIVE 1
77836 1 sbin/devc-par 10o RECEIVE 1
77836 2 sbin/devc-par 9r CONDVAR 804fa18
667661 1 /photon/bin/Photon 12r RECEIVE 1
77838 1 usr/sbin/spooler 10o NANOSLEEP
77839 1 sbin/io-net 10o SIGWAITINFO
77839 2 sbin/io-net 10o RECEIVE 1
77839 3 sbin/io-net 10o RECEIVE 1
77839 4 sbin/io-net 10o RECEIVE 1
77839 5 sbin/io-net 10o RECEIVE 1
77839 6 sbin/io-net 63o RECEIVE 6
77839 7 sbin/io-net 21r RECEIVE 22
126992 1 sbin/devb-fdc 10o SIGWAITINFO
126992 2 sbin/devb-fdc 21r RECEIVE 1
126992 3 sbin/devb-fdc 10o RECEIVE 7
126992 4 sbin/devb-fdc 10o CONDVAR b822ac20
126992 5 sbin/devb-fdc 10o RECEIVE 4
126992 6 sbin/devb-fdc 10o RECEIVE 4
126992 7 sbin/devb-fdc 10o RECEIVE 4
94225 1 sbin/devc-ser8250 24o RECEIVE 1
135186 1 usr/sbin/random 10o SIGWAITINFO
135186 2 usr/sbin/random 10o RECEIVE 1
135186 3 usr/sbin/random 10o NANOSLEEP
147475 1 usr/sbin/dumper 10o RECEIVE 1
94228 1 sbin/io-audio 10o SIGWAITINFO
94228 2 sbin/io-audio 10o RECEIVE 1
94228 3 sbin/io-audio 10o RECEIVE 1
94228 4 sbin/io-audio 10o RECEIVE 1
94228 5 sbin/io-audio 50r INTR
163861 1 bin/zsh 10o REPLY 7
163862 1 bin/zsh 10o REPLY 7
163863 1 bin/login 10o REPLY 7
163864 1 bin/login 10o REPLY 7
745497 1 ton/bin/fontsleuth 6o RECEIVE 1
745497 2 ton/bin/fontsleuth 10o RECEIVE 1
745497 4 ton/bin/fontsleuth 6o RECEIVE 1
745497 5 ton/bin/fontsleuth 6o RECEIVE 1
741402 1 ton/bin/devi-hirun 15o RECEIVE 1
741402 2 ton/bin/devi-hirun 15o REPLY 7
741402 3 ton/bin/devi-hirun 12o SIGWAITINFO
712731 1 on/bin/io-graphics 12r SIGWAITINFO
712731 2 on/bin/io-graphics 10r RECEIVE 1
712731 3 on/bin/io-graphics 12r REPLY 667661
765980 1 usr/photon/bin/pwm 10o RECEIVE 1
811037 1 r/photon/bin/shelf 10o RECEIVE 1
811037 2 r/photon/bin/shelf 10o CONDVAR b8356dac
847902 1 photon/bin/bkgdmgr 10o RECEIVE 1
847903 1 hoton/bin/wmswitch 10o RECEIVE 2
847904 1 r/photon/bin/saver 10o RECEIVE 1
880673 1 bin/zsh 10o SIGSUSPEND
26562594 1 Test/Unix/sntest 63f RECEIVE 1
26562594 2 Test/Unix/sntest 62f REPLY 45067
26562594 3 Test/Unix/sntest 11f REPLY 45067
26562594 4 Test/Unix/sntest 11f REPLY 45067
26562594 5 Test/Unix/sntest 11f REPLY 45067
26562594 6 Test/Unix/sntest 11f CONDVAR 8093dd0
26562594 7 Test/Unix/sntest 10f CONDVAR 8093d68
26566691 1 bin/pidin 10o REPLY 1
17190948 1 r/photon/bin/pterm 10o RECEIVE 1
17190949 1 bin/zsh 10o SIGSUSPEND

David Wolfe <da5id@luvspamwolfe.name> wrote:

How can I find out who the ‘guilty party’ in this scenario is?

There is no priority inversion – this goes back to what I originally
posted:

So, this means that likely either:

– the programs are waiting for user input through a pseudo-tty
– the programs are waiting for output to complete, but it hasn’t,
possibly due to someone using Ctrl-S or something similar to flow-control
the output, or whoever is on the other end of the pseudo-tty not emptying
the buffers and the buffers have filled up.

Do you do printf()s in sntest? Do you ever prompt for keyboard input
in sntest? Those are the most likely things to leave you blocked on
devc-pty.

-David


There is no priority inversion – this goes back to what I originally
posted:

– the programs are waiting for user input through a pseudo-tty
– the programs are waiting for output to complete, but it hasn’t…

Do you do printf()s in sntest? Do you ever prompt for keyboard input
in sntest? Those are the most likely things to leave you blocked on
devc-pty.

I do quite a few sprintf()'s, but not any printf()'s. And I don’t read
from the terminal anywhere in my code… hmmm–ohmigosh, I’m a big
fat liar. I did define a function called kbhit() to allow me to exit
via a keypress. It looks like:

/***************************************************************
** kbhit() - Detects keypresses; returns 1 if a key has been
**           pressed, 0 otherwise...
****************************************************************/
int kbhit( void )
{
    char ch;
    int nread;

    if ( peek_character != -1 )
        return 1;

    new_settings.c_cc[VMIN] = 0;
    tcsetattr( 0, TCSANOW, &new_settings );

    nread = read( STDIN_FILENO, &ch, 1 );

    new_settings.c_cc[VMIN] = 1;
    tcsetattr( 0, TCSANOW, &new_settings );

    if ( nread == 1 ) {
        peek_character = ch;
        return 1;
    }

    return 0;
}

I used to only call it from a single background task, but I think I’m
calling it from multiple threads now as follows:

// XXX - Temp code for debugging...
if ( kbhit() ) {
    getch();
    Exec()->Terminate( "Task A kbhit()" );
}

That sure looks like a smoking gun! I’ll bet removing the multiple
calls will fix everything. Though, I must say, I still don’t quite
understand what’s going on. How does calling that function from
multiple threads cause a deadlock? And is it really possible to cause
the same thing to occur just by inserting a few printf()'s? Not
complaining or anything–just earnestly curious about how QNX works.

Thanks much for the help!

David Wolfe wrote:

Though, I must say, I still don’t quite
understand what’s going on. How does calling that function from
multiple threads cause a deadlock?

Simple answer: it doesn’t. Deadlock occurs (in the simplest case)
when two threads each hold a resource that neither will release until
it acquires the resource that the other thread holds.

You don’t have deadlock (at least not that I can see from the
data you provided).
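Rennie's definition is the classic lock-ordering deadlock: thread 1 holds A and waits for B while thread 2 holds B and waits for A. One common cure, shown here as a general sketch rather than anything taken from sntest, is to always take the two locks in one fixed global order. `lock_pair()`, `unlock_pair()` and `pair_worker()` are hypothetical names; ordering by address is just one convenient choice of global order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Always lock the mutex with the lower address first, so two threads
 * asking for the same pair in opposite orders can never each hold one
 * and wait forever for the other. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    if (b != a)
        pthread_mutex_lock(b);
}

void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    if (b != a)
        pthread_mutex_unlock(b);
}

static pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t m2 = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;

/* Each worker requests the pair in the order chosen by its argument;
 * without lock_pair() the reversed worker could deadlock the other. */
void *pair_worker(void *arg)
{
    int reversed = *(int *)arg;

    for (int i = 0; i < 100000; i++) {
        if (reversed)
            lock_pair(&m2, &m1);
        else
            lock_pair(&m1, &m2);
        shared_counter++;
        unlock_pair(&m1, &m2);
    }
    return NULL;
}
```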

And is it really possible to cause
the same thing to occur just by inserting a few printf()'s?

It is certainly possible to acquire the mutex for stdin and
not release it (perhaps because of waiting for a keystroke),
leaving other threads also waiting for the same keystroke
(since they won’t be able to acquire stdin until the thread
that owns stdin’s mutex releases it).
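POSIX exposes the per-stream lock Rennie mentions through flockfile(), ftrylockfile() and funlockfile(). The sketch below uses a tmpfile() stream rather than stdin (an assumption, so it can run unattended): while one thread holds the stream lock, as a thread parked waiting for a keystroke might, any other thread trying to use that stream has to wait. `holder()`, `demo_stream_lock()` and the flag variables are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hand-shake between demo_stream_lock() and the holder thread. */
static pthread_mutex_t flag_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  flag_cond = PTHREAD_COND_INITIALIZER;
static int holding;   /* holder() has taken the stream lock */
static int release;   /* demo asks holder() to let go */

/* Takes the stream lock and keeps it until told to release it,
 * standing in for a thread stuck waiting for a keystroke. */
void *holder(void *arg)
{
    FILE *fp = arg;

    flockfile(fp);
    pthread_mutex_lock(&flag_lock);
    holding = 1;
    pthread_cond_broadcast(&flag_cond);
    while (!release)
        pthread_cond_wait(&flag_cond, &flag_lock);
    pthread_mutex_unlock(&flag_lock);
    funlockfile(fp);
    return NULL;
}

/* Returns 1 if the stream was busy while holder() owned it and became
 * available again once holder() let go, 0 otherwise. */
int demo_stream_lock(FILE *fp)
{
    pthread_t th;
    int busy, freed;

    if (pthread_create(&th, NULL, holder, fp) != 0)
        return 0;

    pthread_mutex_lock(&flag_lock);
    while (!holding)
        pthread_cond_wait(&flag_cond, &flag_lock);
    pthread_mutex_unlock(&flag_lock);

    /* Another thread owns the stream, so this should fail. */
    busy = (ftrylockfile(fp) != 0);
    if (!busy)
        funlockfile(fp);

    pthread_mutex_lock(&flag_lock);
    release = 1;
    pthread_cond_broadcast(&flag_cond);
    pthread_mutex_unlock(&flag_lock);
    pthread_join(th, NULL);

    freed = (ftrylockfile(fp) == 0);
    if (freed)
        funlockfile(fp);
    return busy && freed;
}
```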

Rennie

Post also the fragments of code that open connections and handle condvars. I
must have missed what your code has to do with devc-pty at all. Do you wait
for input?

“David Wolfe” <da5id@LUVSPAMwolfe.name> wrote in message
news:b8ueei$ke5$1@inn.qnx.com

The sntest code is too long to post, but I’ll try to describe what it does. […]

“David Gibbs” wrote:

There is no priority inversion…

“Rennie Allen” wrote:

You don’t have deadlock (at least not that I can see from the
data you provided).

:: SIGH :: I think I’m failing to get my point across due to imprecise
usage of terms. Here’s what I don’t understand… with the kbhit()
calls in place, I wind up with this after several minutes:

45067 1 sbin/devc-pty 10o RECEIVE 1
.
.
23605283 1 Test/Unix/sntest 63f RECEIVE 1
23605283 2 Test/Unix/sntest 62f REPLY 45067
23605283 3 Test/Unix/sntest 11f REPLY 45067
23605283 4 Test/Unix/sntest 11f REPLY 45067
23605283 5 Test/Unix/sntest 11f REPLY 45067
23605283 6 Test/Unix/sntest 11f CONDVAR 8092d90
23605283 7 Test/Unix/sntest 10f CONDVAR 8092d28
23605283 8 Test/Unix/sntest 10f CONDVAR 8092df8

My 63f thread (the dispatcher) is still running just fine, as are all
the threads blocked on the CONDVAR that the dispatcher signals. But the
threads where I have a kbhit() call (complete code for which is appended
below) are blocked waiting for devc-pty to reply to them.

I tried to ensure that my kbhit() routine would be non-blocking by
putting the terminal in non-canonical mode and setting c_cc[VTIME] to 0,
a trick I’ve used successfully with monolithic kernels; however, my 62f
thread is somehow managing to block on devc-pty, anyway. And the 10f
threads–and one of the 11f threads–are still iterating normally.
This, to me, is a ‘priority inversion’, because the 10/11f threads are
allowed to run while the 62f thread sits blocked, even though all it did
was a (supposedly) non-blocking read from stdin. It is also a
‘deadlock’ because the situation never resolves itself; all the threads
that are REPLY-blocked on devc-pty remain blocked forever.

Since this problem goes away completely if I refrain from calling my
kbhit() function from multiple threads, I feel there must be a very
simple explanation for this behavior. The kind responses from this
newsgroup have helped me figure out how to eliminate the problem, and
I’m very grateful for that! :-) But I’m still unsure of the real root
cause… why do these threads block forever on devc-pty?

I understand that the ‘read( STDIN_FILENO, &ch, 1 )’ line in kbhit()
must send a message to devc-pty, but I thought that QNX automatically
bumped a resource manager’s priority to that of the sender; thus, while
servicing the request from my 62f thread, devc-pty’s priority should be
62(??) How, then, does it come to have a priority of 10 when the app
locks up?


/* $Id: kbhit.c,v 1.2 2002/11/12 05:25:54 dwolfe Exp $ */

/*********************************************************************
** kbhit.c                                                          **
** Fallback implementation of the DOS kbhit() and getch() functions **
*********************************************************************/
#include <stdio.h>
#include <string.h>     /* memcpy() */
#include <termios.h>
#include <term.h>
#include <unistd.h>
#include "kbhit.h"

static struct termios initial_settings, new_settings;
static int peek_character = -1;


/*********************************************************************
** init_terminal() - Must be called once to set terminal in non-    **
**                   canonical mode...                              **
*********************************************************************/
void init_terminal( void )
{
    tcgetattr( 0, &initial_settings );
    memcpy( &new_settings, &initial_settings, sizeof( new_settings ) );
    new_settings.c_lflag &= ~ICANON;
    new_settings.c_lflag &= ~ECHO;
    new_settings.c_lflag &= ~ISIG;
    new_settings.c_cc[VMIN] = 1;
    new_settings.c_cc[VTIME] = 0;
    tcsetattr( 0, TCSANOW, &new_settings );
}


/*********************************************************************
** reset_terminal() - Called during teardown to return terminal to  **
**                    its previous state                            **
*********************************************************************/
void reset_terminal( void )
{
    tcsetattr( 0, TCSANOW, &initial_settings );
}


/*********************************************************************
** kbhit() - Detects keypresses; returns 1 if a key has been        **
**           pressed, 0 otherwise...                                **
*********************************************************************/
int kbhit( void )
{
    char ch;
    int nread;

    if ( peek_character != -1 )
        return 1;

    new_settings.c_cc[VMIN] = 0;
    tcsetattr( 0, TCSANOW, &new_settings );

    nread = read( STDIN_FILENO, &ch, 1 );

    new_settings.c_cc[VMIN] = 1;
    tcsetattr( 0, TCSANOW, &new_settings );

    if ( nread == 1 ) {
        peek_character = ch;
        return 1;
    }

    return 0;
}


/*********************************************************************
** getch() - Gets a character from standard input; most often used  **
**           after kbhit() has detected a keypress...               **
*********************************************************************/
int getch( void )
{
    char ch;

    if ( peek_character != -1 ) {
        ch = peek_character;
        peek_character = -1;
        return ch;
    }

    /* Don't return an uninitialized ch if the read fails. */
    if ( read( STDIN_FILENO, &ch, 1 ) != 1 )
        return -1;
    return ch;
}

“David Wolfe” <da5id@LUVSPAMwolfe.name> wrote in message
news:b94qo1$mfg$1@inn.qnx.com

“David Gibbs” wrote:
There is no priority inversion…


I understand that the ‘read( STDIN_FILENO, &ch, 1 )’ line in kbhit()
must send a message to devc-pty, but I thought that QNX automatically
bumped a resource manager’s priority to that of the sender; thus, while
servicing the request from my 62f thread, devc-pty’s priority should be
62(??) How, then, does it come to have a priority of 10 when the app
locks up?

I don’t think devc-pty has floating priority at all. A simple test with
gets() at high priority does not show devc-pty bumped. Priority inheritance
on channels can be suppressed by a flag, which some resmgrs do for a reason
that I will leave for QNX to comment on.

Another possibility is that if a request can’t be answered right away, it
could be put into a ‘pending reply’ queue, and the resmgr would then wait for
messages from other clients, since it has nothing to do in the meantime…
The next message from a low-priority client would then (I speculate; someone
correct me if I am wrong) drop its priority back. This can only be avoided
by multi-threaded resmgrs, which devc-pty is not.

One way or another, I don’t think it is relevant. There’s got to be a reason
why your non-blocking request can’t be replied to right away in the first
place. I would not be terribly surprised if it was a bug on the QNX side. I have
seen various resmgrs block their clients indefinitely. I have also seen some
of QNX’s resmgrs get into deadlock themselves. The devc-pty in particular
was notoriously buggy in the past.

– igor

int kbhit( void )
{
    char ch;
    int nread;

    if ( peek_character != -1 )
        return 1;

    new_settings.c_cc[VMIN] = 0;
    tcsetattr( 0, TCSANOW, &new_settings );

    nread = read( STDIN_FILENO, &ch, 1 );

    new_settings.c_cc[VMIN] = 1;
    tcsetattr( 0, TCSANOW, &new_settings );

    if ( nread == 1 ) {
        peek_character = ch;
        return 1;
    }

    return 0;
}

This code isn’t thread-safe. First, you are modifying the attributes on a
global file descriptor. Second, you are modifying the attribute structure
based on a global variable. What I suspect is happening is that you are
getting a race where your VMIN setting is getting set back to 1 before
you call read(). Once you have called read() with that setting, you won’t
be able to change it back until the read() waiting with a VMIN of 1 finishes.
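The interleaving Chris describes can be replayed step by step. The following is a hypothetical model, not the real terminal driver: one shared vmin field stands in for the attributes both threads write through the global new_settings, VTIME is assumed to be 0, and fake_tty, read_would_block() and replay_kbhit_race() are illustrative names only.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the terminal attributes both threads share. */
struct fake_tty {
    int vmin;          /* VMIN value the driver applies to the next read() */
    int chars_ready;   /* input bytes waiting to be read */
};

/* With no input queued (and VTIME 0), a non-canonical read() returns 0 at
 * once if VMIN is 0, but blocks until a byte arrives if VMIN is 1. */
static bool read_would_block(const struct fake_tty *t)
{
    return t->chars_ready == 0 && t->vmin > 0;
}

/* Replays one unlucky interleaving of two kbhit() calls (threads A and B).
 * Returns true if A's read() comes back at once while B's read() blocks. */
static bool replay_kbhit_race(void)
{
    struct fake_tty tty = { 1, 0 };   /* VMIN starts at 1, no keys pressed */
    bool a_blocked, b_blocked;

    tty.vmin = 0;                        /* A: VMIN = 0, tcsetattr() */
    tty.vmin = 0;                        /* B: VMIN = 0, tcsetattr() */
    a_blocked = read_would_block(&tty);  /* A: read() returns 0 at once */
    tty.vmin = 1;                        /* A: VMIN = 1, tcsetattr() */
    b_blocked = read_would_block(&tty);  /* B: read() now waits for a key */

    return !a_blocked && b_blocked;
}
```

Thread B would then sit REPLY-blocked on the terminal server until a key arrives, which would be consistent with the REPLY 45067 entries in the pidin listing.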

You need to add in a mutex and lock it down around the kbhit() and getch()
calls. However, if all your threads are calling getch() you will still be
able to lock up your kbhit() call, an exercise I will leave to the reader. :wink:
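A minimal sketch of the mutex approach, assuming POSIX threads and termios. kbhit_fd(), getch_fd(), kb_lock and kb_peek are hypothetical names; they take a file descriptor instead of hard-coding STDIN_FILENO, and they work on local copies of the attributes from tcgetattr() rather than the global new_settings. The terminal is assumed to have been put into non-canonical mode elsewhere, as the original program presumably does (that setup code isn't shown here).

```c
#include <pthread.h>
#include <termios.h>
#include <unistd.h>

/* One lock guards the terminal attributes and the one-character look-ahead. */
static pthread_mutex_t kb_lock = PTHREAD_MUTEX_INITIALIZER;
static int kb_peek = -1;   /* character seen by kbhit_fd() but not yet consumed */

/* Returns 1 if a character is waiting on fd, 0 if not, -1 on error.
 * Assumes fd is a terminal already in non-canonical mode (ICANON cleared). */
int kbhit_fd(int fd)
{
    struct termios saved, poll_mode;   /* local copies: no shared global state */
    unsigned char ch;
    int result = 0;

    pthread_mutex_lock(&kb_lock);
    if (kb_peek != -1) {
        result = 1;
    } else if (tcgetattr(fd, &saved) == -1) {
        result = -1;                   /* e.g. fd is not a terminal */
    } else {
        poll_mode = saved;
        poll_mode.c_cc[VMIN] = 0;      /* return at once...             */
        poll_mode.c_cc[VTIME] = 0;     /* ...even if nothing is queued  */
        if (tcsetattr(fd, TCSANOW, &poll_mode) == -1) {
            result = -1;
        } else {
            if (read(fd, &ch, 1) == 1) {
                kb_peek = ch;
                result = 1;
            }
            tcsetattr(fd, TCSANOW, &saved);   /* restore the previous settings */
        }
    }
    pthread_mutex_unlock(&kb_lock);
    return result;
}

/* Returns the next character, blocking until one arrives; -1 on EOF/error.
 * The lock is held across the blocking read(), so a concurrent kbhit_fd()
 * waits until this returns. */
int getch_fd(int fd)
{
    unsigned char ch;
    int result;

    pthread_mutex_lock(&kb_lock);
    if (kb_peek != -1) {
        result = kb_peek;
        kb_peek = -1;
    } else {
        result = (read(fd, &ch, 1) == 1) ? ch : -1;
    }
    pthread_mutex_unlock(&kb_lock);
    return result;
}
```

As Chris warns, getch_fd() keeps the lock while it waits in read(), so a kbhit_fd() in another thread stalls until a key arrives. Releasing the lock before the blocking read would avoid that, but then kbhit_fd() could be changing VMIN while another thread sits in read(), which is the kind of interaction that caused the hang in the first place.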

chris


Chris McKillop <cdm@qnx.com>
Software Engineer, QSSL
http://qnx.wox.org/
“The faster I go, the behinder I get.” – Lewis Carroll

“Chris McKillop” <cdm@qnx.com> wrote:


new_settings.c_cc[VMIN] = 0;
tcsetattr( 0, TCSANOW, &new_settings );

nread = read( STDIN_FILENO, &ch, 1 );

new_settings.c_cc[VMIN] = 1;
tcsetattr( 0, TCSANOW, &new_settings );

This code isn’t thread safe. First, you are modifying the attributes
on a global file descriptor. Second, you are modifying the attribute
structure based on a global variable.

D’oh! You’re right, of course! :sunglasses:

What I suspect is happening is
that you are getting a race where your VMIN setting is getting set
back to 1 before you call read(). Once you have called read() with
that setting you won’t be able to change it back until the read()
waiting with a VMIN of 1 finishes.

Thanks for the astute analysis; your explanation makes perfect sense.

You need to add in a mutex and lock it down around the kbhit() and
getch() calls. However, if all your threads are calling getch() you
will still be able to lock up your kbhit() call, an exercise I will
leave to the reader. :wink:

Heh. I may just be crazy enough to try it. Even though I don’t
intend to call kbhit() from multiple threads any longer, I’d like to try
a mutex-protected version just to get a warm fuzzy feeling that it does,
indeed, fix the problem. I was thinking there was some kind of heavy
black magic going on, but… :: SIGH :: … once again, it was just me
abusing multithreaded code. :-s Thanks to all for help in getting me
back on track…

What is interesting, though, is that nobody commented on the priority inversion
part. It appears that my assumption was right and the priority inheritance
protocol on message passing has a flaw when messages can’t be replied to right
away (which could be quite often). Even multithreaded resmgrs don’t solve the
problem completely, unless they create a new thread for every pending request,
which is not the case with the QNX ‘dispatch’ architecture that uses a thread
pool (so there’s no direct mapping of requests to threads).

This issue can’t be solved completely inside a resmgr by remembering
priorities with the requests and bumping your own priority when you’re ready
to serve them. The trouble is, to bump your own priority you need to be running,
and you might be preempted by a higher-priority thread that is still lower
than the priority you’d bump to if you had the chance.

The only way to solve this for good is for the kernel to ‘remember’ that a
thread’s priority was bumped as a result of inheritance, and to inhibit
inheritance from lower-priority messages until higher-priority pending
requests have been replied to. The implementation can be quite simple. The
MsgReply() call could ‘atomically’ update a new ‘priority floor’ channel
attribute with a value determined by the resmgr, based on its queue of
pending requests (i.e., tell the kernel not to drop my priority below this).
The only trouble is, MsgReply() does not have such an argument. But it could be
added, right? And it could be optional, for backwards compatibility. Or it
could be a new kernel call… Comments, QNX?
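The ‘priority floor’ idea can be sketched as a toy model in C. To be clear, nothing below is an existing QNX interface: MsgReply() has no such argument, and floor_server, floor_receive() and floor_reply() are hypothetical names. In this model the floor is recomputed as the highest priority still pending whenever the server parks or replies to a request, so a low-priority message can no longer drag the server below a pending high-priority request.

```c
#define MAX_REQ 8

/* Toy model of the proposal only; no such kernel interface exists today. */
struct floor_server {
    int prio;              /* effective priority of the server thread */
    int floor;             /* the kernel may not drop the server below this */
    int pending[MAX_REQ];  /* priorities of requests awaiting a reply */
    int npending;
};

static int max_int(int a, int b) { return a > b ? a : b; }

/* Highest priority among pending requests; 0 when the queue is empty. */
static int highest_pending_prio(const struct floor_server *s)
{
    int hi = 0;
    for (int k = 0; k < s->npending; k++)
        hi = max_int(hi, s->pending[k]);
    return hi;
}

/* Receive: inherit the sender's priority, but never sink below the floor.
 * A request that can't be answered yet is parked, and the floor is raised
 * to cover it (standing in for the "new kernel call" alternative). */
static void floor_receive(struct floor_server *s, int client_prio, int can_reply_now)
{
    s->prio = max_int(client_prio, s->floor);
    if (!can_reply_now && s->npending < MAX_REQ)
        s->pending[s->npending++] = client_prio;
    s->floor = highest_pending_prio(s);
}

/* Reply to pending request i (0 <= i < npending); the floor is updated
 * together with the reply, as an extended MsgReply() might do. */
static void floor_reply(struct floor_server *s, int i)
{
    s->pending[i] = s->pending[--s->npending];   /* remove request i */
    s->floor = highest_pending_prio(s);
}
```

For example, if a priority-62 request is parked and a priority-10 message then arrives, the modelled server stays at 62; only after the 62 request has been replied to does a priority-10 message bring it down to 10.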

– igor

“David Wolfe” <da5id@LUVSPAMwolfe.name> wrote in message
news:b977l8$fb5$1@inn.qnx.com


David Wolfe wrote:


Well, if C gives you enough rope to … then threads give you the noose
as a built-in!
It is so easy to not notice a global var in a function called from
multiple threads.
I’m beginning to prefer lotsa tiny tasks and message passing, just like
QNX4… at least it’s obvious when I’m using shared memory.

Phil