I’ve encountered a strange deadlock in the application.
After a closer look I noticed that a thread holding a mutex is blocked
in MsgSendPulse_r() while sending a pulse to a channel on another node. It
had been blocked for more than 12 hours, although the documentation says it
is a non-blocking call.
When I tried to take a core dump of io-net, the node went down.
Please look at the information below:
- Thread #5 is blocked with state NET_REPLY.
- It sends a pulse to fd #10, which points to process #94228 on node #48,
  though "pidin net" shows there is no such node. Why don't we get an error,
  ESRCH, in this case?
- "sin -p 3018784 fds" cannot extract any information about fd #10 (which
  is not strange, because it asks the server to provide this information, and
  the server no longer exists).
Can you explain why the connection was not closed and why no error code was
returned and the thread was blocked instead?
This all applies to QNX 6.3, QNet lite.
I will try to reproduce the situation.
Thank you,
Roman
================================================================
pidin -p 3018784
pid tid name prio STATE Blocked
3018784 1 …/bin/player.bin 10o CONDVAR 816eb54
3018784 2 …/bin/player.bin 10o CONDVAR 8116da4
3018784 3 …/bin/player.bin 5o READY
3018784 4 …/bin/player.bin 10o CONDVAR 811d084
3018784 5 …/bin/player.bin 50o NET_REPLY
3018784 6 …/bin/player.bin 21o RECEIVE 3
3018784 7 …/bin/player.bin 10o CONDVAR 816eb8c
3018784 8 …/bin/player.bin 10o RECEIVE 12
3018784 9 …/bin/player.bin 10o CONDVAR 816e764
3018784 10 …/bin/player.bin 10o CONDVAR 816e844
3018784 11 …/bin/player.bin 10o MUTEX 3018784-14 #1
3018784 12 …/bin/player.bin 10o CONDVAR 816ea3c
3018784 13 …/bin/player.bin 10o CONDVAR 818d9b4
3018784 14 …/bin/player.bin 10o MUTEX 3018784-05 #1
3018784 15 …/bin/player.bin 10o CONDVAR 8189ce4
3018784 16 …/bin/player.bin 10o CONDVAR 819500c
3018784 17 …/bin/player.bin 10o MUTEX 3018784-14 #1
3018784 18 …/bin/player.bin 10o CONDVAR 818118c
3018784 19 …/bin/player.bin 10o MUTEX 3018784-14 #1
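The blocking chain in the listing above can be traced mechanically. Below is a minimal C++ sketch, assuming that the Blocked column of a MUTEX row names the pid-tid that owns the mutex. The names ThreadRow, kThreads, parse_rows and wait_chain are made up for illustration and are not part of the application; only the rows that are not in CONDVAR are copied in.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One row of the pidin thread table: the thread state and, for MUTEX rows,
// the tid that owns the mutex (assumed to be the part after the dash).
struct ThreadRow {
    std::string state;
    int owner_tid = 0;  // 0 when the thread is not waiting on a mutex
};

// tid, STATE and Blocked columns copied from the pidin listing above.
const char* const kThreads =
    "3 READY\n"
    "5 NET_REPLY\n"
    "6 RECEIVE 3\n"
    "8 RECEIVE 12\n"
    "11 MUTEX 3018784-14 #1\n"
    "14 MUTEX 3018784-05 #1\n"
    "17 MUTEX 3018784-14 #1\n"
    "19 MUTEX 3018784-14 #1\n";

// Parse "tid STATE [Blocked]" lines into a map keyed by tid.
std::map<int, ThreadRow> parse_rows(const std::string& text) {
    std::map<int, ThreadRow> rows;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        int tid = 0;
        ThreadRow row;
        if (!(fields >> tid >> row.state)) continue;  // skip malformed lines
        if (row.state == "MUTEX") {
            std::string owner;  // e.g. "3018784-14"
            fields >> owner;
            const std::size_t dash = owner.find('-');
            assert(dash != std::string::npos);
            row.owner_tid = std::stoi(owner.substr(dash + 1));
        }
        rows[tid] = row;
    }
    return rows;
}

// Follow mutex owners from tid until a thread that is not in MUTEX state.
// The size check stops the walk if the owners form a cycle.
std::vector<int> wait_chain(const std::map<int, ThreadRow>& rows, int tid) {
    std::vector<int> chain{tid};
    auto it = rows.find(tid);
    while (it != rows.end() && it->second.state == "MUTEX" &&
           chain.size() <= rows.size()) {
        tid = it->second.owner_tid;
        chain.push_back(tid);
        it = rows.find(tid);
    }
    return chain;
}
```

With these rows, wait_chain(parse_rows(kThreads), 11) gives 11, 14, 5: threads 11, 17 and 19 wait on a mutex held by thread 14, which waits on a mutex held by thread 5, the thread stuck in NET_REPLY.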
================================================================
$2 = {client = {fd = 10,
handler = 0x80a08fc <CAsyncEntry::AsyncIOHandler(int, ECALLBACKTYPE,
void *)>, ct = CALLBACKTYPE_NONE, param = 0x818b888},
server = {nd = 48, pid = 94228}}
================================================================
sin -p 3018784 fds
player.bin 3018784 776K 628K 124K 2894K 618626
0 167945 WR 0/0 /net/dart.eleks.com/dev/ttyp4
1 167945 WR 0/0 /net/dart.eleks.com/dev/ttyp4
2 167945 WR 0/0 /net/dart.eleks.com/dev/ttyp4
3 3018783 W_ 0/0 /zzzzz/dev/play.100/data/Global/
4 3018783 R 0/40 /zzzzz/dev/play.100/data/Command/
5 3018783 W 0/0 /zzzzz/dev/play.100/data/Command/
6 3018782 R 0/0 /zzzzz/dev/play.100/etc/
7 3018783 W 0/0 /zzzzz/dev/play.100/data/sub_1/
8 3018783 _R 0/0 /zzzzz/dev/play.100/data/sub_1/
9 3018786 R 0/0 /zzzzz/dev/axis.100/data/Monitor/
10 94228
11 3018783 W 0/0 /zzzzz/dev/play.100/data/sub_2/
12 3018783 R 0/0 /zzzzz/dev/play.100/data/sub_2/
13 3018783 W 0/0 /zzzzz/dev/play.100/data/sub_3/
14 3018783 R 0/0 /zzzzz/dev/play.100/data/sub_3/
15 3018789 R 0/0 /zzzzz/dev/port.100/data/Monitor/
16 3018783 W 0/0 /zzzzz/dev/play.100/data/sub_4/
17 3018783 R 0/0 /zzzzz/dev/play.100/data/sub_4/
18 3018783 W 0/0 /zzzzz/dev/play.100/data/sub_5/
19 3018783 R 0/0 /zzzzz/dev/play.100/data/sub_5/
20 3018783 W 0/0 /zzzzz/dev/play.100/data/Monitor/
21 3018786 W 0/0 /zzzzz/dev/axis.100/data/Command/
22 3018786 _R 0/0 /zzzzz/dev/axis.100/data/Monitor/
0s 1
1s 69642
4s 1 __ 0/-1 /zzzzz/mon/procinfo/data/3018784/procinfo
5s 1 __ 0/-1 /zzzzz/mon/procinfo/data/play.100/PLAYER/procinfo
6s 1 __ 0/-1 /zzzzz/mon/procinfo/data/3018784/threadinfo
7s 1 __ 0/-1 /zzzzz/mon/procinfo/data/play.100/PLAYER/threadinfo
8s 1 __ 0/-1 /zzzzz/mon/procinfo/data/3018784/threadtextinfo
9s 1 __ 0/-1 /zzzzz/mon/procinfo/data/play.100/PLAYER/threadtextinfo
10s 1 __ 0/-1 /zzzzz/mon/procinfo/data/3018784/logdata
11s 1 __ 0/-1 /zzzzz/mon/procinfo/data/play.100/PLAYER/logdata
13s 1 __ 0/-1 /zzzzz/dev/play.100/profiles/
==============================================================
pidin net
ND Node CPU Release FreeMem BootTime
0 console 1 X86 6.3.0 26Mb/123Mb May 18 12:48:26 EEST 2004
1 axis_sew 1 X86 6.3.0 92Mb/119Mb Jun 29 01:25:32 EEST 2004
2 axis_hitachi 1 X86 6.3.0 93Mb/119Mb Jun 29 01:26:33 EEST 2004
12 dart 1 X86 6.3.0 986Mb/1022Mb Jun 25 15:28:48 EEST 2004
34 test_srv 1 X86 6.3.0 86Mb/254Mb Jun 28 21:04:12 EEST 2004
477 axis_test_1 1 X86 6.3.0 95Mb/119Mb Jun 28 03:12:05 EEST 2004
497 axis_test_2 1 X86 6.3.0 89Mb/119Mb Jun 28 03:10:16 EEST 2004
==============================================================