Mario,
Your gut feeling is a good one 
I ran and got “hung” events. One on my development server (no hardware) and
one on the actual instrument. Posted here below in the debugging information
I got on the development server case after attaching gdb to the task when it
reached the “hung” state.
Here’s what I see using pidin at the moment I attach gdb. Thread 9 (our
watchdog and highest priority thread) has been consuming all the CPU
(3m55s worth) from the time it hung till I attached the debugger.
$ pidin -pspm -l tt
pid tid name STATE start time thread start sutime
376356923 1 ./spm CONDVAR Mar 31 01:37 Mar 31 01:37 0.180
376356923 2 ./spm STOPPED Mar 31 01:37 Mar 31 01:37 0.000
376356923 3 ./spm STOPPED Mar 31 01:37 Mar 31 01:37 0.000
376356923 4 ./spm STOPPED Mar 31 01:37 Mar 31 01:37 0.000
376356923 5 ./spm STOPPED Mar 31 01:37 Mar 31 01:37 0.000
376356923 6 ./spm STOPPED Mar 31 01:37 Mar 31 01:37 0.000
376356923 7 ./spm SEM Mar 31 01:37 Mar 31 01:37 0.000
376356923 8 ./spm STOPPED Mar 31 01:37 Mar 31 01:37 0.001
376356923 9 ./spm STOPPED Mar 31 01:37 Mar 31 01:37 3m55s
376356923 10 ./spm STOPPED Mar 31 01:37 Mar 31 01:37 1.403
376356923 11 ./spm STOPPED Mar 31 01:37 Mar 31 01:37 0.000
376356923 12 ./spm CONDVAR Mar 31 01:37 Mar 31 01:37 0.000
376356923 13 ./spm STOPPED Mar 31 01:37 Mar 31 01:37 0.000
376356923 14 ./spm STOPPED Mar 31 01:37 Mar 31 01:37 0.000
376356923 15 ./spm CONDVAR Mar 31 01:37 Mar 31 01:37 0.000
376356923 16 ./spm CONDVAR Mar 31 01:37 Mar 31 01:37 0.000
376356923 17 ./spm CONDVAR Mar 31 01:37 Mar 31 01:37 0.000
376356923 18 ./spm STOPPED Mar 31 01:37 Mar 31 01:37 0.000
376356923 19 ./spm STOPPED Mar 31 01:37 Mar 31 01:37 0.000
376356923 20 ./spm CONDVAR Mar 31 01:37 Mar 31 01:37 0.000
376356923 21 ./spm CONDVAR Mar 31 01:37 Mar 31 01:37 0.000
Here’s what I see in terms of linked libraries when I start gdb along with
my arguments to gdb.
$ gdb --pid=376356923 --se=/home/txs/projects/QNX-6.3-Dev0001/SPM/spm
GNU gdb 5.2.1qnx-nto
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type “show copying” to see the conditions.
There is absolutely no warranty for GDB. Type “show warranty” for details.
This GDB was configured as “ntox86”…
Attaching to program /home/txs/projects/QNX-6.3-Dev0001/SPM/spm’, process
376356923
Reading symbols from
/usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5…done.
Loaded symbols for /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
Reading symbols from /usr/qnx630/target/qnx6/x86/lib/libm.so.2…done.
Loaded symbols for /usr/qnx630/target/qnx6/x86/lib/libm.so.2
Reading symbols from /usr/qnx630/target/qnx6/x86/lib/libc.so.2…done.
Loaded symbols for /usr/qnx630/target/qnx6/x86/lib/libc.so.2
[Switching to process 376356923]
0xb032d7e9 in SyncCondvarWait_r () from
/usr/qnx630/target/qnx6/x86/lib/libc.so.2
Here’s what I see when I ask gdb about the state of the threads in the
system. Basically everything looks normal for every thread except 8, 9 and
possibly 10 (which may have legitimately been interrupted)…
(gdb) info threads
21 process 376356923 0xb032d7e9 in SyncCondvarWait_r ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
20 process 376356923 0xb032d7e9 in SyncCondvarWait_r ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
19 process 376356923 0xb032c986 in MsgReceive_r ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
18 process 376356923 0xb032c986 in MsgReceive_r ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
17 process 376356923 0xb032d7e9 in SyncCondvarWait_r ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
16 process 376356923 0xb032d7e9 inSyncCondvarWait_r ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
15 process 376356923 0xb032d7e9 in SyncCondvarWait_r ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
14 process 376356923 0xb032c986 in MsgReceive_r ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
13 process 376356923 0xb032c986 in MsgReceive_r ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
12 process 376356923 0xb032d7e9 in SyncCondvarWait_r ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
11 process 376356923 0xb032c986 in MsgReceive_r ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
10 process 376356923 0xb8249aea in std::string::_Rep::_M_refcopy() ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
- 9 process 376356923 0xb823150e in std::locale::locale(std::locale const&) ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
8 process 376356923 0xb823150e in std::locale::locale(std::locale const&) ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
7 process 376356923 0xb032db0d in SyncSemWait ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
6 process 376356923 0xb032c986 in MsgReceive_r ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
5 process 376356923 0xb032c986 in MsgReceive_r ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
4 process 376356923 0xb032c986 in MsgReceive_r ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
3 process 376356923 0xb032c986 in MsgReceive_r ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
2 process 376356923 0xb032c986 in MsgReceive_r ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
1 process 376356923 0xb032d7e9 in SyncCondvarWait_r ()
from /usr/qnx630/target/qnx6/x86/lib/libc.so.2
So stepping to thread 9 and dumping a stack trace gives the following
output. What looks suspicious is the call stack into overflow(),
_M_really_overflow(), _M_convert_to_external etc. It appears something has
overflowed internally someplace at this point. Where/how I have no idea.
(gdb) thread 9
(gdb) where
#0 0xb823150e in std::locale::locale(std::locale const&) ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#1 0xb822a920 in std::basic_filebuf<char, std::char_traits
>::_M_convert_to_external(char*, int, int&, int&) ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#2 0xb822a871 in std::basic_filebuf<char, std::char_traits
>::_M_really_overflow(int) ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#3 0xb822a7ac in std::basic_filebuf<char, std::char_traits
>::overflow(int) ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#4 0xb8249456 in std::basic_streambuf<char, std::char_traits
>::xsputn(char const*, int) ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#5 0xb822af58 in std::basic_filebuf<char, std::char_traits
>::xsputn(char const*, int) ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#6 0xb824450b in std::ostream::write(char const*, int) ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#7 0xb8244bac in std::basic_ostream<char, std::char_traits >&
std::operator<< <std::char_traits >(std::basic_ostream<char,
std::char_traits >&, char const*) ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#8 0x084fa09a in Watchdog::WatchdogTimer::handleTimerExpiry()
(this=0x86b80c0) at WatchdogTimer.cpp:92
#9 0x084f1ba1 in Thread::receiveMessage(unsigned&, void**, unsigned&,
bool&) (this=0x889c1d8, id=@0x7e06f30, msgPointer=0x7e06f2c,
byteSize=@0x7e06f28, shutdown=@0x7e06f27)
at Thread.cpp:795
#10 0x084f0249 in Thread::defaultMainLoop(void*) (theThread=0x889c1d8)
at Thread.cpp:273
#11 0x084f0008 in Thread::threadMain(void*) (theThread=0x889c1d8) at
Thread.cpp:231
The actual last line of code in our application is at step 8. That line is:
std::cout << "TDS: WatchdogTimer::handleTimerExpiry() timer expired
on count " << count << " reset to time " << m_period.tv_sec << " " <<
m_period.tv_nsec << std::endl;
where count is defined to be: static int count = m_frequency;
where m_period is defined in the class definition to be: timespec m_period;
I tried to dump the values of m_period out but gdb complains about not be
able to retrieve the variable name. Not sure why this is. But suffice a
common timespec variable can’t be that crazy that cout can’t handle it.(gdb)
(gdb) print count
$1 = 1
So moving on to thread 8 (which is stopped at the exact same point) and
dumping a stack trace gives the following output.
(gdb) thread 8
(gdb) where
#0 0xb823150e in std::locale::locale(std::locale const&) ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#1 0xb822a920 in std::basic_filebuf<char, std::char_traits
>::_M_convert_to_external(char*, int, int&, int&) ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#2 0xb822a8e3 in std::basic_filebuf<char, std::char_traits
>::_M_really_overflow(int) ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#3 0xb822ae20 in std::basic_filebuf<char, std::char_traits
>::sync() ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#4 0xb824459f in std::ostream::flush() () from
/usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#5 0xb8244933 in std::basic_ostream<char, std::char_traits >&
std::flush<char, std::char_traits >
(std::basic_ostream<char, std::char_traits >&) ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#6 0xb82448ec in std::basic_ostream<char, std::char_traits >&
std::endl<char, std::char_traits >
(std::basic_ostream<char, std::char_traits >&) ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#7 0xb8243453 in std::ostream::operator<<(std::ostream&
()(std::ostream&)) ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#8 0x084ac35e in
DebugReporter::DebugOutputStream::outWithEndl(std::string const&) (
this=0x86b4f20, s=@0x8a38ce8) at DebugReporter.cpp:287
#9 0x084aebd1 in DebugReporter::Loop() (this=0x86b4f20) at
DebugReporter.cpp:886
#10 0x084aec88 in DebugReporter::LoopStarter(void) (pObject=0x86b4f20) at
DebugReporter.cpp:911
The actual last line of code in our application is at step 8. That line is:
m_outputFileStream << s << endl;
where m_outputFileStream is defined to be: std::ofstream m_outputFileStream;
and s is passed into the function as (const std::string& s)
(gdb) print s
$8 = (const string &) @0x8a38ce8: {static npos = 0,
_M_dataplus = {<allocator> = {},
_M_p = 0x8a34cb4 ““9748”,“Info”,“HW_Config:: Short or no apparent
comment for ID 1074254011 in Device187”,”},
static _S_empty_rep_storage = {0, 0, 0, 0}}
This value for ‘s’ looks legitimate as we have a statement that builds this
text string elsewhere in the code.
Lastly we come to thread 10 which likely was just interrupted in the middle
of it’s normal operation as there is nothing suspicious about this stack
trace.
(gdb) thread
(gdb) where
#0 0xb8249aea in std::string::_Rep::_M_refcopy() ()
from /usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#1 0xb8249b7b in std::string::string() () from
/usr/qnx630/target/qnx6/x86/lib/libstdc++.so.5.0.5
#2 0x08291aa9 in Hardware::DeviceBase::RestoreHwConfig() (this=0x89cff70)
at DeviceBase.cpp:1536
#3 0x0828cde0 in
Hardware::DeviceBase::InitialiseDeviceData(Hardware::SEARCH_METHOD) (
this=0x89cff70, method=SEARCH_RECURSIVE) at DeviceBase.cpp:119
#4 0x0829ee19 in
Hardware::DigitalInput::InitialiseDeviceData(Hardware::SEARCH_METHOD) (
this=0x89cff70, method=SEARCH_RECURSIVE) at DigitalInput.cpp:47
#5 0x083769a9 in
Hardware::StepperCardSPI_DigitalInput::InitialiseDeviceData(
Hardware::SEARCH_METHOD) (this=0x89cff70, method=SEARCH_RECURSIVE) at
StepperCardSPI_DigitalInput.cpp:55
#6 0x0828cdbb in
Hardware::DeviceBase::InitialiseDeviceData(Hardware::SEARCH_METHOD) (
this=0x89cfbf0, method=SEARCH_RECURSIVE) at DeviceBase.cpp:114
#7 0x0828cdbb in
Hardware::DeviceBase::InitialiseDeviceData(Hardware::SEARCH_METHOD) (
this=0x89cf880, method=SEARCH_RECURSIVE) at DeviceBase.cpp:114
#8 0x082abae9 in
Hardware::FluidSupplyBase::InitialiseDeviceData(Hardware::SEARCH_METHOD) (
this=0x89cf880, method=SEARCH_RECURSIVE) at FluidSupplyBase.cpp:113
#9 0x082b0db8 in
Hardware::FluidSupplySheath::InitialiseDeviceData(Hardware::SEARCH_METHOD) (
this=0x89cf880, method=SEARCH_RECURSIVE) at FluidSupplySheath.cpp:77
#10 0x0828cdbb in
Hardware::DeviceBase::InitialiseDeviceData(Hardware::SEARCH_METHOD) (
this=0x89ca110, method=SEARCH_RECURSIVE) at DeviceBase.cpp:114
#11 0x0828cdbb in
Hardware::DeviceBase::InitialiseDeviceData(Hardware::SEARCH_METHOD) (
this=0x89b1018, method=SEARCH_RECURSIVE) at DeviceBase.cpp:114
#12 0x08161513 in
Hardware::SPM_Hardware::initialiseSPM_Data(CallBackBase<HW_Message>) (
this=0x89b1018) at SPM_Hardware.cpp:1905
#13 0x08174006 in Hardware::ActionP0Hardware::SPM_Hardware::Run()
(this=0x89e1fbc) at Action.h:181
#14 0x082a1273 in Hardware::DispatchQueue::Loop() (this=0x87da7b0) at
DispatchQueue.cpp:332
#15 0x082a1362 in Hardware::DispatchQueue::LoopStarter(void)
(pObject=0x87da7b0) at DispatchQueue.cpp:390
The actual last line of code in our application is at step 8. That line is:
string comment;
(gdb) print comment
$9 = {static npos = 0, _M_dataplus = {<allocator> = {},
_M_p = 0x874076c “/home/txs/projects/QNX-6.3-Dev0001//SPM/Hardware/Config/Dev_0223.cfg”},
static _S_empty_rep_storage = {0, 0, 0, 0}}
This appears to have some left over garbage left on the stack since this
line is really just creating storage for a string that’s about to be used.
Not sure why gdb reports values here…
So that’s where I stand. Definitely not in MsgReceive_r(). But obviously
hung up on some kind of spin lock issue in the libraries someplace. Esp
given that 2 threads are in the EXACT same place at the same time looks
really suspicious. Almost like the lower priority thread got interrupted and
can’t finish to give the higher priority thread the spin lock and so we sit
here for ever never making any progress…
So looking at our compilation/link flags I have the following (same ones
used for 2.95.3 compiler that we are moving from). Given that everything works
in 2.95.3 I am posting these incase I am missing something to handle thread
safety in the std libs (_PTHREADS=1 worked in 2.95.3).
CC = qcc -V3.3.1,gcc_ntox86_gpp
CXX = qcc -V3.3.1,gcc_ntox86_gpp
LD = qcc -V3.3.1,gcc_ntox86_gpp
CFLAGS = -lang-c++ -D_PTHREADS=1 -g -Werror -Wno-deprecated
LDFLAGS = -lang-c++ -g -Wl,-Map spm.map -D_PTHREADS=1
So now for my questions - you knew there had to be some ![Wink ;-)]()
-
Am I using the wrong flags/libraries to get threaded support for standard
library calls in 3.3.1?
-
If I am using the correct flags/libraries then I have a huge question on
why 2 threads are spin locking in what looks to be a major error path in the
handling of simple valid strings. I am wondering how I can get there or figure
out what’s wrong once I leave my code because I can’t get ANY values in the
debugger for anything in the libraries (no info locals, no info args etc).
TIA,
Tim