C++ STL string/basic_string allocation bug and/or incompatib

I’ve been battling a very unusual bug for over a day now and I’m really
scratching my head. I was wondering if anyone else has come across it.

What I have is a set of processes running under QNXRTP which are built on
top of an in house framework. The framework is POSIX based and makes heavy
use of the C++ STL. Most of my processes are single threaded. A couple use
multiple threads. One of the processes implemented using multiple threads is
an event service which receives events from clients in the system and then
publishes them on to any interested subscribers. This has been implemented
as two threads: one thread (the main process) receives the publish
requests and queues them on an STL list for the second thread, which
then publishes each event to subscribers independently of the
receiving thread. The threads are all mutexed where required and use a
condition variable to control the producer/consumer relationship on the
list.
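
For readers following along, the arrangement described above might look roughly like the sketch below. It is not the framework's code: the names EventQueue, push, pop and run_demo are made up for illustration, and it uses the C++11 std::thread, std::mutex and std::condition_variable facilities (which postdate G++ 2.95.2) in place of the pthread calls the real system would use.

```cpp
#include <condition_variable>
#include <list>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical event queue; all names here are illustrative only.
class EventQueue {
public:
    // Called by the receiving thread to hand an event to the publisher.
    void push(const std::string& event) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(event);
        }
        ready_.notify_one();  // wake the publishing thread
    }

    // Called by the publishing thread; blocks until an event is queued.
    std::string pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        std::string event = queue_.front();
        queue_.pop_front();
        return event;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::list<std::string> queue_;
};

// Push `count` events from the calling thread and collect them on a second one.
std::vector<std::string> run_demo(int count) {
    EventQueue queue;
    std::vector<std::string> received;
    std::thread consumer([&] {
        for (int i = 0; i < count; ++i) received.push_back(queue.pop());
    });
    for (int i = 0; i < count; ++i) queue.push("event " + std::to_string(i));
    consumer.join();
    return received;
}
```

The list itself is only ever touched with the mutex held. The strings, however, are copied in one thread and read in another, and that is the part in question.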

This has been working fine for a while but recently we have been using the
system in a real world role under which it experiences a much higher event
load than we have been seeing so far. Under this load it exhibits a very
strange bug i.e. periodically the process will crash, in the second thread,
inside the allocator code for the STL string class. Usually this will be due
to either it writing to an address outside the heap, or alternatively via a
null pointer. In all cases it seems to be as a result of an access to the
underlying string in a function call as a parameter (typically using the
c_str() method).

As a sanity check I ported the code to Solaris and ran it with Rational’s
Purify tool which did not reveal any memory problems but proceeded to crash
in exactly the same way!

I have tried increasing the stack size in the pthread_create call but this
seems to have no effect.

I am using G++ 2.95.2, QNXRTP and Solaris 2.6.

Anybody seen this crash before or know of any potential pitfalls using the
STL in a multithreaded environment that I might be unaware of?

Thanks,

James

james@socrates.demon.co.uk

Previously, James Bridson wrote in qdn.public.qnxrtp.devtools:

Anybody seen this crash before or know of any potential pitfalls using the
STL in a multithreaded environment that I might be unaware of?

I don’t know the G++ 2.95.2 STL very well, but it is based on an old version of SGI’s STL. My guess is that it uses a COW (copy-on-write) implementation of string, which will be thread unsafe. The way to get the actual copy to occur is to call a non-const string member function just after the copy (while still in the mutexed code). e.g. something like

string a = "Boo";

// lock mutex
// b is a string used in another thread.
b = a;
// now a and b share representation.
if (!b.empty())
    b[0]; // force a copy by calling non-const operator[]
// unlock.
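
Another way to get an unshared copy, sketched here under the same guess that the string is reference counted, is to build the new string from the raw characters instead of from the string object. The helper name deep_copy is hypothetical. The pointer-and-length constructor has to allocate its own buffer, so the result should not share a representation with the source.

```cpp
#include <string>

// Hypothetical helper: build a new string from the raw characters of `s`.
// The (pointer, length) constructor copies the characters into a fresh
// buffer, so the result does not share storage with `s`. Passing the length
// explicitly also preserves any embedded '\0' characters.
std::string deep_copy(const std::string& s) {
    return std::string(s.data(), s.size());
}
```

Inside the locked region, b = deep_copy(a); would then take the place of b = a; followed by the b[0] trick.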


In many cases it is possible to turn COW off. Have a look in the header, and there may be a #define or a constant you can change to switch it off. They may also know in the gcc newsgroups.

But there are two alternative approaches, which may make sense if you are making extensive use of STL. The first is to download STLport from www.stlport.org. The latest beta is the thing to get. Unfortunately, the gcc makefile doesn’t work straight away - you need to set up a link from c++ to g++ (for some reason it calls c++ not g++). Additionally there are a number of patches that need to be made to the headers to configure it for QNX use. If you go this route, I’ll post the changes required to get it working…

Finally, QSSL are soon to release a port of the Dinkumware C++ library, which is a completely ISO compliant C++ standard library (which includes the STL obviously, along with a C standard library, iostreams, etc). If you contact QSSL it may be possible to get on the beta testing program to save you from having to wait until the end of the year when I believe it is to be released.

Tom

“Tom” <the_wid@my-deja.com> wrote in message
news:Voyager.001120170548.8581156B@administrator.co.uk

But there are two alternative approaches, which may make sense if you are
making extensive use of STL. The first is to download STLport from
www.stlport.org. The latest beta is the thing to get. Unfortunately, the gcc
makefile doesn’t work straight away - you need to set up a link from c++ to
g++ (for some reason it calls c++ not g++). Additionally there are a number
of patches that need to be made to the headers to configure it for QNX use.
If you go this route, I’ll post the changes required to get it working…

Tom,

I would be interested in exploring this route and would appreciate a copy of
any patches you may have for use with QNX.

James

Download and unpack 4.1b3 from www.stlport.org. This is what I did the patches for. This is the list of changes I had to make:

Ensure that g++ can be invoked via “c++”

1: FILE struct helpers not defined for QNX. This is required for efficient versions of cin and cout when they are synced with C stdin and stdout, as they are by default.

stlport/stl/_stdio_file.h
after line 153:

inline void _FILE_I_set(FILE& __f, char* __begin, char* __next, char* __end) {
  __f._bf._base = (unsigned char*) __begin;
  __f._p = (unsigned char*) __next;
  __f._bf._size = __end - __next;
}

# define __STL_FILE_I_O_IDENTICAL

--insert start--
#elif defined(__QNX__)

inline int _FILE_fd(const FILE& __f) { return __f._handle; }
inline char* _FILE_I_begin(const FILE& __f) { return (char*) __f._base; }
inline char* _FILE_I_next(const FILE& __f) { return (char*) __f._ptr; }
inline char* _FILE_I_end(const FILE& __f)
  { return (char*) __f._ptr + __f._cnt; }

inline ptrdiff_t _FILE_I_avail(const FILE& __f) { return __f._cnt; }

inline char& _FILE_I_preincr(FILE& __f)
  { --__f._cnt; return *(char*) (++__f._ptr); }
inline char& _FILE_I_postincr(FILE& __f)
  { --__f._cnt; return *(char*) (__f._ptr++); }
inline char& _FILE_I_predecr(FILE& __f)
  { ++__f._cnt; return *(char*) (--__f._ptr); }
inline char& _FILE_I_postdecr(FILE& __f)
  { ++__f._cnt; return *(char*) (__f._ptr--); }
inline void _FILE_I_bump(FILE& __f, int __n)
  { __f._ptr += __n; __f._cnt -= __n; }

inline void _FILE_I_set(FILE& __f, char* __begin, char* __next, char* __end) {
  __f._base = (unsigned char*) __begin;
  __f._ptr = (unsigned char*) __next;
  __f._cnt = __end - __next;
}

# define __STL_FILE_I_O_IDENTICAL

--insert end--

2: Character type constants not defined for QNX

In file stlport/stl/c_locale.h
after line 176:

# elif defined (__FreeBSD__)
# define _Locale_CNTRL _CTYPE_C
# define _Locale_UPPER _CTYPE_U
# define _Locale_LOWER _CTYPE_L
# define _Locale_DIGIT _CTYPE_D
# define _Locale_XDIGIT _CTYPE_X
# define _Locale_PUNCT _CTYPE_P
# define _Locale_SPACE _CTYPE_S
# define _Locale_PRINT _CTYPE_R
# define _Locale_ALPHA _CTYPE_A

--insert start--
# elif defined (__QNX__)
# define _Locale_CNTRL _CNTRL
# define _Locale_UPPER _UPPER
# define _Locale_LOWER _LOWER
# define _Locale_DIGIT _DIGIT
# define _Locale_XDIGIT _XDIGT
# define _Locale_PUNCT _PUNCT
# define _Locale_SPACE _SPACE
# define _Locale_PRINT _PRINT
# define _Locale_ALPHA (_UPPER | _LOWER)
--insert end--

3: No ecvt functions in QNX (well, ecvt is in stdlib.h, but not in any lib I could find), so use sprintf instead. This will mean that output of floating point numbers won’t be particularly fast:

File /src/num_put_float.cpp

line 46
--# include <values.h>
++# if defined (__QNX__)
++# define USE_SPRINTF_INSTEAD
++# else
++# include <values.h>
++# endif
--end--

new line 56
--# if !defined(__STL_USE_GLIBC) && !defined(__FreeBSD__) && !defined (_AIX)
++# if !defined(__STL_USE_GLIBC) && !defined(__FreeBSD__) && !defined (_AIX) && !defined(__QNX__)
--end--
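
To give a rough idea of what falling back to sprintf means, here is a small sketch of printf-style float formatting. It is not the STLport code: the helper name format_scientific is made up, and it uses std::snprintf (the bounds-checked C99/C++11 variant) rather than plain sprintf so the buffer cannot overflow.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format `value` in scientific notation with
// `precision` digits after the decimal point, without using ecvt.
std::string format_scientific(double value, int precision) {
    char buffer[64];
    int written = std::snprintf(buffer, sizeof buffer, "%.*e", precision, value);
    if (written < 0)
        return std::string();                        // formatting failed
    if (written >= static_cast<int>(sizeof buffer))  // output was truncated
        written = static_cast<int>(sizeof buffer) - 1;
    return std::string(buffer, written);
}
```

Every call goes through the printf format parser, which is where the speed penalty compared with ecvt comes from.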


I hope that’s everything. Now you just change to the src directory, and type:

make -fgcc.mak clean all

Building takes about 10 minutes on my Athlon 800.

make install

copies the relevant files to /tmp. You should probably move them to /usr/local/include/stlport and /usr/local/lib. Then, for debug,

g++ -g -D__STL_DEBUG -I/usr/local/include/stlport yourfile.cpp -L/usr/local/lib -lstlport_gcc_stldebug

and, for optimized builds:

g++ -O2 -I/usr/local/include/stlport yourfile.cpp -L/usr/local/lib -lstlport_gcc

If you find any bugs or have trouble with the build, let me know and I’ll see what I can do.

I obviously accept no responsibility whatsoever for any problems you may have or damage that may occur to your system. You use the patches at your own risk.

Tom
(tom.wibble.widmer@cenes.co.uk remove the .wibble)