We have a fairly complex system whose build requires that we create shared libraries (.so) from two or more source or .o files (we’ve tried this many ways). Building a .a archive to work from doesn’t work either.
In all cases of multiple files, symbols are lost in the .so files. We have problems with QCC and with g++, so it is a pretty deep-rooted problem.
This leaves us with missing symbols which are plainly in the .o files and which disappear from the .so files. We also see, in some build variations, that the .o files don’t even get included in the .so; instead they are listed as “NEEDED” in the objdump output.
We are really puzzled here. Anyone with any suggestions?
My bad, the -o is there too. Matter of fact, if you can think of a compile option we’ve probably tried it already, but I am hoping to see all the possible permutations suggested in case we missed one.
We just tried it on a 6.2.1 system with the 2.95.3 compiler and it produced all the symbols, but the dlopen() call segfaults (as of 15 minutes ago, we are checking). This leaves us with an error in the runtime libs for 2.95.3 on 6.2.1 and a linker problem on 6.3 with either the 3.3.1 or the 2.95.3 compiler. Whatever is wrong is rather nasty for us.
Hmm… seg faulting on dlopen is usually caused by failing to compile one (or more) of the objects with the -shared flag. Of course if you are having problems with static libs, the shared issues may just be a red herring.
Is there anything “interesting” about these objects? Are they just plain C/C++? Are they extremely large or anything? I mean obviously most of the rest of us aren’t having these problems, so the trick is to see what you are doing differently.
We’re going over the code, but at least one of the modules that dies in dlopen is explicitly built -shared. The odd bit for us is that the same code generates ALL the symbols in the .so in 2.95.3 under 6.2.1 and only a fraction of the symbols wind up in the .so in 3.3.1 or 2.95.3 on 6.3. Same code. The dlopen works on 6.3 but there’s no symbol to work with, it fails on 6.2.1 when the symbols are there.
Code is pretty straightforward modem control stuff. It works nicely under Linux. The modules are not particularly huge individually, but overall it is pretty big as a system.
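One way to pin down exactly which symbols go missing between the two builds is to ask the runtime linker for each expected name in turn. The following is only a sketch under stated assumptions: `count_missing` is a hypothetical helper name, it relies solely on the POSIX `<dlfcn.h>` calls, and for C++ code the names passed in would have to be the mangled names as shown by objdump or nm (or functions declared `extern "C"`).

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Count how many of the expected names cannot be found through dlsym().
 * Returns -1 if the object itself cannot be loaded.
 * A NULL path asks dlopen() for the main program and its dependencies. */
int count_missing(const char *path, const char *const names[], size_t n)
{
    size_t i;
    int missing = 0;
    void *handle = dlopen(path, RTLD_LAZY);

    if (handle == NULL) {
        const char *msg = dlerror();
        fprintf(stderr, "dlopen failed: %s\n", msg ? msg : "(no message)");
        return -1;
    }

    for (i = 0; i < n; i++) {
        dlerror();                      /* clear any stale error state */
        (void)dlsym(handle, names[i]);
        if (dlerror() != NULL) {        /* lookup failed for this name */
            fprintf(stderr, "missing: %s\n", names[i]);
            missing++;
        }
    }

    dlclose(handle);
    return missing;
}
```

Run against the same library built on 6.2.1 and on 6.3, with the symbol list taken from the .o files, this should give a concrete list of what was dropped rather than a bare count.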
Do you have any inline asm in your code? The only time I had a similar problem was when I was porting code from Linux and the inline asm that worked there would not work in QNX, mainly because it wasn’t relocatable and QNX’s linker was not able to deal with it (I don’t recall the exact details).
If not, the next approach is to take a simple example and build it in your frame work - and then slowly add more to the simple case until you start seeing the problem again - basic divide and conquer.
It is a PIA but it might give you a single module causing the problem - which can then help identify what the problem is - be it a bug in your code, or a “feature” that QNX has.
By losing symbols, do you mean that you get a bunch of “unknown symbol” messages when trying to load the shared object?
If so, you have been bitten by a 6.3.0 linker bug.
BTW - the segfault on dlopen under 6.2.1 is almost certainly a non-shared object being linked in - check your map file, and check (with objdump -h) for the presence of rel.text or rel.rodata - these are sure signs.
Under 6.3.0 this wouldn’t fail, since it will detect the code relocations and just make your ‘shared’ object a private copy.
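To see what the runtime linker itself reports, independent of the rest of the application, a small probe can dlopen() the object with eager binding. This is a minimal sketch, not a definitive tool: `try_load` is a hypothetical helper name, only the POSIX `<dlfcn.h>` calls are used, and how much detail dlerror() gives on a particular QNX release is an open question.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Try to load a shared object with eager symbol binding.
 * Returns 0 on success, -1 on failure (after printing the loader's message). */
int try_load(const char *path)
{
    /* RTLD_NOW asks the runtime linker to resolve every symbol up front,
     * so unresolved references surface here rather than at first call. */
    void *handle = dlopen(path, RTLD_NOW);

    if (handle == NULL) {
        const char *msg = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n",
                path ? path : "(main program)",
                msg ? msg : "(no message)");
        return -1;
    }

    dlclose(handle);
    return 0;
}
```

Calling `try_load("./libfooapi.so")` (a placeholder path) from a tiny main() on both 6.2.1 and 6.3.0 would separate "the object cannot be loaded at all" from "it loads but the symbols are not there".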
There is a 6.3.0 linker bug? Is this something there is a patch/workaround for?
We don’t have inline asm, but we did inline a bunch of low-level C. I will discuss this with the team. I suspect we can leave it without the inline flag… but if the compiler should optimize it to become an inline, would it be the same?
I will push my guy in OZ to check for the qnx “official word” on bugs in linkers.
Yes, a raft of unknown symbol complaints that have no identifier for the symbol. Just the error message without any clues as to what causes them. Not what I’d expect from the error messages. Makes me wonder, though, if the two things are related. If I have a non-shared object inadvertently included, would the operation to make the private copies leave a dangling error flag/pointer in the linker?
Yes, 3.3.1 as well as 2.95.3 -
hmmmm… a separate 6.2.1 linker? Could I use it under 6.3 ?
At present we are at a critical time (ain’t we always) and working around it by using static links for this high-level driver code. We haven’t been able to reproduce it with simple examples yet, possibly because we were using C rather than C++ for the simple examples and the problem is in C++. It was the linker (so we thought), which is why we tried C for a first cut. To make it do this spaghetti link trick we’ll probably have to build some C++ code with similar dependencies and inheritance. That is longer and harder, and so it is delayed.
I am facing exactly this problem, I believe. I am trying to build and test a complex, big shared library on QNX 6.3.0. Here are the details:
Host: Self-hosted QNX Neutrino 6.3.0
Building Adaptive Communications Environment middleware library (it’s open source, ported to several platforms: cs.wustl.edu/~schmidt/ACE.html)
After the usual initial struggle, I could build the library, libACE.so on the QNX host. I could also compile and build the executables corresponding to ACE tests that use libACE.so.
So far, so good. But the moment I try to execute one of these ELF executable ACE tests, I get a huge pile of “unknown symbol:” messages and lastly “Could not resolve all symbols”, and the test terminates without running at all.
I think this is what is going on:
The executable is linked against a shared library.
$ ldd shows all libraries being reached by ldd including libACE
When attempting to execute, the process manager, looking at the ELF header, figures out that this executable is linked with shared objects and it tells the runtime linker to load the shared libraries in the process address space.
The run-time linker somehow cannot resolve the symbols in libACE.so:
=> I tried all options, giving -rpath, -Bsymbolic, -E etc. to the linker, both for building the library and for building the application, but nothing works: the same error message as above is spat out.
=> The issue could be either locating the shared lib and/or resolving the symbols within the shared lib. I suspect it is the latter, as I linked libACE.so at various paths (pointed to by LD_LIBRARY_PATH in the build file of procnto as well as my usual environment variable LD_LIBRARY_PATH; the latter contains the path to libACE.so).
A couple of interesting tests / observations:
<> Building the libACE.so, tests and running the tests that use libACE all work seamlessly on Linux.
<> On QNX 6.3.0, *IF I BUILD libACE and the tests as STATICALLY LINKED objects* (meaning ACE is built as libACE.a), then building libACE(.a), building the tests and running the tests all WORK. This confirms my observation that there is an issue with loading dynamic object(s) and executing an ELF executable that is linked dynamically against those objects.
<> I wrote a trivial program, a hello world type and created an executable linking it dynamically with libACE.so. I DID NOT CALL ANYTHING FROM libACE, and yet, trying to run that executable (linked with libACE.so) gave exactly the same problem.
Is something wrong with the library (libACE.so) ?
Bug with the run-time linker ?
I am foxed by this problem. I went through all the QNX 6.3.0 docs, and they did not help …
Has anyone faced such issue - any hint / help is greatly appreciated …
So far no joy. I am in a reasonable position to work around this using the static link stuff until we get a 6.3.1 variant which fixes it. I am not optimistic about that happening in the short term. If you have that trivial routine that elicits the trouble, my contract lets me dump it “up” to QNX Oz and from there into the corporate bugstream.
I think there are a few QNX employees about here too. E-mail me -
bchippindale*@*networkadvantage.*biz - but without the asterisks.
I am pretty keen to get this fixed, but too busy to focus on it if I have a workaround ( which I do ).
Yes. Upgrade to 6.3.0SP2. I build tons of C++ shared libraries, and I thoroughly encountered the problem described in this thread, but with SP2 everything is fine (since you have 3.3.5 I think that means you have SP2), so more than likely the problem is yours.
Do you have a simple example where you can’t run an exec built with a shared library? (i.e. fooapi.cc, foo.cc, build libfooapi.so link foo.o against libfooapi.so, set the LD_LIBRARY_PATH run foo, and get unknown symbols).