Backtrace library dynamically loaded with dlopen causes errors

Hi,

I’m trying to get a backtrace of a function call in my software, and libbacktrace seems to be the QNX tool for that. Linking with -lbacktrace does not work because, for some reason, the library is not found. I’m not sure what the issue is there, so I’ve decided to move forward and load libbacktrace dynamically with dlopen.

The library is found and opened by dlopen, and the function pointers are assigned as well. However, calling bt_init_accessor_ptr(&acc, BT_SELF) fails with the error "Invalid argument". I’m really not sure what’s going on. All the code is below:

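#include <backtrace.h>
#include <dlfcn.h>   // dlopen, dlsym, dlerror
#include <errno.h>   // errno
#include <stdio.h>   // fprintf, printf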

typedef int (*bt_init_accessor_t)(bt_accessor_t *acc, bt_acc_type_t type);
typedef int (*bt_get_backtrace_t)(bt_accessor_t *acc, bt_addr_t *addrs, int len);

void report_vm_error(const char* file, int line, const char* error_msg, const char* detail_fmt, ...)
{
    void* backtrace_handle = dlopen("/data/home/root/jdk/lib/libbacktrace.so", RTLD_NOW | RTLD_GLOBAL);
    do 
    {
        if (backtrace_handle == NULL)
        {
            fprintf(stderr, "report_vm_error::dlopen error: %s\n", dlerror());
            break;
        }

        fprintf(stdout, "report_vm_error::dlopen successful!\n");
        dlerror(); // clear errors

        bt_init_accessor_t bt_init_accessor_ptr = (bt_init_accessor_t)dlsym(backtrace_handle, "bt_init_accessor");
        char *err = dlerror();
        if (err) {
            printf("Could not resolve symbol bt_init_accessor: %s\n", err);
            break;
        }

        dlerror(); // clear errors
        bt_get_backtrace_t bt_get_backtrace_ptr = (bt_get_backtrace_t)dlsym(backtrace_handle, "bt_get_backtrace");
        err = dlerror();
        if (err) {
            printf("Could not resolve symbol bt_get_backtrace: %s\n", err);
            break;
        }

        bt_accessor_t acc;
        if (bt_init_accessor_ptr(&acc, BT_SELF) == -1)
        {
            fprintf( stderr, "%s:%i %s (%i)%s\n", __FUNCTION__, __LINE__,
                "bt_init_accessor", errno, os::strerror(errno));
            break;
        }

        break;
    }
    while (true);
}

bt_init_accessor is a variadic function; however, only the first two parameters are required when specifying the BT_SELF flag. The target system is aarch64le. The function output is below:

report_vm_error::dlopen successful!
report_vm_error:216 bt_init_accessor (22)Invalid argument

Thanks for all the help!
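
One thing that may be worth ruling out, given that bt_init_accessor is variadic: the typedef in the code above declares the pointer type without the trailing `...`, and in C and C++ calling a variadic function through a pointer to a non-variadic type is undefined behavior. Whether that explains the EINVAL here is not established in this thread. The sketch below only shows the general pattern with a hypothetical stand-in; demo_init_accessor, demo_accessor and the DEMO_* constants are made up for illustration and are not part of libbacktrace.

```c
#include <errno.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical accessor kinds, standing in for values like BT_SELF. */
enum { DEMO_SELF = 0, DEMO_THREAD = 1 };

/* Hypothetical accessor structure; NOT the real bt_accessor_t. */
struct demo_accessor {
    int type;
    int pid;
    int tid;
};

/* Hypothetical variadic function: extra arguments are read only for
   DEMO_THREAD, mirroring an API where DEMO_SELF needs just two arguments. */
int demo_init_accessor(struct demo_accessor *acc, int type, ...)
{
    if (acc == NULL) {
        errno = EINVAL;
        return -1;
    }
    acc->type = type;
    acc->pid = 0;
    acc->tid = 0;
    if (type == DEMO_THREAD) {
        va_list ap;
        va_start(ap, type);
        acc->pid = va_arg(ap, int);
        acc->tid = va_arg(ap, int);
        va_end(ap);
    }
    return 0;
}

/* The pointer type repeats the "..." so it matches the function's type. */
typedef int (*demo_init_accessor_t)(struct demo_accessor *acc, int type, ...);

/* Calls the function through the matching pointer type with two arguments. */
int demo_init_self(demo_init_accessor_t fn, struct demo_accessor *acc)
{
    return fn(acc, DEMO_SELF);
}
```

With the real library, the corresponding change would presumably be to add `...` to the bt_init_accessor_t typedef so that it matches the prototype in backtrace.h. That is an assumption to verify against the header on the target, not a confirmed fix.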

I’m probably not going to be much help here. I have no experience with this strategy and I don’t know why you are doing it. I’m guessing that you have a program crashing in a shared library and you want to know why, or maybe where in the shared library the problem occurs. If this is not the case, you can skip what I have to say.

I can think of two possibilities for what you are dealing with: either the shared library belongs to QNX or someone else, or it belongs to you. In the latter case, you can always try compiling the library in statically. In the former case, which I’ve run into on occasion, your best bet is to assume you are passing bad data to the shared library routine, and to take a close look at that data and at what the routine is asking for.

Thanks for your help @maschoen. I looked at the data closely and can’t find anything suspicious. You’re right that I want to find out where the crash comes from. I figured backtracing would be the easiest way, yet here I am struggling with these issues.

  1. Compile your code in debug mode
  2. Run the QNX ‘dumper’ process
  3. Wait for your code to crash and you’ll get a dump file.
  4. Use gdb to load the dump file and you can find out exactly where it crashed.

If you only crash in release mode you can still get a dump file, but it won’t contain nearly the same level of detail (I think you just get the function rather than the specific line, and you can’t inspect runtime variables, etc.).

Tim