mutex lock causing core dump???

ajikumarmp · September 30, 2008, 8:45am

Hi,

I am trying to debug an application that dumps core. GDB shows the crash in the mutex lock() function, but the same mutex lock worked many times before this core dump happened.

What could cause the mutex lock to work initially and then crash after some time? I am doubting memory corruption.

Can somebody suggest a better way of debugging this problem? I am currently stuck trying to find what is causing the core dump.

regards
Aji.

Tim · September 30, 2008, 1:44pm

Aji,

Why are you doubting memory corruption? A mutex that once worked and later crashes sounds almost exactly like memory corruption.

What is the error you see on the crash? Is it a segmentation violation (SIGSEGV)? That most likely indicates memory corruption.

How many threads are there in your application (I assume it’s multi-threaded since you are using mutexes)? When you look at the core dump in gdb, make sure you run the ‘info threads’ command to see the state of all threads. The current one has a * next to it. Sometimes I have found that gdb starts the core file in the wrong thread.

Assuming that the crash is in the running thread at the mutex.lock() call, I’d look at the code and the memory region where the mutex is created and see if it is truly corrupted (a hex dump of that memory region sometimes helps). Corruption is especially likely if the mutex was created on the heap by another thread rather than on the stack: that thread may have released it, either on purpose or inadvertently if the thread was itself exiting for some reason.
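To make the hex dump idea concrete, here is a minimal sketch in C, assuming the mutex is a POSIX pthread_mutex_t (the thread does not say which mutex type is in use). The helper name hex_dump is hypothetical, not part of any library. The idea is to dump the mutex bytes right after initialization and again just before the failing lock, then compare the two.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes len bytes starting at addr as hex,
 * 16 bytes per line, each line prefixed with its address.
 * Returns the number of bytes dumped. */
static size_t hex_dump(FILE *out, const void *addr, size_t len)
{
    const unsigned char *p = addr;
    size_t i;

    for (i = 0; i < len; i++) {
        if (i % 16 == 0)
            fprintf(out, "%s%p:", i ? "\n" : "", (const void *)(p + i));
        fprintf(out, " %02x", p[i]);
    }
    if (len)
        fputc('\n', out);
    return len;
}
```

A call would look like hex_dump(stderr, &my_mutex, sizeof my_mutex), where my_mutex stands for whatever mutex the application uses. Reading the bytes while other threads are locking the mutex is racy, so treat the output as a diagnostic only. Inside gdb, the examine command (for example x/16xw followed by the mutex address) gives a similar view without changing the code.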

Also, assuming this crash is easy to reproduce, you might just run your application in the debugger from the start instead of trying to debug the core file. Then you can have gdb watch the memory location of the mutex to see if it gets released or overwritten. This will make your app run REALLY slowly, but if you are desperate and patient it will work.

Tim

kwschumm · September 30, 2008, 1:57pm

Is this code executing on an ARM processor?
Is the mutex in shared memory and mapped SHMCTL_PHYS|SHMCTL_GLOBAL?
Is the mutex shared by separate processes?

mario · September 30, 2008, 2:40pm

Put two variables around the mutex, one before it and one after it, then run the application with the debugger and put a breakpoint on write access to these variables. If the mutex gets corrupted, it’s very likely that one of these variables will get corrupted as well.

Or you might just put some if/printf statements on these variables to try to zoom in on when they get modified.
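A rough sketch of the guard-variable idea in C, assuming a POSIX pthread_mutex_t. The struct, the function names, and the magic value are all made up for illustration. Putting the guards and the mutex in one struct keeps them adjacent in memory, which separate variables would not guarantee.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define GUARD_MAGIC 0xDEADBEEFu  /* arbitrary sentinel value (assumption) */

/* Hypothetical wrapper: the mutex sandwiched between two guard words. */
struct guarded_mutex {
    volatile uint32_t guard_before;
    pthread_mutex_t   mutex;
    volatile uint32_t guard_after;
};

/* Sets both guards and initializes the mutex with default attributes. */
static int guarded_mutex_init(struct guarded_mutex *g)
{
    g->guard_before = GUARD_MAGIC;
    g->guard_after  = GUARD_MAGIC;
    return pthread_mutex_init(&g->mutex, NULL);
}

/* Returns 1 if both guards are intact, 0 if either was overwritten.
 * 'where' is a free-form label printed with the report. */
static int guarded_mutex_check(const struct guarded_mutex *g, const char *where)
{
    if (g->guard_before != GUARD_MAGIC || g->guard_after != GUARD_MAGIC) {
        fprintf(stderr, "%s: guard corrupted (before=0x%08x after=0x%08x)\n",
                where, (unsigned)g->guard_before, (unsigned)g->guard_after);
        return 0;
    }
    return 1;
}

/* Checks the guards first; returns -1 without locking if they are damaged,
 * otherwise the result of pthread_mutex_lock(). */
static int guarded_mutex_lock(struct guarded_mutex *g, const char *where)
{
    if (!guarded_mutex_check(g, where))
        return -1;
    return pthread_mutex_lock(&g->mutex);
}
```

Calling guarded_mutex_check() at a few points in the code narrows down when the damage happens. For the debugger variant, a gdb watchpoint on g.guard_before or g.guard_after (with g being the guarded mutex instance) stops at the write that clobbers the guard.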