pidin - mutex-blocked threads with hyperthreading

Hi

When I run my application with hyperthreading enabled, three threads randomly get blocked on the same mutex.

How can I get more information about the mutex?

Regards
Sheran

This probably has nothing to do with hyperthreading itself, other than the fact that a hyperthreaded CPU presents two virtual CPUs. That introduces some exciting new ways to create race conditions. For example, it now becomes possible for two CPUs to run the same block of code at the same time.

It would probably be good to post the relevant parts of your code.

434214 1 proc1 134r MUTEX (0xb037c0a0) 434214-01 #0
434214 2 proc1 255r RECEIVE 2
434214 3 proc1 10r RECEIVE 5
434214 4 proc1 10r RECEIVE 0
434214 5 proc1 134r MUTEX (0xb037c0a0) 434214-01 #0
434214 6 proc1 10r MUTEX (0xb037c0a0) 434214-01 #0
434214 7 proc1 10r MUTEX (0xb037c0a0) 434214-01 #0
434214 8 proc1 10r RECEIVE 17

from “pidin mem”
libc.so.3 @b0300000 488K 16K

This, of course, is not code. It shows four threads that are mutex-blocked (waiting for the mutex) and four threads waiting to receive a message. Most likely the mutex is owned by one of the threads in the RECEIVE state. Without any idea of what you are doing, there is no reason to think anything is wrong.

What do you mean by two CPUs running the same code block? How can that cause the software to hang?

Do you know what a race condition is?

Some problems in a computer are caused by the order in which things occur. With only one CPU, only one thing can happen at a time.
With two CPUs, two things can happen at the same time. If a process has two threads, both can be active at once, so it is possible for both threads to be executing exactly the same code at the same moment. If you haven't anticipated this when coding, unexpected things can happen.

If there is a shared block of data between threads, it has to be synchronized.

What if I don’t have shared data, but same block of code executed by different threads? Does it need any special handling?

The mutexes that my threads are blocked on belong to the data segment of the shared object libc.so.

This doesn’t need any special handling but you do have to be careful. If the routine only works on local (stack) variables and then returns a result, there can be no problem. Otherwise synchronization might be necessary.

I rather doubt there is bad code in libc.so that could cause a deadly embrace (a deadlock), which is what you seem to be implying is happening.

My suspicion that you have a race condition is based on your mentioning that this happens randomly. You might also want to check the documentation for any calls you make that are not marked thread-safe.

Create a map file as part of your build process. This will show exactly what routine the mutex is in.

It would also help to know what the mutex calls in your code look like (which parameters are used at creation time, at acquisition time, etc.).

Tim

My application had memory leaks and was running out of RAM, which caused a variety of symptoms. I haven't seen the issue since fixing the leaks.

I highly recommend using Cppcheck (cppcheck.sourceforge.net/). It reports most uses of thread-unsafe functions, and it also reports some kinds of bad memory usage.