Multicore system hang

Hi all,

I have an application that runs smoothly on a single-core system, but when I run it on a multicore system (4 cores), the application hangs.

If I lock all the threads to a single CPU, the application works fine.

The IPC mechanisms used include MsgSend(), mq_send(), mutexes and semaphores. I have locked some threads to run only on CPU 1. The remaining threads run on any available CPU.
The application does no GUI updates.

1) Could anyone please suggest possible causes of the system hang?
2) What should I check to sort out this issue?
3) Is it safe to use the IPC mechanisms mentioned above on a multicore system?

Please help :cry:

Regards,
hello

Semaphores (i.e. the sem_*() family of calls) don’t respect priority inheritance the way mutexes do. I personally don’t use them for that reason. I’m not saying that’s what is causing your deadlock, but it certainly could be.

MsgSend() and mutexes certainly work on multi-core (I used them both on our multi-core machine). I haven’t used mq_send() in years, so I don’t know about that (I suspect the mq code really uses MsgSend()).

Tim

The most likely culprit is a race condition. A race condition is a bug that depends both on the code and on timing. They are the worst type of problem because you could test for hours and not see one, but after you deploy, they always appear.

Race conditions can also happen on single-core systems; there, however, they can be prevented by running all the competing threads at the same priority with FIFO scheduling. It’s also sometimes possible to prevent race conditions by adjusting priorities, which requires careful verification of the code. Some specific race conditions can be prevented by careful coding, but this requires an understanding of the code the compiler generates.

These three ways of dealing with race conditions have the advantage that they incur no additional overhead. That is tempting, but I personally discourage relying on them.

Mutexes, semaphores, read-write locks, and condition variables are the usual ways to prevent race conditions. If the code runs in a context where it cannot make a system call, such as an interrupt handler, spin locks will sometimes solve the problem.

The basic idea behind a race condition is that you have two pieces of code that depend on the same data. If your code does not protect against both pieces of code using the data at the same time, then you can get a race condition. In a system with a single processor, one thread could be in the middle of an update when rescheduling occurs, allowing the second thread access. FIFO scheduling prevents this. Differing priorities can prevent one thread from being interrupted, but not the other. This is sometimes sufficient. In a system with more than one processor, none of this works. Two threads at different priorities can be in the same code at once.
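
As a rough illustration of protecting shared data in a way that does not depend on scheduling or core count, here is a minimal sketch with POSIX threads. All the names (shared_counter_t, worker, run_counter_demo) are hypothetical and not taken from the original application. Several threads increment one counter, and the mutex makes each read-modify-write exclusive.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state: a counter plus the mutex that guards it. */
typedef struct {
    pthread_mutex_t lock;
    long            value;
} shared_counter_t;

typedef struct {
    shared_counter_t *counter;
    long              iterations;
} worker_arg_t;

static void *worker(void *p)
{
    worker_arg_t *arg = p;
    for (long i = 0; i < arg->iterations; i++) {
        pthread_mutex_lock(&arg->counter->lock);
        arg->counter->value++;  /* read-modify-write done under the lock */
        pthread_mutex_unlock(&arg->counter->lock);
    }
    return NULL;
}

/*
 * Runs nthreads workers (1..16), each incrementing the counter
 * `iterations` times. Returns the final counter value, or -1 if
 * the arguments are out of range or a thread could not be created.
 */
long run_counter_demo(int nthreads, long iterations)
{
    pthread_t tids[16];
    shared_counter_t counter = { .value = 0 };
    worker_arg_t arg = { &counter, iterations };
    int started = 0;
    int ok = 1;

    if (nthreads < 1 || nthreads > 16)
        return -1;
    if (pthread_mutex_init(&counter.lock, NULL) != 0)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &arg) != 0) {
            ok = 0;
            break;
        }
        started++;
    }
    /* Join only the threads that were actually started. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&counter.lock);
    return ok ? counter.value : -1;
}
```

With the lock in place, run_counter_demo(4, 100000) should return 400000 no matter how many cores the threads are spread across. If you remove the lock/unlock pair, the result is likely to come up short on a multicore machine, while on a single core the same bug may stay hidden for a long time.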

Programmers new to these ideas are often surprised when an extremely unlikely event occurs. I was. Unfortunately Murphy’s law is a fact here.

When the application locks up, connect the debugger to it. That should point you to the part of the code causing you grief.

That is a very generic statement. Using pidin, or the System Information Perspective in the IDE, you should check the states of your threads. Are they receive-blocked? Waiting on a mutex? Or all READY? This will give you an idea about what’s going on.

Regards,

  • ThunderBlade