Performance of mqueue server

I have an application that uses the mqueue server for inter-thread and inter-process communication.
The data transfer rate is quite high: on average, about 200 messages/sec are transferred.
I have observed that system performance is quite poor during this time.
Running top shows mqueue and the kernel hogging the CPU, together taking almost 30% of CPU time.
Is there a way to improve system performance?
I also tried the mq service. In that case mq does not show up in top, but the kernel itself takes 30 to 40% of CPU time at the same data rate.
Note: The system this application runs on is a low-end system with a 300 MHz CPU and no CPU cache.

I need input on why the CPU usage is so high and how to get around it.

No cache, ouch. What kind of processor is that, and what speed is the memory?

How big are the messages?

For inter-thread, use global memory and pass pointers; use a mutex/condvar to sync.
For inter-process, switch to direct message passing or read/write, or use a shared memory object.
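
A minimal sketch of the inter-thread suggestion above, assuming POSIX threads. The names msg_t, ptr_queue_t, queue_push, queue_pop and the capacity QUEUE_SLOTS are made up for illustration and do not come from the original application. Only pointers go through the queue, so the 28-byte payload is never copied and no mqueue call is involved.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_SLOTS 64   /* assumed capacity; size it for the worst burst */

/* Hypothetical message type; 28 bytes matches the size quoted in the thread. */
typedef struct { unsigned char payload[28]; } msg_t;
static_assert(sizeof(msg_t) == 28, "msg_t should be 28 bytes");

typedef struct {
    msg_t          *slots[QUEUE_SLOTS];  /* pointers only, no payload copy */
    size_t          head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
} ptr_queue_t;

/* Global queue shared by all threads of the process. */
static ptr_queue_t g_queue = {
    .lock      = PTHREAD_MUTEX_INITIALIZER,
    .not_empty = PTHREAD_COND_INITIALIZER
};

/* Producer side: returns 0 on success, -1 if the queue is full. */
int queue_push(ptr_queue_t *q, msg_t *m)
{
    int rc = -1;
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_SLOTS) {
        q->slots[q->tail] = m;
        q->tail = (q->tail + 1) % QUEUE_SLOTS;
        q->count++;
        rc = 0;
        pthread_cond_signal(&q->not_empty);  /* wake the consumer if it waits */
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Consumer side: blocks until a message pointer is available. */
msg_t *queue_pop(ptr_queue_t *q)
{
    msg_t *m;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                    /* loop guards against spurious wakeups */
        pthread_cond_wait(&q->not_empty, &q->lock);
    m = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return m;
}
```

The producer has to keep each message alive until the consumer is done with it, for example by taking buffers from a fixed pool and returning them after processing.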

@Mario, yes, it's a low-end system, and I have to live with it.
My message size is 28 bytes per message.

@Thunderblade, your approach would call for a total redesign of my application.
Also, I think using the MsgPassing services would still result in the kernel taking CPU time.
Is there any system-level setting for mqueue, or any change to how I use mqueue, that would improve performance?

mqueue is based on message passing, so you are actually doubling the kernel’s involvement, since the data has to go to mqueue and then come back from mqueue. You could use mq, which uses asynchronous messaging; that will reduce overhead but won’t work over the network (I may have mqueue and mq backward). Pretty much everything in QNX is message based: accessing a file, a printf(), etc.

If your messages are only 28 bytes each, then it’s definitely not worth going through a message queue.
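
As a rough check on the figures quoted in this thread (300 MHz CPU, roughly 30% load, roughly 200 messages/sec), the per-message cost can be estimated as below. This assumes the whole reported load is caused by the message traffic, which the thread does not confirm; cycles_per_message is a made-up helper name.

```c
#include <assert.h>

/* Estimated CPU cycles spent per message, assuming the entire reported
   load is due to message handling (an assumption, not a measurement). */
double cycles_per_message(double cpu_hz, double load_fraction, double msgs_per_sec)
{
    assert(msgs_per_sec > 0.0);
    return cpu_hz * load_fraction / msgs_per_sec;
}
```

Calling cycles_per_message(300e6, 0.30, 200.0) gives on the order of 450,000 cycles for each 28-byte message, while the payload itself is only 200 x 28 = 5600 bytes/sec. Under that assumption, nearly all of the cost is per-call overhead rather than data movement.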

Yes Mario, you are right, everything in QNX is message based.

I have already tried the “mq” server as well. In that case mq doesn’t show up in top, but the kernel itself starts taking up to 40% of the CPU. So the total is about the same: mqueue + kernel in one case, kernel alone (mq) in the other.

In my application, messages are passed between two different threads. The first one writes to the queue; the other reads from the queue and processes each message accordingly. A mutex is used for synchronization while writing to mqueue.

Is there another, more efficient way of doing inter-thread communication, where the communication has to be as fast as possible (real-time response)?


Thunderblade already outlined all the other potential ways to pass data between threads. The shared memory option is probably the fastest, since the data doesn’t have to be copied into a message. Regardless of the option used, the lack of a cache on your CPU is always going to hurt performance, since every instruction has to be fetched from memory.

Out of curiosity, what’s the problem with taking 30-40% of the CPU? You still have 60+% spare. Is there some timing requirement you aren’t mentioning that you aren’t meeting?

Depending on the ‘real time’ requirements of your application, you might try passing messages in bulk at defined intervals (say every 10 ms, pass whatever has queued up as one large transfer containing multiple messages). That means fewer calls to mqueue, at the cost of having to bundle and unbundle the bulk transfer yourself.
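
A minimal sketch of the bundling idea above, in plain C. The names batch_t, batch_add, batch_wire_size and batch_get, and the limit BATCH_MAX, are made up for illustration. The sender would collect messages with batch_add and, every 10 ms or so, hand the first batch_wire_size bytes of the batch to a single mq_send call; the queue's maximum message size would have to be at least that large.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MSG_SIZE  28   /* message size quoted in the thread */
#define BATCH_MAX 16   /* assumed upper bound on messages per transfer */

typedef struct {
    unsigned char count;                      /* number of valid messages */
    unsigned char data[BATCH_MAX * MSG_SIZE]; /* messages packed back to back */
} batch_t;
static_assert(BATCH_MAX <= 255, "count must fit in one byte");

/* Append one message. Returns 0 on success, -1 when the batch is full
   and has to be sent (and reset) first. */
int batch_add(batch_t *b, const void *msg)
{
    if (b->count >= BATCH_MAX)
        return -1;
    memcpy(b->data + (size_t)b->count * MSG_SIZE, msg, MSG_SIZE);
    b->count++;
    return 0;
}

/* Number of bytes that actually need to be transferred for this batch. */
size_t batch_wire_size(const batch_t *b)
{
    return offsetof(batch_t, data) + (size_t)b->count * MSG_SIZE;
}

/* Receiver side: pointer to the i-th message, or NULL if out of range. */
const void *batch_get(const batch_t *b, size_t i)
{
    return i < b->count ? b->data + i * MSG_SIZE : NULL;
}
```

The trade-off is latency: a message can wait up to one interval before it is sent, so the interval has to fit the application's timing requirements.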


Threads of the same process? Well then, you don’t need to go through anything special. If you use C++, just use a std::list or std::deque (and use a mutex to protect it). Or if you are using C, just write your own; that’s very basic. If you are careful you can even do this without a mutex.
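
One way the "without a mutex" remark could look in C is a single-producer, single-consumer ring buffer. This is only a sketch: it assumes exactly one writer thread and one reader thread, and a compiler with C11 <stdatomic.h>, which may not be available on an older toolchain. The names spsc_ring_t, ring_put, ring_get and the capacity RING_SLOTS are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SLOTS 64   /* assumed capacity; one slot always stays empty */

/* Hypothetical message type; 28 bytes matches the size quoted in the thread. */
typedef struct { unsigned char payload[28]; } msg_t;
static_assert(sizeof(msg_t) == 28, "msg_t should be 28 bytes");

typedef struct {
    msg_t         slots[RING_SLOTS];
    atomic_size_t head;  /* next slot to read; written only by the consumer */
    atomic_size_t tail;  /* next slot to write; written only by the producer */
} spsc_ring_t;

/* Producer thread only. Returns 0 on success, -1 if the ring is full. */
int ring_put(spsc_ring_t *r, const msg_t *m)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t next = (tail + 1) % RING_SLOTS;
    if (next == atomic_load_explicit(&r->head, memory_order_acquire))
        return -1;                        /* full */
    r->slots[tail] = *m;                  /* copy the message into the slot */
    /* Release: the slot contents become visible before the new tail does. */
    atomic_store_explicit(&r->tail, next, memory_order_release);
    return 0;
}

/* Consumer thread only. Returns 0 on success, -1 if the ring is empty. */
int ring_get(spsc_ring_t *r, msg_t *out)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    if (head == atomic_load_explicit(&r->tail, memory_order_acquire))
        return -1;                        /* empty */
    *out = r->slots[head];
    /* Release: the slot is handed back to the producer only after the copy. */
    atomic_store_explicit(&r->head, (head + 1) % RING_SLOTS, memory_order_release);
    return 0;
}
```

Because nothing blocks here, the consumer has to poll ring_get or be woken by some other mechanism (a semaphore or condvar, for example), and a second producer or consumer would break the scheme.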

Just a meta comment here. Why would someone ask a question like this? “When I drive my car up a 45 degree incline, why doesn’t it go as fast as on a flat surface?”.

Pretty obvious (to me) that the problem is not message passing, mqueue or mq, but how much data is being moved and how much performance the processor has to give.

I like Mario’s suggestion to get rid of the IPC completely if possible. Use shared memory between processes or just use process memory if between threads.