Good day everyone. I'm writing a resource manager, and I've run into a problem: if I pass a large buffer (about 1 MB) to MsgReply from my io_read handler, the resource manager starts eating an enormous amount of processor time. For example:
```c
int io_read (resmgr_context_t *ctp, io_read_t *msg, RESMGR_OCB_T *ocb)
{
    int status;

    if ((status = iofunc_read_verify (ctp, msg, &ocb->ocb, NULL)) != EOK)
        return status;
    if ((msg->i.xtype & _IO_XTYPE_MASK) != _IO_XTYPE_NONE)
        return ENOSYS;

    MsgReply (ctp->rcvid, msg->i.nbytes, my_buffer, msg->i.nbytes);
    return EOK;
}
```
That's the io_read handling code from my test resource manager; my_buffer is 786432 bytes. For testing purposes I wrote a program that issues a read() to my resource manager once every 300 ms, reading 786432 bytes each time. With that, my resource manager eats ~20% of the CPU; if I issue the read once every 100 ms, the load is ~40%. Just terrible. The profiler shows that all 20% (40%) is spent in a single function, MsgReply, and no other code executes in the resource manager (well, there's still one io_open and one io_close_ocb). So the question is: why so much?
I used the same test program against the filesystem manager: I read a large file by issuing 786432-byte reads once every 300 ms, and the CPU load was ~1%. So why do I get ~20%?
Nah, no need for shared memory. You are looking at 2.1 Mbytes/sec; this should use very little CPU.
Try returning _RESMGR_NOREPLY instead of EOK. By default the framework does the reply for you (you might want to consider doing that instead), but because you didn't tell it that you had already done the reply, it is most probably doing a second MsgReply, which causes havoc. I'm pretty sure the CPU usage you noticed with the profiler is not your MsgReply but the MsgReply performed by the resmgr framework.
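In other words, something like this (a sketch of the handler with that change; QNX-specific, so treat it as illustrative rather than tested):

```c
/* io_read with an explicit reply: the _RESMGR_NOREPLY return value tells
   the resmgr framework that we already replied, so it must not reply again. */
int io_read (resmgr_context_t *ctp, io_read_t *msg, RESMGR_OCB_T *ocb)
{
    int status;

    if ((status = iofunc_read_verify (ctp, msg, &ocb->ocb, NULL)) != EOK)
        return status;
    if ((msg->i.xtype & _IO_XTYPE_MASK) != _IO_XTYPE_NONE)
        return ENOSYS;

    MsgReply (ctp->rcvid, msg->i.nbytes, my_buffer, msg->i.nbytes);
    return _RESMGR_NOREPLY;   /* we replied ourselves -- no framework reply */
}
```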
It is true that calling MsgReply() and then returning EOK will cause MsgReply to be issued twice, the second time by the RM framework. But don't forget that MsgReply is a non-blocking kernel call, and the kernel will immediately return with ESRCH, since the sender is no longer REPLY-blocked!
Anyway, I continue to believe that for data chunks of 786432 bytes, shared memory would provide better bandwidth! Message passing is excellent for tiny messages. You'll actually find this recommendation in the QNX documentation as well (Nto_sys_arch.pdf)!
I tried using _RESMGR_NOREPLY instead of EOK; no use. koko, OK, I can use shared memory or something like that. But I just don't understand why dev-ide handles such messages with ~1% CPU load (and that also includes lots of hardware operations), while my resource manager, which only calls MsgReply, uses 20%.
By the way, I've measured the time MsgReply takes to execute: it's 30 ms!!! You said it's a non-blocking call, so how come it executes for 30 ms?
I just tried sending a 750K message 20 times a second, and it takes less than 1% of CPU (in fact, with hogs it shows 0%).
I find that it's more about the number of messages per second than the size of the messages, because the cost of messaging lies more in the context switches than in the actual moving of data. In fact, if you need to implement some sort of go/status mechanism with pulses, you'll find that it's pretty close to send/receive/reply in terms of overhead.
Yes, you get less CPU usage and more bandwidth with shared memory, but you get increased complexity and you lose the ability to work over the network.
Sheff, the blocking operations are MsgSend (you try to send but the receiver is busy with something else) and MsgReceive (you are ready to receive but nobody has sent to you). With MsgReply the case is different: the other thread/process is already waiting on your reply, being REPLY-blocked. Therefore your MsgReply is non-blocking, and the 30 ms you measured is the time needed to transfer the data into the sender's buffer, plus the context switch. Try a smaller amount, say 700 bytes, and check what happens.
koko, in your test you've replaced 700 KB with 700 bytes; that's not quite correct. Better to replace the 700 KB with 1000 reads of 700 bytes each, then pause, and so on…
I tried to do so; it's even worse.
Also I’ve written another test program:
The problem is solved.
The high CPU load was there because my_buffer was allocated using mmap() with the no-cache flag (PROT_NOCACHE). I did that because my_buffer is actually a video frame buffer, and I'd read in the QNX docs that for video frame buffers it's better to set no-cache. I simply got it wrong when I read it: you should only set PROT_NOCACHE if you need to access dual-ported memory.
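For anyone hitting the same thing, the allocation difference is roughly this (a QNX-specific sketch from memory; check the mmap() docs before copying, as the flag names are the part I'm confident of, not the exact usage):

```c
#include <sys/mman.h>

#define FRAME_SIZE 786432

/* Allocate a physically contiguous, DMA-safe frame buffer.  Passing
   cached == 0 adds PROT_NOCACHE -- appropriate only for true dual-ported
   device memory, and the cause of my 30 ms MsgReply copies. */
static unsigned char *alloc_frame_buffer (int cached)
{
    int prot = PROT_READ | PROT_WRITE;

    if (!cached)
        prot |= PROT_NOCACHE;   /* every access goes to RAM, uncached */

    void *buf = mmap (0, FRAME_SIZE, prot, MAP_PHYS | MAP_ANON, NOFD, 0);
    return (buf == MAP_FAILED) ? NULL : (unsigned char *) buf;
}
```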
What do you mean by a "video frame buffer"? Is it just a buffer in memory that you transfer video frames in and out of, or is the memory connected to a piece of hardware? If it is connected to hardware, how can it not be dual-ported?
maschoen, it's a DMA buffer: the video capture card writes data into it and I read it out. You're right, it is dual-ported, but anyway, since I have to copy that memory, I need it cached to maximize performance…
I think we have some terminology issues here. If it's a DMA buffer, that usually means it's memory you allocated and then pointed a DMA channel at.
That is not dual-ported and, like all ordinary memory, it can be cached.
On first thought, this doesn't make sense either. Caching should only help performance if you have to read the memory a second time, as happens with code and local variables. However, I suspect that marking the data as uncacheable means the level 2 and level 3 caches are turned off too. Data is then retrieved from main memory only when requested, whereas with the cache turned on, data can be moved into the cache in 64-byte or 128-byte chunks.
I’m just guessing here, but I think that must be what’s happening.