Pr1 said he deliberately caused this error by commenting out the MsgReply() in his resource manager, presumably to test what his own code would do if the manager ever crashed for some reason (maybe he needs to inform the user).
So his question is really how do you detect and handle a problem if a resource manager goes away unexpectedly if you can’t return from devctl()?
This sounds to me like he doesn’t know what he’s doing.
If the resource manager died, the send (or devctl()) would return immediately with an error.
If the resource manager has a bug and doesn’t return, there’s no reason (to my way of thinking) to force the application to continue.
If one really needed to do this, there are possible solutions.
Example: a 2nd thread with a timer that wakes up and sets a signal on the resource manager.
The resource manager could be killed, but better it can catch the signal and receive an unblock pulse. I think this might be tricky at the devctl() level. I know how to do it if the threads are using send/receive/reply directly.
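A rough sketch of the "2nd thread with a timer" idea, using only portable POSIX calls. Everything named here (struct watchdog, watchdog_start(), watchdog_stop()) is made up for illustration, and it only shows the timer-plus-signal part; what the resource manager does when the signal arrives is a separate matter.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical watchdog state; none of these names come from the thread. */
struct watchdog {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    int   done;        /* set by the client once devctl() has returned */
    int   fired;       /* set by the watchdog if the deadline passed first */
    pid_t target;      /* pid of the resource manager to signal */
    int   signo;       /* signal to send, e.g. SIGTERM */
    long  timeout_ms;
};

static void *watchdog_main(void *arg)
{
    struct watchdog *w = arg;
    struct timespec deadline;
    int rc = 0;

    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += w->timeout_ms / 1000;
    deadline.tv_nsec += (w->timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&w->mu);
    while (!w->done && rc == 0)
        rc = pthread_cond_timedwait(&w->cv, &w->mu, &deadline);
    if (!w->done) {
        /* The client is still blocked: signal the resource manager. */
        w->fired = 1;
        kill(w->target, w->signo);
    }
    pthread_mutex_unlock(&w->mu);
    return NULL;
}

/* Start the watchdog just before the blocking devctl() call. */
int watchdog_start(struct watchdog *w, pthread_t *tid,
                   pid_t target, int signo, long timeout_ms)
{
    pthread_mutex_init(&w->mu, NULL);
    pthread_cond_init(&w->cv, NULL);
    w->done = 0;
    w->fired = 0;
    w->target = target;
    w->signo = signo;
    w->timeout_ms = timeout_ms;
    return pthread_create(tid, NULL, watchdog_main, w);
}

/* Call once devctl() has returned; returns 1 if the watchdog fired. */
int watchdog_stop(struct watchdog *w, pthread_t tid)
{
    int fired;

    pthread_mutex_lock(&w->mu);
    w->done = 1;
    pthread_cond_signal(&w->cv);
    pthread_mutex_unlock(&w->mu);
    pthread_join(tid, NULL);
    fired = w->fired;
    pthread_cond_destroy(&w->cv);
    pthread_mutex_destroy(&w->mu);
    return fired;
}
```

The client would call watchdog_start() right before devctl() and watchdog_stop() right after it. If devctl() comes back in time, the watchdog thread exits without sending anything.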
Step 1: Process 1: Sends the command OCT_CMD and its data to the resource manager using devctl().
Step 2: Resource Manager: Receives the command and sends the message on to process 2.
Step 3: Process 2: Performs some heavy tasks on the hardware, which take a long time. Once the task is completed, it replies to the resource manager with the results.
Step 4: Resource Manager: Receives the reply from process 2, replies to process 1, and also sends the results to some other process.
If process 2 gets stuck at step 3 or takes a very long time, process 1 stays blocked. The resource manager is alive here; it is process 2 that is slow. In that case we want to unblock the devctl() once it has taken too long.
I know this is not a good design, but this is how the code was written and maintained.
According to the docs, TimerTimeout() should be able to cause an unblock. You have to tell it what to unblock, which is a send or reply.
_NTO_TIMEOUT_SEND or _NTO_TIMEOUT_REPLY
devctl() just does a MsgSend to the resource manager.
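Roughly what that could look like on the client side, as a sketch only. The helper name devctl_with_timeout() is made up, and I am assuming nothing inside devctl() blocks in the kernel before its MsgSend, so the one-shot timeout armed by TimerTimeout() applies to that send. The #ifdef is only there so the file also compiles off QNX, where the helper just returns ENOSYS.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __QNXNTO__
#include <devctl.h>
#include <time.h>
#include <sys/neutrino.h>
#endif

/* Hypothetical helper: issue a devctl() that gives up after timeout_ns.
 * Returns EOK on success or an errno value; ETIMEDOUT is what to expect
 * if the timeout fires and the kernel is allowed to unblock the client. */
int devctl_with_timeout(int fd, int dcmd, void *data, size_t nbytes,
                        uint64_t timeout_ns)
{
#ifdef __QNXNTO__
    /* Arm a one-shot timeout for the next blocking kernel call, covering
     * both the SEND-blocked and REPLY-blocked states. A NULL event means
     * the default SIGEV_UNBLOCK is used. */
    if (TimerTimeout(CLOCK_MONOTONIC,
                     _NTO_TIMEOUT_SEND | _NTO_TIMEOUT_REPLY,
                     NULL, &timeout_ns, NULL) == -1) {
        return errno;
    }
    /* devctl() returns the error code directly instead of setting errno. */
    return devctl(fd, dcmd, data, nbytes, NULL);
#else
    (void)fd; (void)dcmd; (void)data; (void)nbytes; (void)timeout_ns;
    return ENOSYS;  /* not a QNX Neutrino build */
#endif
}
```

The timeout has to be re-armed before every call, since TimerTimeout() is one-shot. Whether the client really comes back with ETIMEDOUT still depends on how the resource manager handles the unblock request.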
This is a very bad way to do business.
If it is reasonable for the resource manager to be blocked waiting for another process to finish work, it should be reasonable for the client to also wait.
If someone architected this wrong and you can’t fix it, you just have to cross your fingers and hope this kludge works. You could also give the client a thread that wakes up after a timeout and kills the resource manager. This too is a terrible way to do things. The third process might also need to be killed.
When writing a resource manager there is a potential problem relating to a client that is reply unblocking due to a signal being set on it… If the process has a signal set on it and becomes unblocked, the resource manager would not know this and might never clean up any resources associated with the client. This could be repeated until the system has run out of resources. So there needs to be a mechanism to prevent this.
A mechanism exists for a resource manager to be informed of signals set on a client. This occurs by the receipt of a message. This gives the resource manager the opportunity to clean up the client's resources and then reply, letting the client deal with the signal. The resource manager also has the option of not replying.
Since TimerTimeout seems to be implemented by having a signal set on the client, it might be that your resource manager is not cleaning up and letting the client unblock. If that is the case, you are back to needing to fix the resource manager.
Note this: MsgSend*() doesn’t unblock on SIGEV_UNBLOCK if the server has already received the message via MsgReceive*() and has specified _NTO_CHF_UNBLOCK in the flags argument to its ChannelCreate() call. In this case, it’s up to the server to do a MsgReply*() or MsgError().
If the resource manager has set the _NTO_CHF_UNBLOCK flag, then what you are doing can’t work on its own: the client stays blocked until the resource manager replies.
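For a server written directly on send/receive/reply with _NTO_CHF_UNBLOCK set, handling the unblock request looks roughly like the sketch below. The function name serve_one_pending() is made up, and it tracks only a single outstanding client to keep things short; a real server would keep a table of pending rcvids. A resource manager built on the resmgr library would do the equivalent work in its unblock handler instead of a hand-written receive loop. As before, the #ifdef only lets the file compile off QNX.

```c
#include <errno.h>

#ifdef __QNXNTO__
#include <sys/neutrino.h>
#endif

/* Hypothetical server loop that honours unblock pulses. */
int serve_one_pending(void)
{
#ifdef __QNXNTO__
    union {
        struct _pulse pulse;
        char          data[256];
    } msg;
    int pending = -1;  /* rcvid of the client we have not replied to yet */
    int chid = ChannelCreate(_NTO_CHF_UNBLOCK);

    if (chid == -1)
        return errno;

    for (;;) {
        int rcvid = MsgReceive(chid, &msg, sizeof msg, NULL);

        if (rcvid == -1)
            return errno;

        if (rcvid == 0) {
            /* A pulse. For an unblock pulse the value carries the rcvid
             * of the client that wants out (signal or timeout). */
            if (msg.pulse.code == _PULSE_CODE_UNBLOCK &&
                msg.pulse.value.sival_int == pending) {
                /* Release whatever was allocated for this request here,
                 * then let the client go. The error code is the
                 * server's choice; EINTR is used as an example. */
                MsgError(pending, EINTR);
                pending = -1;
            }
            continue;
        }

        /* A real message: remember the client and reply later, once the
         * long-running work has finished (not shown). */
        pending = rcvid;
    }
#else
    return ENOSYS;  /* not a QNX Neutrino build */
#endif
}
```

Until the server calls MsgError() or a MsgReply*() for that rcvid, the client stays blocked no matter what timeout it armed.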