MsgSend() and TimerTimeout() issue

It seems that, despite setting a kernel timeout for MsgSend(), the call still blocks for a few seconds in the REPLY state when the network cable is unplugged.

Here’s the sample code:
timeout = 500 * 1000000;  /* 500 ms in nanoseconds; timeout must be a uint64_t */
TimerTimeout(CLOCK_REALTIME, _NTO_TIMEOUT_SEND | _NTO_TIMEOUT_REPLY, NULL, &timeout, NULL);
rc = MsgSend(fd, &msg, sizeof(msg), NULL, 0);

Yes. When it comes to network sends, it is best to think of the granularity of the system timer as being that of the timeouts you have set for Qnet.

Consider that by telling the OS, by way of the command-line args to Qnet, that it is “normal” for a reply to take 2 seconds, you ensure the kernel will not start your TimerTimeout clock for the reply until that “normal” time has elapsed. In essence, it can’t consider you reply blocked until you have actually entered the reply state, and it can’t be sure whether you have entered the reply state until the worst-case time for the transaction to complete over the net has elapsed.
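
To put rough numbers on that, here is a minimal sketch. It assumes the reply clock really does start only after Qnet's “normal” window has passed, as described above. The helper name worst_case_block_ms is made up for illustration and is not a QNX call.

```c
#include <stdint.h>

/* Hypothetical helper (not a QNX API): estimates how long MsgSend() may
 * stay blocked over Qnet, assuming the TimerTimeout() reply clock only
 * starts once Qnet's own worst-case window has elapsed. */
uint64_t worst_case_block_ms(uint64_t qnet_window_ms, uint64_t reply_timeout_ms)
{
    /* The two waits happen back to back, so they simply add up. */
    return qnet_window_ms + reply_timeout_ms;
}
```

With the 2-second window mentioned above and the 500 ms timeout from the sample code, worst_case_block_ms(2000, 500) gives 2500, which would be consistent with the “few seconds” of blocking you are seeing.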

The bottom line is that, to get finer granularity on timeouts (which is what I think you want), you need to modify the timeouts given to qnet. Once this is done, however, you’ll find that you don’t need the TimerTimeout at all, since if the remote node fails to respond within the defined qnet timeouts, your code will be unblocked with an EHOSTDOWN anyway.

Another way to look at it is that TimerTimeout is not the appropriate tool for dealing with network failures; it is more about protecting against defective code in the local case.
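
Either way, the caller mostly needs to tell the two failure modes apart. Below is a minimal sketch of that, assuming MsgSend() returns -1 with errno set to ETIMEDOUT when a TimerTimeout() expires and to EHOSTDOWN when qnet gives up on the remote node. The enum and function names are invented for illustration.

```c
#include <errno.h>

/* Hypothetical classification of a failed MsgSend(); not part of any QNX API. */
enum send_failure {
    SEND_FAIL_LOCAL_TIMEOUT,  /* timeout armed with TimerTimeout() expired */
    SEND_FAIL_NODE_DOWN,      /* qnet gave up on the remote node */
    SEND_FAIL_OTHER           /* anything else; inspect errno directly */
};

/* Pass in the errno value saved right after MsgSend() returned -1. */
enum send_failure classify_send_errno(int err)
{
    switch (err) {
    case ETIMEDOUT:
        return SEND_FAIL_LOCAL_TIMEOUT;
    case EHOSTDOWN:
        return SEND_FAIL_NODE_DOWN;
    default:
        return SEND_FAIL_OTHER;
    }
}
```

You would call it as classify_send_errno(errno) immediately after a failed send, before anything else has a chance to overwrite errno.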

And how do I modify the qnet timeouts?

Run “use /lib/dll/npm-qnet.so” to list the available options.

And look at “periodic_ticks”, “tx_retries” and probably “tx_ticks”.