Is there any way to unblock devctl()?

Pr1 · February 14, 2023, 12:10am

Hello All,

Greetings of the day!!!

example:

#define	 _OCT_CMD		 0xF005          // devctl() macro oct_rm 
#define	  OCT_CMD			__DIOTF(_DCMD_MISC, _OCT_CMD, oct_req)

pthread_detach( pthread_self() );

ipc_err = devctl( FileDescriptor, OCT_CMD, oct_req, sizeof( oct_req ), NULL);

printf("Oct CMD Reply:%d\n", oct_req->oct_reply.error);

// OCT Error validation
if ((EOK != ipc_err) || ((oct_req->oct_reply.error) && (oct_req->oct_reply.error != OCT_QUIT))) {

        centration_reply(oct_req->oct_reply.error);
}

pthread_exit( NULL );
return( NULL );

In the above code, control never reaches the printf() after devctl() if the reply does not come.
Please suggest how to unblock devctl().

Thank you.

nico04 · February 14, 2023, 7:57am

Hello,

Have you tried adding a TimerTimeout() call just before the devctl() call?

Nicolas

Pr1 · February 14, 2023, 4:39pm

Hello,

Thank you for the reply.

I have tried a TimerTimeout() call just before the devctl() call, but it does not unblock devctl() after the timeout.

Just FYI, for testing purposes, I commented out the MsgReply() so that the devctl() call would time out.

According to https://www.qnx.com/developers/docs/6.3.2/neutrino/lib_ref/d/devctl.html, devctl() has no timeout error, and the page does not mention anywhere that devctl() can block.

Thank you!!!

maschoen · February 15, 2023, 5:47am

I don’t think you should worry about unblocking the devctl() call. I think you should worry about why the resource manager you are calling is not replying.

I don’t know what resource manager you are calling nor what OCT_CMD does. Is it home grown? If so, it should be within your power to find out why it doesn’t return.

Forcing a send or reply blocked thread to unblock is rarely a good thing.

Tim · February 15, 2023, 2:38pm

Maschoen,

Pr1 said he deliberately caused this error by commenting out the MsgReply() in his resource manager. Presumably to test what would happen in his own code if the Manager ever crashed for some reason (maybe a need to inform the user).

So his question is really how do you detect and handle a problem if a resource manager goes away unexpectedly if you can’t return from devctl()?

Tim

maschoen · February 15, 2023, 5:18pm

I missed that part, removing the reply.

This sounds to me like he doesn’t know what he’s doing.
If the resource manager died, the send (or devctl()) would return immediately with an error.
If the resource manager has a bug and doesn’t return, there’s no reason (to my way of thinking) to force the application to continue.

If one really needed to do this, there are possible solutions.
Example: a 2nd thread with a timer that wakes up and sets a signal on the resource manager.
The resource manager could be killed, but better, it can catch the signal and receive an unblock pulse. I think this might be tricky at the devctl() level. I know how to do it if the threads are using send/receive/reply directly.

Pr1 · February 15, 2023, 6:42pm

Hello Tim and Maschoen,

Thank you for the responses.

Currently, the sequence is like below:

Step 1: Process 1: Transmits the command OCT_CMD and its data to the resource manager using devctl().

Step 2: Resource Manager: The manager receives the command and sends a message to process 2.

Step 3: Process 2: Process 2 performs some heavy hardware-related tasks, which take a long time. Once the task is completed, it replies to the resource manager with the results.

Step 4: Resource Manager: The manager receives the reply from process 2, replies to process 1, and sends the results to some other process.

If process 2 is stuck at step 3 or takes a very long time, process 1 stays blocked. The resource manager is alive here, but process 2 is taking a long time. In this case we want to unblock the devctl() if it is taking too long.
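
For the case described above, where the caller only needs to stop waiting after a deadline, one portable possibility is to issue the blocking request from a helper thread and wait for it with pthread_cond_timedwait(). This is only a sketch of that general POSIX pattern, not something proposed in this thread: it does not unblock devctl() itself (the helper thread stays blocked until the server replies), and all names in it (call_with_timeout, simulated_request, timed_call_t) are hypothetical. simulated_request() merely stands in for the real devctl() call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical signature for "some call that may block", e.g. a devctl() wrapper. */
typedef int (*blocking_call_fn)(void *arg);

typedef struct {
    pthread_mutex_t  lock;
    pthread_cond_t   done_cv;
    bool             done;       /* helper finished the call           */
    bool             abandoned;  /* waiter gave up before it finished  */
    int              result;
    blocking_call_fn call;
    void            *arg;
} timed_call_t;

static void timed_call_free(timed_call_t *tc)
{
    pthread_cond_destroy(&tc->done_cv);
    pthread_mutex_destroy(&tc->lock);
    free(tc);
}

/* Helper thread: performs the (possibly long) blocking call. */
static void *helper_main(void *p)
{
    timed_call_t *tc = p;
    int rc = tc->call(tc->arg);

    pthread_mutex_lock(&tc->lock);
    tc->result = rc;
    tc->done = true;
    bool abandoned = tc->abandoned;
    pthread_cond_signal(&tc->done_cv);
    pthread_mutex_unlock(&tc->lock);

    if (abandoned)               /* nobody is waiting any more: clean up here */
        timed_call_free(tc);
    return NULL;
}

/* Returns 0 and stores the call's result, or ETIMEDOUT if the deadline passed. */
int call_with_timeout(blocking_call_fn call, void *arg,
                      unsigned timeout_ms, int *result)
{
    assert(call != NULL && result != NULL);

    timed_call_t *tc = calloc(1, sizeof *tc);
    if (tc == NULL)
        return ENOMEM;
    pthread_mutex_init(&tc->lock, NULL);
    pthread_cond_init(&tc->done_cv, NULL);
    tc->call = call;
    tc->arg = arg;

    pthread_t tid;
    int rc = pthread_create(&tid, NULL, helper_main, tc);
    if (rc != 0) {
        timed_call_free(tc);
        return rc;
    }
    pthread_detach(tid);

    /* Absolute deadline; the default condition variable clock is CLOCK_REALTIME. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&tc->lock);
    rc = 0;
    while (!tc->done && rc == 0)
        rc = pthread_cond_timedwait(&tc->done_cv, &tc->lock, &deadline);
    bool finished = tc->done;
    if (finished)
        *result = tc->result;
    else
        tc->abandoned = true;    /* the helper frees tc when it finally returns */
    pthread_mutex_unlock(&tc->lock);

    if (!finished)
        return ETIMEDOUT;
    timed_call_free(tc);
    return 0;
}

/* Stand-in for the real request: sleeps for the given number of ms, returns 42. */
int simulated_request(void *arg)
{
    unsigned ms = (unsigned)(uintptr_t)arg;
    struct timespec ts = { (time_t)(ms / 1000), (long)(ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
    return 42;
}
```

Note that whatever is passed as arg must stay valid after a timeout, because the helper thread keeps using it until the call finally returns. If the server never replies, the helper thread and its request are never released.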

I know this is not a good design, but this is how the code was written and maintained.

Thank you again.

nico04 · February 20, 2023, 8:18am

Is there no way to modify the resource manager?

maschoen · March 2, 2023, 10:53pm

According to the docs, TimerTimeout() should be able to cause an unblock. You have to tell it what to unblock, which is a send or reply.

_NTO_TIMEOUT_SEND or _NTO_TIMEOUT_REPLY

devctl() just does a MsgSend to the resource manager.

This is a very bad way to do business.

If it is reasonable for the resource manager to be blocked waiting for another process to finish work, it should be reasonable for the client to also wait.

If someone architected this wrong and you can’t fix it, you just have to cross your fingers and hope this kludge works. You could also give the client a thread that wakes up after a timeout and kills the resource manager. This too is a terrible way to do things. The third process might also need to be killed.

Pr1 · March 9, 2023, 6:45am

Hello Maschoen,

Thank you for the response.

I did try using TimerTimeout() with _NTO_TIMEOUT_SEND or _NTO_TIMEOUT_REPLY just before the devctl(), but it did not unblock the devctl().

qevent.sigev_notify = SIGEV_UNBLOCK;
ipc_timeout = 1000000000L; // 1 second = 1 billion nanoseconds
TimerTimeout(CLOCK_REALTIME, _NTO_TIMEOUT_SEND | _NTO_TIMEOUT_REPLY, &qevent, &ipc_timeout, NULL);
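
For reference, a fuller version of the fragment above might look like the sketch below. It assumes the QNX Neutrino headers and the documented TimerTimeout() and devctl() signatures; devctl_with_timeout() and ms_to_ns() are hypothetical helper names. As I read the TimerTimeout() documentation, the timeout applies only to the next kernel call, so nothing else should come between arming it and devctl(), and a timeout would presumably surface as ETIMEDOUT in devctl()'s return value. Whether the call actually unblocks still depends on the server side. On non-QNX systems the function is stubbed out so the file still compiles.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* TimerTimeout() takes its timeout as a pointer to a uint64_t count of nanoseconds. */
static uint64_t ms_to_ns(uint32_t ms)
{
    return (uint64_t)ms * 1000000ULL;
}

#ifdef __QNXNTO__
#include <devctl.h>
#include <signal.h>
#include <time.h>
#include <sys/neutrino.h>

/* Returns devctl()'s result (EOK on success), or an errno value on failure. */
int devctl_with_timeout(int fd, int dcmd, void *data, size_t nbytes,
                        uint32_t timeout_ms)
{
    struct sigevent ev;
    uint64_t ntime = ms_to_ns(timeout_ms);

    SIGEV_UNBLOCK_INIT(&ev);   /* sets sigev_notify = SIGEV_UNBLOCK */

    /* Arm a one-shot timeout for the SEND-blocked and REPLY-blocked states. */
    if (TimerTimeout(CLOCK_REALTIME,
                     _NTO_TIMEOUT_SEND | _NTO_TIMEOUT_REPLY,
                     &ev, &ntime, NULL) == -1)
        return errno;

    /* Must follow immediately: the timeout covers only the next kernel call. */
    return devctl(fd, dcmd, data, nbytes, NULL);
}
#else
/* Stub for non-QNX builds: the kernel timeout mechanism is not available. */
int devctl_with_timeout(int fd, int dcmd, void *data, size_t nbytes,
                        uint32_t timeout_ms)
{
    (void)fd; (void)dcmd; (void)data; (void)nbytes; (void)timeout_ms;
    return ENOSYS;
}
#endif
```

A caller would then check the return value, for example treating ETIMEDOUT separately from other errors before looking at oct_req->oct_reply.error.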

The page https://www.qnx.com/developers/docs/6.3.2/neutrino/lib_ref/d/devctl.html does not mention the blocked state, and it does not list an error such as “ETIMEDOUT- A kernel timeout unblocked the call.” the way the MsgSend() page does.

And yes, devctl() uses MsgSend() to send data to the resource manager.

Thanks

maschoen · March 9, 2023, 1:13pm

I think I know why this is not working.

When writing a resource manager there is a potential problem relating to a client that is reply unblocking due to a signal being set on it… If the process has a signal set on it and becomes unblocked, the resource manager would not know this and might never clean up any resources associated with the client. This could be repeated until the system has run out of resources. So there needs to be a mechanism to prevent this.

A mechanism exists for a resource manager to be informed of signals set on a client. This occurs by the receipt of a message. This gives the resource manager the opportunity to clean up the client’s resources and then reply, letting the client deal with the signal. The resource manager also has the option of not replying.

Since TimerTimeout seems to be implemented by having a signal set on the client, it might be that your resource manager is not cleaning up and letting the client unblock. If that is the case, you are back to needing to fix the resource manager.

Tim · March 9, 2023, 5:55pm

Do you have the code for the resource manager?

From the description of TimerTimeout:

Note this:
MsgSend*() doesn’t unblock on SIGEV_UNBLOCK if the server has already received the message via MsgReceive*() and has specified _NTO_CHF_UNBLOCK in the flags argument to its ChannelCreate() call. In this case, it’s up to the server to do a MsgReply*() or MsgError().

If the resource manager has set the _NTO_CHF_UNBLOCK flag then what you are doing can’t work.

Tim

Pr1 · March 9, 2023, 9:35pm

Hello Tim,

Thank you for the response.

The ChannelCreate() function was called with only the _NTO_CHF_FIXED_PRIORITY flag, and in some cases with no flags. _NTO_CHF_UNBLOCK is not used anywhere in the whole code.

msgChannelId = ChannelCreate(_NTO_CHF_FIXED_PRIORITY);

channelId = ChannelCreate(0);

Thank you again.