David Gibbs <dagibbs@qnx.com> wrote:
Robert Krten <nospam83@parse.com> wrote:
David Gibbs <dagibbs@qnx.com> wrote:
Lonnie VanZandt <lonniev@predictableresponse.com> wrote:
David,
If you are too lazy to register an unblock handler in your server, well
then you’ve written a broken piece of code, and run that broken piece of
code as root. This can cause system instability – in this case the
instability is that you have an un-killable client.
I don’t know about this, Dave. We’ve had this conversation before…
The general problem is that ANY piece of server code, regardless of
whether it has an unblock handler or not, can cause ANY client to
hang by simply forgetting to reply, with NO chance for the client
to unblock. That to me is the crux of the problem, not whether
you’ve provided an unblock handler or not…
I guess I don’t see it as a problem. If you write code like this,
you’ve written buggy code. This happens to be a bug that is fairly
easy to recognise when it occurs, unlike other far more subtle
possible ones. How do you deal with it? You fix the server, then
slay off & restart a new server (incidentally, freeing up the client
when you restart the server). (Why do you people keep wanting to
terminate the client, when the server is broken?)
A classic example is the floppy disk controller. On numerous occasions,
I’ve had “df” hang. I had no idea what it was hanging on, or why,
I just wanted to “^C” out of the client. It’s only after investigating
this, and seeing who sent who a message, that I discovered it was devb-fdc
that was the problem.
That at least answers the “why do people want to kill the client” part.
When I was in this situation, I didn’t have the source or the time to fix
devb-fdc – that adds to the perception that “something is wrong with the
system”, rather than “I have a buggy server”.
There are lots and lots of ways a broken server can cause severe
indigestion for the client. It could MsgDeliverEvent() some other
client’s event (signal) to a client that doesn’t expect a signal.
It could reply with impossible data. And, in many ways, S/R/R is
really an RPC architecture – it makes sense that the function you
called (local or remote) has control over when it returns.
It makes even more sense, if you consider that many QNX servers
are stand-ins for code that would be written as part of the kernel
in a traditional Unix OS. If they are buggy, and don’t allow or
handle being interrupted by a signal, then your program will stay
locked in the kernel “forever”, and you can’t even slay the
server & replace it to attempt a recovery without reboot.
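The S/R/R (send/receive/reply) pattern described above can be made concrete with a minimal QNX Neutrino client sketch. This compiles only against QNX headers and is untested here; the request/reply structs and the `call_server` name are my own illustration, not from the thread:

```c
/* QNX Neutrino only -- an illustrative sketch, not buildable off-platform.
   The client blocks in MsgSend() until the server calls MsgReply();
   from the caller's point of view it is just a (remote) procedure call,
   and the server decides when it "returns". */
#include <sys/neutrino.h>

int call_server(int coid)   /* coid from ConnectAttach() or name_open() */
{
    struct { int op; } req = { 1 };   /* hypothetical request message */
    struct { int status; } rep;       /* hypothetical reply message   */

    /* SEND-blocked until the server does MsgReceive(),
       then REPLY-blocked until it does MsgReply(). */
    if (MsgSend(coid, &req, sizeof req, &rep, sizeof rep) == -1)
        return -1;
    return rep.status;
}
```

If the server never replies, the client sits REPLY-blocked indefinitely, which is exactly the situation the thread is arguing about.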
And, doing things this way – giving the server full control – is
needed to properly implement write() in a Posix-conforming manner,
which says: “If write() is interrupted by a signal after
it successfully writes some data, it shall return the number
of bytes written.”
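Those write() semantics put the retry burden on the caller: after a partial, signal-interrupted write it must resume from the bytes already delivered, not restart from the beginning. A minimal, portable sketch of that client-side loop (the helper name `write_all` is mine, not from the thread):

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all `len` bytes, restarting after partial writes and EINTR.
   Returns 0 on success, -1 on an unrecoverable error. */
static int write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n >= 0) {
            done += (size_t)n;   /* partial write: advance past what made it */
        } else if (errno != EINTR) {
            return -1;           /* a real error, not just a signal */
        }
        /* EINTR with nothing written: simply retry */
    }
    return 0;
}
```

Note this only works because a signal-interrupted write() reports how many bytes got through; that is precisely the guarantee the server-controlled reply preserves.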
Don’t get me started on signals
The standard “solution” to this, which is to say that the server does
not set the “unblock pulse required” flag in the channel creation,
isn’t acceptable either because then the server has no way of getting
that information – that a client has gone away and that it should
clean up.
No, this isn’t a solution, and I’ve never suggested it as a solution.
Your server must either reply immediately, or handle unblock pulses
appropriately.
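The “handle unblock pulses appropriately” option looks roughly like the following QNX Neutrino sketch. It compiles only against QNX headers and is untested here; the buffer size and control flow are my own illustration (a real resource manager would use the iofunc layer instead):

```c
/* QNX Neutrino only -- an illustrative sketch, not buildable off-platform. */
#include <errno.h>
#include <sys/neutrino.h>

void serve(void)
{
    /* _NTO_CHF_UNBLOCK: instead of silently unblocking a REPLY-blocked
       client hit by a signal, the kernel sends the server a pulse and
       waits for the server to reply. */
    int chid = ChannelCreate(_NTO_CHF_UNBLOCK);

    union {
        struct _pulse pulse;
        char          msg[256];   /* arbitrary sketch-sized buffer */
    } buf;

    for (;;) {
        int rcvid = MsgReceive(chid, &buf, sizeof buf, NULL);

        if (rcvid == 0 && buf.pulse.code == _PULSE_CODE_UNBLOCK) {
            /* The pulse value carries the rcvid of the blocked client:
               abandon its pending work and reply with an error so the
               kernel can deliver the signal. */
            MsgError(buf.pulse.value.sival_int, EINTR);
            continue;
        }

        /* ... normal message handling; eventually MsgReply(rcvid, ...) */
    }
}
```

A server that omits the unblock handling (or never replies at all) is what leaves the client unkillable, as the devb-fdc anecdote above shows.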
Sorry, didn’t mean to imply you stated this as a solution; currently,
it’s the only available “solution” and I wanted to nip that particular
thread in the bud.
What’s required, and what’s been suggested, and what’s been consistently
forgotten about or ignored, is a third option – either allow a client
to unblock willy-nilly AND generate a pulse.
This can generate some nasty race conditions to deal with, making
handling all the cases quite messy – in particular, the issue of
whether or not to re-try an operation when it has been interrupted
by (say) a signal. Even with properly coded clients & servers, this
can be messy to handle properly.
(e.g. the operation was a write() to a serial device; the client gets hit
by a signal after 50 of the 150 bytes originally requested were delivered
to the hw, and the server gets the pulse. The client is unblocked, handles
the signal, and wants to go back to sending the serial data… where does
the client re-start? On the server side… does the server finish
transmitting? Should it? Maybe the client has to ask the server how many
bytes have already been txed? As noted, this would break the
implementation of the write() call.)
The current way, with the _MI_UNBLOCK_REQ flag and all that stuff, is
probably just as complicated.
Or allow the client and
server to at least negotiate for the capability of blocking the client
forever. Having the client be at the total and complete mercy of the
server has always struck me as “A Bad Thing™”.
Considering that this work is handled in the kernel, leaving it to a
client-server negotiation is going to be a bit tricky on the
implementation side. Also, there is, yet again, the issue of
Posix conformance for things like write().
(And, this isn’t a new feature to QNX6, QNX4 had the exact same
behaviour with the PPF_SIGCATCH process flag.)
Perhaps we could modify the “request” a little bit. Suppose that I said
that it works the way that it works today for everything except the
unblockable “you must die now” signal on the client.
From a functional point of view, that would solve all the issues, wouldn’t it?
The return value from write() would be immaterial.
The client would die.
The server would still get a pulse and be allowed to clean up (and, the server
would get a synthetic _IO_CLOSE later anyway).
Cheers,
-RK
--
Robert Krten, PARSE Software Devices +1 613 599 8316.
Realtime Systems Architecture, Books, Video-based and Instructor-led
Training and Consulting at www.parse.com.
Email my initials at parse dot com.