David Gibbs <dagibbs@qnx.com> wrote:
Robert Krten <nospam83@parse.com> wrote:
David Gibbs <dagibbs@qnx.com> wrote:
Lonnie VanZandt <lonniev@predictableresponse.com> wrote:
David,
If you are too lazy to register an unblock handler in your server, well
then you’ve written a broken piece of code, and run that broken piece of
code as root. This can cause system instability – in this case the
instability is that you have an un-killable client.
I don’t know about this, Dave. We’ve had this conversation before…
The general problem is that ANY piece of server code, regardless of
whether it has an unblock handler or not, can cause ANY client to
hang by simply forgetting to reply, with NO chance for the client
to unblock. That to me is the crux of the problem, not whether
you’ve provided an unblock handler or not…
I guess I don’t see it as a problem. If you write code like this,
you’ve written buggy code. This happens to be a bug that is fairly
easy to recognise when it occurs, unlike other far more subtle
possible ones. How do you deal with it? You fix the server, then
slay off & restart a new server (incidentally, freeing up the client
when you restart the server). (Why do you people keep wanting to
terminate the client, when the server is broken?)
A classic example is the floppy disk controller. On numerous occasions,
I’ve had “df” hang. I had no idea what it was hanging on, or why,
I just wanted to “^C” out of the client. It’s only after investigating
this, and seeing who sent who a message, that I discovered it was devb-fdc
that was the problem.
That at least answers the “why do people want to kill the client” part.
When I was in this situation, I didn’t have the source or the time to fix
devb-fdc – that adds to the perception that “something is wrong with the
system”, rather than “I have a buggy server”.
There are lots and lots of ways a broken server can cause severe
indigestion for the client. It could MsgDeliverEvent() some other
client’s event (signal) to a client that doesn’t expect a signal.
It could reply with impossible data. And, in many ways, S/R/R is
really an RPC architecture – it makes sense that the function you
called (local or remote) has control over when it returns.
It makes even more sense, if you consider that many QNX servers
are stand-ins for code that would be written as part of the kernel
in a traditional Unix OS. If they are buggy, and don’t allow or
handle being interrupted by a signal, then your program will stay
locked in the kernel “forever”, and you can’t even slay the
server & replace it to attempt a recovery without reboot.
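The S/R/R (send/receive/reply) pattern described above can be made concrete with a minimal QNX Neutrino client sketch. This compiles only against QNX headers and is untested here; the request/reply structs and the `call_server` name are my own illustration, not from the thread:

```c
/* QNX Neutrino only -- an illustrative sketch, not buildable off-platform.
   The client blocks in MsgSend() until the server calls MsgReply();
   from the caller's point of view it is just a (remote) procedure call,
   and the server decides when it "returns". */
#include <sys/neutrino.h>

int call_server(int coid)   /* coid from ConnectAttach() or name_open() */
{
    struct { int op; } req = { 1 };   /* hypothetical request message */
    struct { int status; } rep;       /* hypothetical reply message   */

    /* SEND-blocked until the server does MsgReceive(),
       then REPLY-blocked until it does MsgReply(). */
    if (MsgSend(coid, &req, sizeof req, &rep, sizeof rep) == -1)
        return -1;
    return rep.status;
}
```

If the server never replies, the client sits REPLY-blocked indefinitely, which is exactly the situation the thread is arguing about.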
And, doing things this way – giving the server full control – is
needed to properly implement write() in a Posix-conforming manner,
which says: “If write() is interrupted by a signal after
it successfully writes some data, it shall return the number
of bytes written.”
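Those write() semantics put the retry burden on the caller: after a partial, signal-interrupted write it must resume from the bytes already delivered, not restart from the beginning. A minimal, portable sketch of that client-side loop (the helper name `write_all` is mine, not from the thread):

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all `len` bytes, restarting after partial writes and EINTR.
   Returns 0 on success, -1 on an unrecoverable error. */
static int write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n >= 0) {
            done += (size_t)n;   /* partial write: advance past what made it */
        } else if (errno != EINTR) {
            return -1;           /* a real error, not just a signal */
        }
        /* EINTR with nothing written: simply retry */
    }
    return 0;
}
```

Note this only works because a signal-interrupted write() reports how many bytes got through; that is precisely the guarantee the server-controlled reply preserves.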
Don’t get me started on signals
The standard “solution” to this, which is to say that the server does
not set the “unblock pulse required” flag in the channel creation,
isn’t acceptable either because then the server has no way of getting
that information – that a client has gone away and that it should
clean up.
No, this isn’t a solution, and I’ve never suggested it as a solution.
Your server must either reply immediately, or handle unblock pulses
appropriately.
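The “handle unblock pulses appropriately” option looks roughly like the following QNX Neutrino sketch. It compiles only against QNX headers and is untested here; the buffer size and control flow are my own illustration (a real resource manager would use the iofunc layer instead):

```c
/* QNX Neutrino only -- an illustrative sketch, not buildable off-platform. */
#include <errno.h>
#include <sys/neutrino.h>

void serve(void)
{
    /* _NTO_CHF_UNBLOCK: instead of silently unblocking a REPLY-blocked
       client hit by a signal, the kernel sends the server a pulse and
       waits for the server to reply. */
    int chid = ChannelCreate(_NTO_CHF_UNBLOCK);

    union {
        struct _pulse pulse;
        char          msg[256];   /* arbitrary sketch-sized buffer */
    } buf;

    for (;;) {
        int rcvid = MsgReceive(chid, &buf, sizeof buf, NULL);

        if (rcvid == 0 && buf.pulse.code == _PULSE_CODE_UNBLOCK) {
            /* The pulse value carries the rcvid of the blocked client:
               abandon its pending work and reply with an error so the
               kernel can deliver the signal. */
            MsgError(buf.pulse.value.sival_int, EINTR);
            continue;
        }

        /* ... normal message handling; eventually MsgReply(rcvid, ...) */
    }
}
```

A server that omits the unblock handling (or never replies at all) is what leaves the client unkillable, as the devb-fdc anecdote above shows.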
Sorry, didn’t mean to imply you stated this as a solution; currently,
it’s the only available “solution” and I wanted to nip that particular
thread in the bud.
What’s required, and what’s been suggested, and what’s been consistently
forgotten about or ignored, is a third option – either allow a client
to unblock willy-nilly AND generate a pulse.
This can generate some nasty race conditions to deal with, making
handling all the cases quite messy – in particular, the issue of
whether or not to re-try an operation when it has been interrupted
by (say) a signal. Even with properly coded clients & servers, this
can be messy to handle properly.
(e.g. the operation was a write() to a serial device; the client gets hit
by a signal after 50 of the 150 bytes originally requested were delivered
to the hw, and the server gets the pulse. The client is unblocked, handles
the signal, and wants to go back to sending the serial data… where does
the client re-start? On the server side… does the server finish
transmitting? Should it? Maybe the client has to ask the server how many
bytes have already been txed? As noted, this would break the
implementation of the write() call.)
The current way, with the _MI_UNBLOCK_REQ flag and all that stuff, is
probably just as complicated.
Or allow the client and
server to at least negotiate for the capability of blocking the client
forever. Having the client be at the total and complete mercy of the
server has always struck me as “A Bad Thing™”.
Considering that this work is handled in the kernel, leaving it to a
client-server negotiation is going to be a bit tricky on the
implementation side. Also, there is, yet again, the issue of
Posix conformance for things like write().
(And, this isn’t a new feature to QNX6, QNX4 had the exact same
behaviour with the PPF_SIGCATCH process flag.)
Perhaps we could modify the “request” a little bit. Suppose that I said
that it works the way that it works today for everything except the
unblockable “you must die now” signal on the client.
From a functional point of view, that would solve all the issues, wouldn’t it?
The return value from write() would be immaterial.
The client would die.
The server would still get a pulse and be allowed to clean up (and, the server
would get a synthetic _IO_CLOSE later anyway).
Cheers,
-RK
--
Robert Krten, PARSE Software Devices +1 613 599 8316.
Realtime Systems Architecture, Books, Video-based and Instructor-led
Training and Consulting at www.parse.com.
Email my initials at parse dot com.