Thanks both to Sam and Mario. You’ve given me lots of good information in my
struggle to become a unix programmer. Have either of you considered writing a
book? You have at least one guaranteed customer right here.
So an “interrupt” boils down to either a minor context switch to execute the
handler, or an interruption of the I/O in a way that can’t be resumed, in which
case the call fails with errno set to EINTR. I shouldn’t have used fwrite. I can
use printf together with signals as long as I am not interrupting a printf with a
printf, or more specifically, clobbering the same global stdio state.
My signal test here was just to see what the results would be, as the “advanced” book
says that it is dependent on the implementation of fwrite, i.e. it differs from
compiler to compiler (or C library to C library).
The only use of a signal in the process with the problem is a sigsuspend on SIGUSR2
to hibernate the process when it is done processing, and to wake it again, when data
is put on its queue for processing. There is no explicit handler. Another process
sends SIGUSR2. A process table keeps track of who it was sent for and wakes the
proper process. It appears this was a way to port the VMS method of hibernating
code over to QNX.
I’m going to wait for one more occurrence of the problem, now that we have additional
diags in, and if I can’t figure out where the problem is, I’ll take your advice and
rewrite using write().
Previously, J. Scott Franko wrote in qdn.public.qnx4:
Sam Roberts wrote:
Previously, J. Scott Franko wrote in qdn.public.qnx4:
What happens when fwrite doesn’t complete? How do you recover? Say I
ask fwrite to write 1 object, and it returns a value other than 1.
According to the documents, a return less than the number of objects
requested is an error. What happens. Does it write a part of the
object to the disk, thus corrupting the file? Or does it write nothing?
I’ve found that it appears to complete OK, even when interrupted by a
signal, and it doesn’t return EINTR in errno.
How do you know you are interrupting fwrite()? A C function call isn’t
interruptible, only system calls are. fwrite() sometimes makes a system
call (write), and sometimes doesn’t. I’m curious what kind of test you’ve
done that you think is interrupting fwrite().
I wrote a small test program. I made a big array of structures (several MB),
set up a handler for SIGALRM, set a 10 second timer with alarm(), wrote the
size of the whole array using printf, called fwrite using the size of each
array element structure (which Mario helped me correct; see previous fwrite
thread by me) and sent several array elements, then wrote the completion
errno and status, and closed the file.
Meanwhile the fwrite takes long enough that the signal goes off before the
fwrite completes, and the signal handler has a printf in it. I know that the
signal goes off before fwrite is done, because its printf output comes out
before the completion message with the errno and status from fwrite, which
directly follows fwrite. fwrite returns the same number of elements as I
requested, and then I ls -al the file and see that its byte size is the same
as the byte size of the array elements.
So even though fwrite doesn’t appear to be re-entrant from looking in the
Signals chapter of “Advanced Programming in the Unix Environment”, it does
re-enter, pick up where it left off. I’ve varied the number of elements in
This is not re-entering! In your printf() you are accessing global data
(stdout). In your fwrite() you are also accessing global data (but it’s
a FILE that you opened). Since they are accessing different global data, you
are ok. If you were in the middle of a printf(), and called printf() from
a signal handler at one of the critical times, stdout would be corrupted.
This is what they mean by non-reentrant.
the array, and wrote it many times, all with the same good results. Also,
from reading the Advanced book, I didn’t see an indication that signals only
interrupt system calls. I therefore assumed that signals asynchronously
interrupt any part of the code, unless you specifically set up to ignore them.
“Interrupt” has a precise meaning; you’re using it too loosely, though I
can understand why.
“Interrupt” and “occur during” are not the same. During fwrite() there are
some times that the fwrite() is doing a memcpy() into a buffer, and sometimes
that it does a Send() of some data to Fsys. When a signal occurs during the
memory copy the kernel arranges for the signal handler to be called and
executed by the process immediately. This doesn’t take too much work. When
the signal occurs during the Send(), the process is blocked. It can’t
execute code while it’s blocked, so it has to break out of the Send(). The
exact mechanism varies on Unix and on QNX, because of the message passing
in QNX, but the effect is the same. This breaking out is done in a way
that the process can’t just magically go back to being Send() blocked. This
breaking of the Send() is the “interruption” caused by a signal, and is VERY
different from the “immediately called” action that takes place when a signal
occurs during normal process execution.
So what I’m saying is that some proportion of the fwrite() of megabytes is
spent in-process copying memory, and some proportion is spent in a Send().
Interruption only occurs if the signal occurs during the Send().
I looked at the BSD source, and it appears to me that if fwrite() returns
a number of complete objects that is less than what you asked for,
then it is possible for a partial object to have been written.
If this is so, how do I recover? How can I take back the partial object it
has written, and rewrite the complete object.
I don’t think there is a portable way. I’m not kidding when I claim signals
are a problem under Unix. It’s very unfortunate that the s/w you are
maintaining uses them so much; they are inherently unpredictable, and subtle
timing changes can uncover long-dormant bugs.
By system calls, you mean the POSIX write and read? If I used them, would I
have to roll my own object write? There is no equivalent to fwrite in the system
calls, right?
Look at the docs for write(). fwrite() is a cheesy wrapper on write(); it is
exactly equivalent, except write() has a useful return value, and fwrite()
multiplies two numbers for you!
--
Sam Roberts (sam@cogent.ca), Cogent Real-Time Systems (www.cogent.ca)