fsys priority inversion

Is this a known problem, does it have a known solution?

Using the latest diskscrubber, I run it at priority 6 and my system
becomes unusable. Any other disk accesses take seconds (if not
minutes) to resolve.

Basically, in a nutshell, fsys appears to be servicing low priority
requests from a client to the exclusion/detriment of higher priority
clients – your basic priority inversion.

The diskscrubber (http://www.parse.com/samples/manpages/blockscrubber.html)
basically does an ftruncate() to create a number of 1 GB files, and then
write()s to them. Dead simple.

Cheers,
-RK


[If replying via email, you’ll need to click on the URL that’s emailed to you
afterwards to forward the email to me – spam filters and all that]
Robert Krten, PDP minicomputer collector http://www.parse.com/~pdp8/

Robert Krten <rk@parse.com> wrote:
RK > Is this a known problem, does it have a known solution?

RK > Using the latest diskscrubber, I run it at priority 6 and my system
RK > becomes unusable. Any other disk accesses take seconds (if not
RK > minutes) to resolve.

RK > Basically, in a nutshell, fsys appears to be servicing low priority
RK > requests from a client to the exclusion/detriment of higher priority
RK > clients – your basic priority inversion.

RK > The diskscrubber (http://www.parse.com/samples/manpages/blockscrubber.html)
RK > basically does an ftruncate() to create a number of 1 GB files, and then
RK > write()s to them. Dead simple.

RK > Cheers,
RK > -RK

Isn’t pregrowing a file via ftruncate() an atomic (and time-consuming)
operation?

Bill Caroselli <qtps@earthlink.net> wrote:

Robert Krten <rk@parse.com> wrote:
RK > Is this a known problem, does it have a known solution?

RK > Using the latest diskscrubber, I run it at priority 6 and my system
RK > becomes unusable. Any other disk accesses take seconds (if not
RK > minutes) to resolve.

RK > Basically, in a nutshell, fsys appears to be servicing low priority
RK > requests from a client to the exclusion/detriment of higher priority
RK > clients – your basic priority inversion.

RK > The diskscrubber (http://www.parse.com/samples/manpages/blockscrubber.html)
RK > basically does an ftruncate() to create a number of 1 GB files, and then
RK > write()s to them. Dead simple.

RK > Cheers,
RK > -RK

Isn’t pregrowing a file via ftruncate() an atomic (and time-consuming)
operation?

I’m not sure it has to be (atomic) – it can be done in pieces, at a lower
priority, and only when/if the file is the right size (or we run out of
disk space) does the call return – atomic “enough” for our purposes.

However, that wasn’t even the major problem; the real “stalling” and
priority inversion happened during the write()s.

Cheers,
-RK



Ping?

Robert Krten <rk@parse.com> wrote:

Is this a known problem, does it have a known solution?

Using the latest diskscrubber, I run it at priority 6 and my system
becomes unusable. Any other disk accesses take seconds (if not
minutes) to resolve.

Basically, in a nutshell, fsys appears to be servicing low priority
requests from a client to the exclusion/detriment of higher priority
clients – your basic priority inversion.

The diskscrubber (http://www.parse.com/samples/manpages/blockscrubber.html)
basically does an ftruncate() to create a number of 1 GB files, and then
write()s to them. Dead simple.

Cheers,
-RK



Robert Krten <rk@parse.com> wrote:

Ping?

We are all, um, somewhat busy with 6.3. And you didn’t provide
any detailed information, such as a pidin.

Is this a known problem, does it have a known solution?

I can think of a number of issues. The disk driver component
runs at a fixed priority of 21, hence communication with the h/w
does not respect client priority (this should be fairly short
except in non-DMA modes); hard realtime processes should be
run above that priority. I don’t think the kernel supports
chains of priority inheritance (maybe it does now for mutexes?),
so CONDVAR blocking doesn’t boost; furthermore, every resource
would need a kernel synchronisation object to get such priority
inheritance (e.g. every disk block), which in the QNX physical
scheme would not be practical (cf. the Solaris turnstile pool);
similarly, having the application maintain this knowledge and
do its own priority manipulations is expensive in kernel calls.
So whilst the filesystem goes to some trouble to ensure it is
always running at client priority, the inability to associate
a mutex with every potential resource limits the scope of
priority inheritance. Since you are writing many GBs, this will
fill your buffer cache quicker than the physical disk IO can
complete, so any other IO will have to wait for the delayed
write (it will be initiated sooner, though, in such a situation).
Have you tried specifying the “blk wipe=” option to limit this
cache wiping, or have you tried using O_SYNC on your scrubbing
files, which will pull them into lock-step with the slower
physical IO? Does some filesystem operation that does not
require disk IO (perhaps a “df /”) complete immediately?

John Garvey <jgarvey@qnx.com> wrote:

Robert Krten <rk@parse.com> wrote:
Ping?

We are all, um, somewhat busy with 6.3. And you didn’t provide
any detailed information, such as a pidin.

Yes, should have realized about the 6.3 :-) Sorry about that!
I’ll get a pidin together…

Is this a known problem, does it have a known solution?

I can think of a number of issues. The disk driver component
runs at a fixed priority of 21, hence communication with the h/w
does not respect client priority (this should be fairly short
except in non-DMA modes); hard realtime processes should be
run above that priority. I don’t think the kernel supports
chains of priority inheritance (maybe it does now for mutexes?),
so CONDVAR blocking doesn’t boost; furthermore, every resource
would need a kernel synchronisation object to get such priority
inheritance (e.g. every disk block), which in the QNX physical
scheme would not be practical (cf. the Solaris turnstile pool);
similarly, having the application maintain this knowledge and
do its own priority manipulations is expensive in kernel calls.
So whilst the filesystem goes to some trouble to ensure it is
always running at client priority, the inability to associate
a mutex with every potential resource limits the scope of
priority inheritance. Since you are writing many GBs, this will
fill your buffer cache quicker than the physical disk IO can
complete, so any other IO will have to wait for the delayed
write (it will be initiated sooner, though, in such a situation).
Have you tried specifying the “blk wipe=” option to limit this
cache wiping, or have you tried using O_SYNC on your scrubbing
files, which will pull them into lock-step with the slower
physical IO? Does some filesystem operation that does not
require disk IO (perhaps a “df /”) complete immediately?

I’ll try doing the O_SYNC (which I really should do for small files
anyway, otherwise I’m just scrubbing the cache and not the disk), and
I’ll put in a delay(1) or some such into the logic on the client side.

Appreciate the answers, John.

Cheers,
-RK


Robert Krten <rk@parse.com> wrote:

John Garvey <jgarvey@qnx.com> wrote:
Robert Krten <rk@parse.com> wrote:
Ping?

We are all, um, somewhat busy with 6.3. And you didn’t provide
any detailed information, such as a pidin.

Yes, should have realized about the 6.3 :-) Sorry about that!
I’ll get a pidin together…

Is this a known problem, does it have a known solution?

I can think of a number of issues. The disk driver component
runs at a fixed priority of 21, hence communication with the h/w
does not respect client priority (this should be fairly short
except in non-DMA modes); hard realtime processes should be
run above that priority. I don’t think the kernel supports
chains of priority inheritance (maybe it does now for mutexes?),
so CONDVAR blocking doesn’t boost; furthermore, every resource
would need a kernel synchronisation object to get such priority
inheritance (e.g. every disk block), which in the QNX physical
scheme would not be practical (cf. the Solaris turnstile pool);
similarly, having the application maintain this knowledge and
do its own priority manipulations is expensive in kernel calls.
So whilst the filesystem goes to some trouble to ensure it is
always running at client priority, the inability to associate
a mutex with every potential resource limits the scope of
priority inheritance. Since you are writing many GBs, this will
fill your buffer cache quicker than the physical disk IO can
complete, so any other IO will have to wait for the delayed
write (it will be initiated sooner, though, in such a situation).
Have you tried specifying the “blk wipe=” option to limit this
cache wiping, or have you tried using O_SYNC on your scrubbing
files, which will pull them into lock-step with the slower
physical IO? Does some filesystem operation that does not
require disk IO (perhaps a “df /”) complete immediately?

I’ll try doing the O_SYNC (which I really should do for small files
anyway, otherwise I’m just scrubbing the cache and not the disk), and
I’ll put in a delay(1) or some such into the logic on the client side.

I put in the O_SYNC, and a delay(10) and now everything works much
better, thanks again, John.

Cheers,
-RK

