Fsys priority

A co-worker of mine came across the following, using QNX 4.24:

His system has to periodically gunzip a file to produce a very large (>100
MB) output file. He does this using the system() call, at a low priority.
gunzip does run at the low priority, but he found that processes at a higher
priority would still occasionally pause for as much as 100 milliseconds or
more, until the entire gunzip finished. This would cause these processes to
miss deadlines.
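In outline, the invocation looks something like the sketch below. This is my reconstruction, not his exact code; it assumes QNX 4's getprio()/setprio() from <sys/sched.h>, and the path and priority values are made up. The shell that system() spawns inherits the caller's priority, so gunzip ends up running at the lowered priority too:

```c
/* Sketch: run gunzip at a low priority via system().
 * Assumes getprio()/setprio(); pid 0 means "this process". */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/sched.h>

int main(void)
{
    int old = getprio(0);           /* remember our current priority */

    if (setprio(0, 5) == -1) {      /* drop well below the default of 10 */
        perror("setprio");
        return EXIT_FAILURE;
    }
    if (system("gunzip /data/bigfile.gz") != 0)   /* path is made up */
        fprintf(stderr, "gunzip failed\n");

    setprio(0, old);                /* restore the original priority */
    return EXIT_SUCCESS;
}
```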

Upon investigation, it appeared that the problem was that Fsys was using up
the CPU at high priority, most likely when flushing the cache when it became
full. To test this, he wrote a small utility that took the stdout of gunzip
and wrote it to disk in small chunks, using synchronous writes. This should
prevent the cache from ever filling up. Sure enough, this reduced the delays
to below 10 msecs, which were manageable.
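The utility was essentially along these lines (a from-memory sketch; the name, chunk size, and the choice of O_SYNC are illustrative, not his exact code):

```c
/* Sketch: read gunzip's output on stdin and write it to disk in small
 * synchronous chunks so Fsys's write-behind cache never fills.
 * Run as, e.g.:  gunzip -c big.gz | chunkwrite /data/big   (name made up) */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK 8192                  /* small chunk; exact size is arbitrary */

int main(int argc, char *argv[])
{
    char buf[CHUNK];
    ssize_t n;
    int fd;

    if (argc != 2) {
        fprintf(stderr, "usage: %s output-file\n", argv[0]);
        return EXIT_FAILURE;
    }
    /* O_SYNC forces each write() to reach the disk before returning */
    fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0666);
    if (fd == -1) {
        perror("open");
        return EXIT_FAILURE;
    }
    while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0) {
        if (write(fd, buf, (size_t)n) != n) {
            perror("write");
            return EXIT_FAILURE;
        }
    }
    close(fd);
    return EXIT_SUCCESS;
}
```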

Does anyone have a better explanation for what was going on, and perhaps a
better workaround? The same issue seems to be present in 4.25, but I haven’t
spent much time trying to verify it.

Any comments would be appreciated.

Thanks,

Kevin

Kevin Miller <kevin.miller@transcore.com> wrote:

Upon investigation, it appeared that the problem was that Fsys was using up
the CPU at high priority, most likely when flushing the cache when it became full.

Close enough. The draining of the cache is triggered by the “IO done”
proxy of the Fsys driver, which comes in at priority 22. The driver itself
runs at a fixed 22, so draining the write-behinds happens at priority 22
as well. This can be a fair amount of CPU work if non-DMA transfers are
being used.

gunzip does run at the low priority, but he found that processes at a higher
priority would still occasionally pause for as much as 100 milliseconds or more.

This seems like a bad system design; if a process has hard deadlines, then
it should be given an elevated priority, above things like disk and network
(e.g. > 22), to ensure that it runs when needed.
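For instance (just a sketch, assuming setprio(); 24 is an arbitrary value above the driver's 22, and raising priority may require root privileges):

```c
/* Sketch: put a deadline-critical process above Fsys's fixed 22.
 * QNX 4 priorities run 0..31; 24 here is merely illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/sched.h>

int main(void)
{
    if (setprio(0, 24) == -1) {   /* 0 == this process; 24 > 22 */
        perror("setprio");        /* may need root to raise priority */
        return EXIT_FAILURE;
    }
    /* ... deadline-critical work now preempts disk/network activity ... */
    return EXIT_SUCCESS;
}
```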

Does anyone have a better explanation for what was going on, and perhaps a
better workaround? The same issue seems to be present in 4.25, but I
haven’t spent much time trying to verify it.

This won’t be changed for QNX4. In QNX6 the write-behinds retain their
original client priority and so are processed through the filesystem
layers at the expected level (however, the disk driver thread still runs
at a fixed priority of 22, although with DMA there is less for it to do).

Unfortunately, we can’t change priorities, since all tasks run at the same
priority to effect mutual exclusion from shared memory. This is an old system
and certain design decisions were suspect, as you point out. At least you’ve
confirmed what we suspected, and I think we can live with the workaround we
have.

Thanks,

Kevin

“Kevin Miller” <kevin.miller@transcore.com> wrote in message
news:b8tqth$si8$1@nntp.qnx.com

Unfortunately, we can’t change priorities, since all tasks run at the same
priority to effect mutual exclusion from shared memory.

Then raise them all :wink:

Another option is to decrease the cache size (though this will probably hurt performance).

The program that writes the data in smaller chunks could use a special
open() flag (O_DSYNC, from memory; check the doc of open()) to bypass the
cache as much as possible.
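Something like this, assuming O_DSYNC is defined in <fcntl.h> on your release (the path is illustrative):

```c
/* Sketch: O_DSYNC makes each write() return only after the data has
 * reached the disk, keeping dirty blocks out of the write-behind cache. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/data/out", O_WRONLY | O_CREAT | O_DSYNC, 0666);

    if (fd == -1) {
        perror("open");
        return EXIT_FAILURE;
    }
    if (write(fd, "chunk\n", 6) != 6)
        perror("write");
    close(fd);
    return EXIT_SUCCESS;
}
```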
