File table overflow

Hello all,

A colleague of mine at work is pulling his hair out over a peculiar
problem. He’s building a Ctree-based application that adds and removes
records. So far nothing new.

But occasionally the system gets into a strange state where practically
all file descriptors get used up system-wide. You can hardly start any
other utilities such as “ls” or “sin”. Even stranger, when you run
“sin files” or “sin fd” the fd usage looks normal: you don’t see the
large number of open files you would expect. Rebooting the system
remedies the situation.

My questions:

Can there be files open on a system which cannot be displayed with “sin
fd” or “sin files”? (“Invisible” open files?)

Is it at all possible that files could get permanently left open due to
very abnormal process termination? (I.e. it didn’t get a chance to
execute its termination thread properly.)

Is there any other diagnostic tool to inspect the file table usage (or
any other relevant resource) besides “sin”?
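(As an aside, one quick check beyond “sin” is to look at which errno a
failing open() returns: EMFILE means the per-process fd table is full,
while ENFILE means the system-wide file table overflowed. A minimal,
generic C sketch of such a probe, nothing QNX-specific assumed; the
path /tmp/fdprobe is only illustrative:)

    /* Probe sketch: open files until the call fails, then report which
     * limit was hit.  EMFILE = per-process fd table, ENFILE = system-wide
     * file table.  All fds are released when the process exits, so the
     * system returns to normal afterwards.
     */
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        int count = 0;
        for (;;) {
            int fd = open("/tmp/fdprobe", O_RDWR | O_CREAT, 0666);
            if (fd == -1) {
                printf("open number %d failed: errno %d (%s)\n",
                       count + 1, errno, strerror(errno));
                if (errno == EMFILE)
                    printf("-> per-process fd table is full\n");
                else if (errno == ENFILE)
                    printf("-> system-wide file table overflow\n");
                return 1;
            }
            count++;
        }
    }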

ARGA (Any Response Greatly Appreciated)

rick

“Rick Lake” <rwlake@SPAM.REDIRECTED.TO.DEV.NULL> wrote in message
news:3A37E3F7.34817B22@SPAM.REDIRECTED.TO.DEV.NULL

Can there be files open on a system which cannot be displayed with “sin
fd” or “sin files”? (“Invisible” open files?)

Not that I’m aware of. However, make sure you have the latest 4.25D.
I don’t remember exactly when (early 4.25?), but there was a bug in QNX
that would cause this problem.


Mario Charest wrote:

[snip]

fsysinfo will show Fsys limits; osinfo will also show Proc’s fd limits.
How does he know that it’s file descriptors that are being used up?
If he is getting an EMFILE error and the disk is big, then it’s an Fsys
bug which can be fixed by a magic John Garvey incantation: his
suggestion for a 28G drive was to add “-H86016” to Fsys’s command line.
Prior to 4.23, there was a “Heapf” in Proc that was limited in size.


“Richard R. Kramer” wrote:

[snip]

He’s getting ENFILE (errno 23: file table overflow).
Wouldn’t this be a Proc issue?


Rick Lake wrote:
[snip]

He’s getting ENFILE (errno 23: file table overflow).
Wouldn’t this be a Proc issue?

Now I’m not sure - if you have access to the old quics/experts/fsys,
look for message 6337, 5 Oct 99. One post from Steve talks about
EMFILE/ENFILE confusion, which I don’t understand… Here are some
snippets:

Xref: quics quics.experts.fsys:6343
From: steve@qnx.com (Steve McPolin)
Newsgroups: quics.experts.fsys
Subject: Re: EMFILE on a lightly loaded system
Date: 6 Oct 1999 13:41:25 GMT
[snip]
In article <FJ5D8p.35J@qnx.com>, John Garvey <jgarvey@qnx.com> wrote:

Jay Hogg (jshogg@qnx.com) wrote:
With approx:
59 processes running
25 proxies
5 vc’s
300 fd’s (mostly iomanagers, dev, shmem)
50 physical disk files
I start getting “[some program]: Too many open files” and it appears
I’m bouncing against a wall. All options are default. “osinfo” shows
I’m nowhere near the limits.
Where do I look?

Do you have any idea what file “[some program]” is trying to open? Is it
on a local disk? Is it over the network (in which case the fd config of
the remote machine comes into play)? Is it a shell (in which case
FD_CLOEXEC becomes important)?
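(Aside: FD_CLOEXEC here is the standard POSIX close-on-exec flag. A
minimal sketch of setting it with fcntl() so a descriptor is not
inherited across exec(), e.g. by commands a shell spawns; plain POSIX,
nothing QNX-specific assumed:)

    /* Mark a descriptor close-on-exec so it is not inherited by
     * programs started via exec(), such as commands run from a shell.
     */
    #include <fcntl.h>
    #include <stdio.h>

    int set_cloexec(int fd)
    {
        int flags = fcntl(fd, F_GETFD);
        if (flags == -1)
            return -1;
        return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
    }

    int main(void)
    {
        int fd = open("/etc/passwd", O_RDONLY);
        if (fd != -1 && set_cloexec(fd) == 0)
            printf("fd %d will not survive an exec()\n", fd);
        return 0;
    }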

Fsys itself never directly returns EMFILE; it uses ENFILE for its
internal overflows (and qnx_fd_attach() should not fail when used from
here, as the client side did the initial creation half of the call).

Proc on the other hand only returns EMFILE and never ENFILE … I may
take you up on the crossposting angle :-)

Did anything interesting appear in traceinfo?
Are you starting programs on other nodes?
What state is the fd set of the other node?

Proc (intentionally) produces EMFILE:

  1. If you specify an fd message with too large an fd
    [ qnx_fdquery/attach/detach, fcntl, dup, open()*, … ].
  2. Your per-process file table is full (default) [512].
  3. It gets an EMFILE (1,2) attempting to dup the fds during
    process creation.

The latter one is where the network case can be interesting; a host
configured for, say, 512 fd’s may not be able to create a process
on one configured for, say, 256 fd’s because the process may
require an fd beyond the per-process limit.

Note also that, for Proc’s consideration, fd’s are considered unsigned,
so attaching to -1 will cause EMFILE rather than EBADF.


Steve McPolin (steve@qnx.com); QNX Software Systems, Ltd.


Xref: quics quics.experts.fsys:6348
Newsgroups: quics.experts.fsys
Path: quics!jgarvey
From: jgarvey@qnx.com (John Garvey)
Subject: Re: EMFILE on a lightly loaded system
Organization: QNX Software Systems
[snip]

Jay Hogg (jshogg@qnx.com) wrote:

Thank you John!

Sorry for not catching it sooner, the EMFILE/ENFILE fooled me (and
still has), but I should have looked more thoroughly at your logs.
I have been pretty swamped recently though :-(

How about a hint as to what it should be, since ‘-H’ isn’t documented
and I don’t know what goes into calculating it?
(ps, fsysinfo doesn’t show it :-)

That is why I put the internal guess into the trace message; from
yours it is 63k ------------------v
(e l i f )

Oct 04 20:22:38 2 00003024 0000F7B1 0004E781 656C6966

and grows to 320k --------------------------^
once known space for files/inodes/names/etc has been added. Problem
is the low initial guess steals off the known values for bitmap
tables, and you run out of files/inodes/etc instead.

So, for 28Gig of writable disk, I think you’ll need an extra 20k, so
try “-H86016” (it doesn’t perform nice parsing).
[snip]

Richard
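(Aside on the trace line quoted above: 0x0000F7B1 is 63,409 bytes, the
“63k” initial guess; 0x0004E781 is 321,409 bytes, roughly the “320k” it
grows to; and 0x656C6966 is the ASCII of “file” read as a little-endian
32-bit word, which is why it lines up with the “e l i f” annotation. A
tiny C sketch that just reproduces that arithmetic:)

    /* Decode the fields of the Fsys trace line quoted above. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long guess = 0x0000F7B1UL;   /* initial guess            */
        unsigned long grown = 0x0004E781UL;   /* size it grows to         */
        unsigned long tag   = 0x656C6966UL;   /* ASCII tag, little-endian */

        printf("initial guess: %lu bytes\n", guess);   /* 63409  (~63k)  */
        printf("grows to:      %lu bytes\n", grown);   /* 321409 (~320k) */
        printf("tag: %c%c%c%c\n",                      /* prints "file"  */
               (int)(tag & 0xff),         (int)((tag >> 8) & 0xff),
               (int)((tag >> 16) & 0xff), (int)((tag >> 24) & 0xff));
        return 0;
    }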


Thanks for the info; lots of things to consider… Although I’m
surprised that Proc would never return ENFILE, since it holds the
system-wide table.

BTW, you mentioned “osinfo”. I can’t find it on my QNX4 system. I take
it it’s a free util on QUICS?

“Richard R. Kramer” wrote:

[snip]

Rick Lake <rwlake@spam.redirected.to.dev.null> wrote:


BTW, you mentioned “osinfo”. I can’t find it on my QNX4 system. I take
it it’s a free util on QUICS?

Yes, but it would be findable as os_info, not osinfo. Also look for
sysres and sysmon.

-David