"df" hangs, not killable

John_Nagle1 · July 5, 2005, 4:54pm

Strange new bug: “df” hangs.

It prints the stats for the file systems, but won’t exit.
It’s not killable, even with “slay -9”. Some resource
manager must be holding it hostage, refusing to release it.

We’re just running QNX 6.2.1, with stock resource
managers. We have a local file system, the package
file system, one remote NFS mount, and a repository
mounted. All list properly in “df”.

John Nagle

Colin_Burgess1 · July 5, 2005, 5:08pm

pidin -pdf will tell you who the culprit is.

--
cburgess@qnx.com

John_Nagle1 · July 5, 2005, 5:47pm

pidin -pdf

35684383 1 bin/df 10r REPLY 6
36032557 1 bin/df 10o REPLY 6
36147248 1 bin/df 10o REPLY 6
35541041 1 bin/df 10r REPLY 6
35553330 1 bin/df 10r REPLY 6
35590195 1 bin/df 10r REPLY 6

ps -A

PID TTY TIME CMD
1 ? 11:18:11
2 ? 00:00:00 /sbin/tinit
3 ? 00:00:00 slogger
12292 ? 00:00:00 mqueue
5 ? 00:00:00 pci-bios
6 ? 00:13:48 devb-eide
7 ? 00:00:03 devc-con
8 ? 00:00:55 fs-pkg

So devb-eide is holding up some message from “df”?
The disk driver?

John Nagle
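The reading above follows from the last two columns of the pidin output: each “df” thread is in state REPLY, and the final number is the pid it is waiting on (6, which “ps -A” lists as devb-eide). A minimal sketch of that lookup as a shell filter, assuming the six-column layout shown in this thread (pid, tid, name, priority, state, blocked-on); the function name blocked_on is made up for illustration:

```sh
# blocked_on: read pidin-style lines on stdin and report, for every
# REPLY-blocked thread, which pid it is waiting on and how many
# threads are waiting on that pid.
# Assumes the column layout seen in this thread:
#   pid tid name prio STATE blocked-on
blocked_on() {
    awk '$5 == "REPLY" { waiting[$6]++ }
         END { for (pid in waiting) print pid, waiting[pid] }'
}

# Intended use on the affected machine:
#   pidin -pdf | blocked_on
```

Fed the six lines posted above, this prints "6 6": six threads, all waiting on pid 6. The pid can then be matched against the “ps -A” listing by eye.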

Colin_Burgess1 · July 5, 2005, 6:05pm

Looks like it. :v(

--
cburgess@qnx.com

John_Garvey1 · July 5, 2005, 6:11pm

John Nagle wrote:

35684383 1 bin/df 10r REPLY 6
6 ? 00:13:48 devb-eide
So devb-eide is holding up some message from “df”?
The disk driver?

What disk hardware do you have? “df” scans /proc/mount
looking for potential filesystems. Can you try something
like: find /proc/mount -name "0,6,*,0"
(ignore errors) and then feed those names one at a time
into “df” and see if a particular one causes the hang
(and/or try any other mountpoints you may be suspicious of).

e.g., from my machine:
$ find /proc/mount -name "0,6,*,0"
/proc/mount/0,6,7,6,0
/proc/mount/dev/cd0/0,6,7,4,0
/proc/mount/dev/hd0t7/0,6,7,3,0
/proc/mount/dev/hd0t79/0,6,7,2,0
/proc/mount/dev/hd0/0,6,7,1,0
$ df /proc/mount/0,6,7,6,0
/dev/hd0t79 51745365 2327352 49418013 5% /
$ df /proc/mount/dev/cd0/0,6,7,4,0
/proc/mount/dev/cd0 0 0 0 100% (/fs/cd0/)

…etc…
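The one-at-a-time check can be scripted so that each name is printed before “df” runs on it; since a hung “df” could not be killed here, the last name printed is the suspect. A minimal sketch, assuming a POSIX-style shell; probe_mounts is a hypothetical helper name, and the pattern is the one from the post above:

```sh
# probe_mounts ROOT PATTERN CMD [ARGS...]
# Find entries under ROOT whose name matches PATTERN and run CMD on
# each one in turn, announcing the entry first so a hang can be
# attributed to the last name printed.
probe_mounts() {
    root=$1
    pattern=$2
    shift 2
    # find errors are discarded, as suggested in the post ("ignore errors")
    find "$root" -name "$pattern" 2>/dev/null |
    while IFS= read -r entry; do
        printf 'checking: %s\n' "$entry"
        # stdin is redirected so CMD cannot swallow the list of names
        "$@" "$entry" </dev/null
    done
}

# Intended use on the affected machine:
#   probe_mounts /proc/mount "0,6,*,0" df
```

Running it with output going to a file or a second console keeps the record of the last "checking:" line even if the terminal has to be abandoned.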

John_Nagle1 · July 7, 2005, 1:06am

Two reboots later, it’s not happening any more.

No idea why.

The now-complete output from “df” reads:

df

/pkgs/repository/GN 0 0 0 100% /
/dev/hd0t79 20097252 12566718 7530534 63% /
/boot/fs/qnxbase.qf 98785 98001 784 100% /pkgs/base/
/dev/hd0t79 20097252 12566718 7530534 63% /
/dev/cd0 0 0 0 100% (/fs/cd0/)
/dev/hd0 160836480 160836480 0 100%

When “df” was hanging, it never printed the final /dev/hd0 line.

John Nagle

John_Nagle1 · July 7, 2005, 1:11am

Does “df”, by default, explore “/net”? We have several
machines running QNET, some of them on an unreliable WiFi link.
(Mobile robotics, remember). “df” never seems to report
remote nodes by default, but does it go and look?

John Nagle
Team Overbot

Colin_Burgess1 · July 7, 2005, 1:42am

Sounds like it might have been hanging on the cd device then?

John Nagle wrote:

When “df” was hanging, it never printed the final /dev/hd0 line.

--
cburgess@qnx.com

Colin_Burgess1 · July 7, 2005, 1:48am

No, it only checks filesystems that are listed in /proc/mount.

John Nagle wrote:

Does “df”, by default, explore “/net”? We have several
machines running QNET, some of them on an unreliable WiFi link.
(Mobile robotics, remember). “df” never seems to report
remote nodes by default, but does it go and look?

--
cburgess@qnx.com