File System Discrepancy

I know you’ve all been worried sick about the problems I was having a couple
of weeks ago… so I thought I’d provide you with some closure :wink:

I finally got the node I was having problems with replaced last weekend.
The new (replacement) node’s ‘df’ reports ~1.3 Gig of used disk space on the
/u drive, which is exactly what I expected it to be.

Before I did the rollover, I did make some modest gains with the database
engine update, the one that made use of the ltrunc “feature”. But I only
got around 10-12M back.

However, in the process of doing the database update, I discovered we had
some “file leaks” on our system, i.e. processes with open file descriptors to
unlinked files. The ‘sin files’ utility is quite handy for tracking things
like this down. One of the problems was a silly bug in the database
“unload” utility function. Another equally silly bug was in a process that
didn’t close its files before exec’ing itself, so it inherited 3 more “file
leaks” every time it did so. Other problems were basically attributable to
users with bad habits, like leaving ‘tail -f’ running overnight on log
files that got rolled out from underneath them.
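
For anyone chasing the same kind of leak, here is a minimal C sketch of the two ideas above. It is not the code from our system. The helper names set_cloexec() and fd_is_unlinked() are made up for illustration, and it assumes a POSIX system where fstat() reports a link count of 0 for a file that has been unlinked but is still open.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: mark a descriptor close-on-exec so that it is
 * not inherited when the process exec's itself (or anything else).
 * Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Hypothetical helper: report whether fd refers to a file that no
 * longer has any directory entry. Such a file keeps its disk blocks
 * allocated until the last descriptor on it is closed.
 * Returns 1 if unlinked, 0 if still linked, -1 on error. */
int fd_is_unlinked(int fd)
{
    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;
    return st.st_nlink == 0 ? 1 : 0;
}
```

Calling set_cloexec() right after each open() is one way to keep a self-exec'ing process from piling up inherited descriptors. Explicitly closing the files before the exec works just as well.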

I also cleaned out some directories infected by pack rats :wink: A
never-ending problem, it seems.

All in all I managed to get back about 400M.

Before doing the rollover, I ran a ‘chkfsys’. There was a minor bitmap
discrepancy (Mitchell, you’ve been vindicated :wink:), but fixing it didn’t
gain back any significant disk space.

It wasn’t until I was doing the pax copy to the new node that I discovered
the real problem… We have this mail delivery daemon that relays E-mail to
our central server. It keeps daily log files that are (supposed to be)
removed after 5 days or so. There was a bug in the routine that opens and
writes to the log file. The path name is always exactly 32 characters long.
The stack var for holding the path name was declared as ‘char fname[32]’,
which leaves no room for the terminating NUL…
The next var on the stack was a u_long “message counter”. It seems that
every time something was to be written to the log, the counter’s bytes ran
on as part of the name, so a new file name was created of the form
“…logXXXX”, where “XXXX” was exercising the entire 256-character extended
ASCII set. When I saw the Greek Alphabet and Box Drawing
characters scrolling by I stopped the pax copy and eliminated the “Deliver”
directory from the copy list.
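
In case it helps anyone else, here is a hedged sketch of how a path builder can avoid this kind of bug. It is not the daemon's actual code. The function name build_log_path(), the LOG_PATH_LEN macro and the "<dir>/<name>" layout are assumptions for illustration only. The idea is to size the buffer for the path plus its terminating NUL, and to refuse to use a name that did not fit.

```c
#include <stdio.h>
#include <string.h>

/* Path length as described in the post: always exactly 32 characters.
 * A buffer for it therefore needs LOG_PATH_LEN + 1 bytes. */
#define LOG_PATH_LEN 32

/* Hypothetical helper: build "<dir>/<name>" into out[outsz].
 * snprintf() never writes past outsz and always NUL-terminates when
 * outsz > 0; its return value is the length the full string needs.
 * Returns 0 if the whole path (plus NUL) fit, -1 otherwise. */
int build_log_path(char *out, size_t outsz, const char *dir, const char *name)
{
    int n = snprintf(out, outsz, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= outsz)
        return -1;   /* truncated or failed: do not open this name */
    return 0;
}
```

With a buffer declared as char fname[LOG_PATH_LEN + 1] a 32-character path fits. With char fname[32] the helper returns -1 instead of letting the name run on into whatever sits next to it on the stack.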

This morning I did a post mortem on the old node… ‘du -k’ reported ~2M for
the Delivery dir (which really isn’t that out of line for a logging
directory… which is what made this so hard to find) BUT, do you believe
this… 25,940 “Delivery” log files! The directory “file” all by itself was
1.9M. I did an ‘rm -fR’ on the Deliver log directory… result: ‘df’
reports 1.2G used. The 4G drive went from 69% used to 29% used. Mario,
you’re right… a lot of little files can really eat up a disk :wink:
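
Since ‘du -k’ looked reasonable here, a count of directory entries might have flagged the problem sooner. Below is a small sketch of such a check in C. The function name count_dir_entries() is made up, and it assumes nothing beyond the POSIX opendir()/readdir() calls.

```c
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: count the entries in a directory, skipping
 * "." and "..". Returns the count, or -1 if the directory cannot be
 * opened. */
long count_dir_entries(const char *path)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;

    long n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        n++;
    }
    closedir(d);
    return n;
}
```

Run periodically against a logging directory that should only ever hold a handful of daily files, a count in the thousands would stand out right away.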

Thanks to everyone for helping… I’m all better now, thank you :sunglasses: