This is a weird situation, and it may be resolved by the time I get feedback, but I’ll ask anyway, 'cuz I’m curious.
We have a commercial measurement application which saves some data off in .csv files. These files are supposed to be FTP'd to another server and deleted hourly. Well, at one of our customer sites this process went haywire, and for some time the files didn't all get deleted. By the time I got there, we had 420,000 files in a single directory.
Well, I moved the directory aside, and set about cleaning these up. I have tried various approaches, a few of which are:
find . -delete
find | xargs rm -f (with various -P, -n options)
ls | xargs rm -f
and most recently a custom C program that readdir()s the directory and calls remove() on every entry (sketched below).
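For reference, here's roughly what that C program looks like (a minimal sketch, assuming the directory path is passed as the first argument and the directory holds only regular files; note that POSIX doesn't strictly guarantee what readdir() returns while you're deleting entries from the same directory, though in practice it works on Linux):

    #include <stdio.h>
    #include <string.h>
    #include <dirent.h>
    #include <limits.h>

    int main(int argc, char *argv[])
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s directory\n", argv[0]);
            return 1;
        }

        DIR *dir = opendir(argv[1]);
        if (dir == NULL) {
            perror("opendir");
            return 1;
        }

        struct dirent *ent;
        char path[PATH_MAX];
        long count = 0;

        while ((ent = readdir(dir)) != NULL) {
            /* skip the "." and ".." entries */
            if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
                continue;

            snprintf(path, sizeof(path), "%s/%s", argv[1], ent->d_name);

            /* remove() unlinks the file; report errors but keep going */
            if (remove(path) != 0)
                perror(path);
            else if (++count % 10000 == 0)
                printf("%ld files removed\n", count);
        }

        closedir(dir);
        return 0;
    }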
The deletion went pretty quickly at first: about 50,000 files in the first afternoon. But it has gotten slower and slower, and it's now taking something like 5-8 seconds per file. The system is moderately busy, but the main application has not been running while I've been doing this.
Why would this get so slow? Is it because the later files are more scattered on the disk? Under Windows, I might stop and defrag. If I look at a ps -ef listing, there are several instances of the EIDE driver racking up time, so I speculate that there's a whole lot of searchin' goin' on.
Any thoughts or advice?
Thanks,
Randy C.