On Tue, Nov 05, 2013 at 07:15:19PM +0400, Reco wrote:
> On Tue, Nov 05, 2013 at 02:29:10PM +0000, Jonathan Dowland wrote:
> > On Tue, Nov 05, 2013 at 03:13:10PM +0400, Reco wrote:
> > > find . -type f -name 'popularity-*' -print0 | xargs -0rn 20 rm -f
> >
> > I idly wonder (don't know) to what extent find might parallelize the
> > unlinks with -delete. A cursory scan of the semantics would suggest it
> > could potentially do so: it's not clear that a single unlink failing
> > should stop future unlinks (merely spew errors and consider the -delete
> > operation as a whole to have failed).
>
> xargs parallelism is optional. The point is that you have one process
> which finds files, and another one (or another group of them) which
> deletes files. This helps utilize multiple CPUs.
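Reco's point (one process finding files while a group of others deletes them) can be sketched with the -P option of GNU xargs, which runs several rm processes concurrently. This is a minimal sketch assuming GNU findutils; the helper name parallel_delete and its default of 4 jobs are hypothetical, not something proposed in the thread.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): one find process streams NUL-separated
# names; GNU xargs hands them out 20 at a time to up to JOBS rm
# processes running concurrently. Assumes GNU findutils.
set -euo pipefail

parallel_delete() {  # parallel_delete DIR PATTERN [JOBS]
    local dir=$1 pattern=$2 jobs=${3:-4}
    # -0: names are NUL-separated, so spaces/newlines in names are safe.
    # -r: run nothing at all if find matched no files.
    # --: defensively ends rm's option parsing.
    find "$dir" -type f -name "$pattern" -print0 |
        xargs -0 -r -n 20 -P "$jobs" rm -f --
}

# Usage: parallel_delete . 'popularity-*' 4
```

Whether extra jobs help depends on where the time goes: if the unlinks are limited by the filesystem or disk rather than by CPU, more rm processes may gain little.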
I know about xargs and parallelism. I was wondering whether find
implemented parallelism internally, when it could, and afaics the
semantics of -delete do not preclude it from doing so. I did not
investigate whether it does, but…

> $ time find -type f -delete
…
> real 4m27.799s

…suggests it doesn't. (I'm appalled by that!)

> It's not the binary size which matters, it's the algorithm:

The binary size affects the initial load-up time which, for small numbers
of files/short execution times, may be the lion's share of the total
execution time. However, as you point out, for orders of magnitude like
500,000 files, it's dwarfed by the algorithm.

I'm quite amazed how much faster your perl implementation was. I can only
imagine that nobody has ever been troubled by find's performance enough to
work on it. This points to find not taking advantage of parallelism (and
also to potential improvements in speed even for your perl implementation).

> Basically, the difference is in the fact that find uses fstatat64
> syscall for each file, and this perl one-liner uses lstat64 and stat64
> syscalls. Use strace to check it in your environment. On another OS
> results could be different.

So you believe the discrepancy is entirely down to the difference between
fstatat64 and lstat64/stat64? I find that hard to believe. I suspect find
is just not very efficient.

-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20131105165419.ga2...@bryant.redmars.org
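A rough way to reproduce the comparison on your own machine is to time both deletion strategies on a scratch directory and, following Reco's suggestion, tally the syscalls with strace. This sketch assumes bash, GNU coreutils and findutils, and strace for the syscall part; the helper names make_files, bench and syscall_summary, the file counts and the job count are all hypothetical. Syscall names in strace's table vary by architecture and libc version (on x86-64, for example, the fstatat call appears as newfstatat rather than fstatat64).

```shell
#!/usr/bin/env bash
# Rough benchmark sketch (hypothetical helpers). Assumes GNU coreutils
# and findutils; strace is only needed for syscall_summary.
set -euo pipefail

# make_files DIR COUNT: create COUNT empty files popularity-1..COUNT.
make_files() {
    local dir=$1 count=$2
    mkdir -p "$dir"
    # %.0f rather than %g, which would print 1e+06-style names.
    (cd "$dir" && seq -f 'popularity-%.0f' 1 "$count" | xargs touch)
}

# bench COUNT [JOBS]: time find -delete vs. find | xargs -P rm.
bench() {
    local count=$1 jobs=${2:-4} dir
    dir=$(mktemp -d)

    make_files "$dir" "$count"
    echo "find -delete, $count files:" >&2
    time find "$dir" -type f -delete

    make_files "$dir" "$count"
    echo "find | xargs -P $jobs rm, $count files:" >&2
    time (find "$dir" -type f -print0 | xargs -0 -r -n 20 -P "$jobs" rm -f --)

    # rmdir fails if either strategy left files behind.
    rmdir "$dir"
}

# syscall_summary COMMAND [ARGS...]: print strace's per-syscall tally.
# -f follows child processes (e.g. the rm processes xargs starts),
# -c counts calls instead of logging each one, -o writes to a file.
syscall_summary() {
    local out
    out=$(mktemp)
    strace -f -c -o "$out" "$@"
    cat "$out"
    rm -f "$out"
}

# Usage:
#   bench 500000 4
#   d=$(mktemp -d); make_files "$d" 10000
#   syscall_summary find "$d" -type f -delete
```

The timings depend heavily on the filesystem, its cache state and the hardware, so it is worth running each strategy more than once before drawing conclusions.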