On Thu, Apr 16, 2015 at 1:26 AM, Bernhard Voelker <m...@bernhard-voelker.de> wrote:
> On 04/16/2015 06:04 AM, Peng Yu wrote:
>> Hi, The following code shows that -prune when used with -exec can be
>> very slow. Is there somehow a way to speed this up?
>>
>> ~$ cat main.sh
>> #!/usr/bin/env bash
>>
>> tmpdir=$(mktemp -d)
>>
>> function mkalotdir {
>>     local n=$1
>>     local i
>>     local j
>>     local k
>>     for i in $(seq -w "$n")
>>     do
>>         for j in $(seq -w "$n")
>>         do
>>             for k in $(seq -w "$n")
>>             do
>>                 echo "$tmpdir/$i/$j/$k"
>>             done
>>         done
>>     done | xargs mkdir -p
>> }
>>
>> function myfind {
>>     find "$tmpdir" > /dev/null
>> }
>>
>> function myfindprune {
>>     find "$tmpdir" -exec $(type -P test) -e {}/.findignore ';' -prune -o -print > /dev/null
>> }
>>
>> mkalotdir 10
>> echo myfind
>> time myfind
>> echo myfindprune
>> time myfindprune
>>
>> ~$ ./main.sh
>> myfind
>>
>> real    0m0.018s
>> user    0m0.005s
>> sys     0m0.011s
>> myfindprune
>>
>> real    0m5.354s
>> user    0m1.145s
>> sys     0m1.539s
>
> Well, half a second for 1111 times creating and running /usr/bin/test
> doesn't seem too much. At least, I can second your timing results.
>
> The time is not lost in find but with executing the test(1) program
> for so many times. To get an idea, start the above command line with
> "strace -f -v find ...".
>
> You'll see that you are "comparing apples with pears" - my home country
> doesn't grow oranges, so that's what this saying looks like over here.
> ;-)
>
> To get a little better result, you could avoid the overhead in test(1)
> regarding NLS etc. by rolling your own, puristic(!) test program:
>
> $ cat /tmp/mystat.c
> #include <sys/stat.h>
> int main (int argc, char**argv) {
>   struct stat sb;
>   return -1 == stat (argv[1], &sb);
> }
>
> $ gcc -Wall -O3 -o /tmp/mystat /tmp/mystat.c
> $ strip /tmp/mystat
>
> $ time find . -type d -exec /tmp/mystat '{}'/.findignore \; -prune -o -print >/dev/null
>
> real    0m0.340s
> user    0m0.014s
> sys     0m0.064s
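The third timing below ("myfindcppprune") comes from a function along these
lines (a sketch of the obvious substitution; it assumes the compiled helper
from your mail is installed as /tmp/mystat):

  function myfindcppprune {
      # Same as myfindprune, but -exec the small compiled stat helper
      # instead of /usr/bin/test to check for a .findignore marker.
      find "$tmpdir" -exec /tmp/mystat {}/.findignore ';' -prune -o -print > /dev/null
  }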
On my machine, the speedup from using the C stat() helper is only about 2x:

myfind

real    0m0.020s
user    0m0.005s
sys     0m0.012s
myfindprune

real    0m5.408s
user    0m1.137s
sys     0m1.565s
myfindcppprune    # using C stat()

real    0m2.455s
user    0m0.699s
sys     0m1.088s

> Given what the system has to process compared to a bare "find .",
> this is IMO quite good, isn't it?

The real question is whether using `-prune` as it is in `find` is a good idea.
It sounds like in cases like the one I showed, it is better to just run two
`find`s: one lists all directories without any restriction, the other searches
for .findignore to get all the directories that should be ignored. Then a
program (to be written, and possibly included in findutils) can prune the
directories in one shot based on the results of both runs of `find` (see the
sketch at the end of this message). But the fact that `-prune` is there will
encourage people to use it, which results in low-performance searches.

--
Regards,
Peng
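P.S. To make the two-pass idea concrete, here is a rough, untested sketch.
The awk filter merely stands in for the dedicated "prune in one shot"
program mentioned above, and GNU find's -printf '%h' is assumed:

  #!/usr/bin/env bash
  # Pass 1: enumerate the whole tree once, with no per-directory -exec.
  # Pass 2: locate the .findignore markers, again with no -exec.
  all=$(mktemp)
  ignored=$(mktemp)

  find "$tmpdir" > "$all"
  find "$tmpdir" -name .findignore -printf '%h\n' > "$ignored"

  # Merge: drop each ignored directory and everything below it.
  # (FILENAME is compared instead of NR==FNR so that an empty
  # "$ignored" file does not swallow the whole listing.)
  awk 'FILENAME == ARGV[1] { skip[$0]; next }
       { for (d in skip) if ($0 == d || index($0, d "/") == 1) next }
       { print }' "$ignored" "$all"

  rm -f "$all" "$ignored"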