On Sat, Jan 20, 2018 at 10:16 AM, Peng Yu <pengyu...@gmail.com> wrote:
> Hi,
>
> There are ~7000 .txt files in a directory on glusterfs. Here are the run
> time of the following two commands. Does anybody know why the find command
> is much slower than *.txt? Is there a way to change the API that `find`
> uses to search files so that it can be more friendly to
> glusterfs?
>
> $ time echo *.txt > /dev/null
>
> real 0m2.206s
> user 0m0.039s
> sys 0m0.056s
> $ time find -name '*.txt' > /dev/null
>
> real 0m18.558s
> user 0m0.317s
> sys 0m0.663s

Is this an apples-to-apples comparison? For example, does . contain
subdirectories? The shell glob only looks at the top level, while find
descends into every subdirectory it encounters.

A comparison of the output of strace -c for both commands will probably
be illuminating.

Perhaps stat calls are relatively expensive on glusterfs. This happens on
at least some other cluster filesystems, because obtaining a correct value
for st_size requires finding the consensus answer for the current length
of the file, while obtaining the list of items in a directory may not
require the same amount of locking or consensus work.

James.
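
P.S. A rough sketch of how the two points above might be checked. It is not something I have run on glusterfs: the scratch directory and the file names in it are made up purely for illustration, and the strace step is skipped if strace is not installed or not permitted. It first shows that the glob and find can see different sets of files when subdirectories exist, then collects per-syscall summaries for both commands.

```shell
# Hypothetical scratch directory standing in for the real glusterfs directory.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt"
mkdir "$dir/sub"
touch "$dir/sub/c.txt"

# The shell glob only matches names at the top level.
glob_count=$(cd "$dir" && printf '%s\n' *.txt | wc -l | tr -d ' ')

# find descends into subdirectories by default.
find_count=$(cd "$dir" && find . -name '*.txt' | wc -l | tr -d ' ')

# -maxdepth 1 restricts find to the top level, which makes the comparison fair.
find_flat_count=$(cd "$dir" && find . -maxdepth 1 -name '*.txt' | wc -l | tr -d ' ')

echo "glob: $glob_count  find: $find_count  find -maxdepth 1: $find_flat_count"

# Per-syscall summaries for both commands, written to files for comparison.
if command -v strace >/dev/null 2>&1; then
    (
        cd "$dir" || exit 1
        strace -c -o glob.strace sh -c 'echo *.txt > /dev/null'
        strace -c -o find.strace find . -maxdepth 1 -name '*.txt' > /dev/null
        cat glob.strace find.strace
    ) 2>/dev/null || echo "strace could not be run here"
fi

rm -rf "$dir"
```

In the two summaries, the rows worth comparing are the directory-reading calls (getdents or getdents64 on Linux) against the stat family (stat, lstat, newfstatat, depending on the libc). If the find run spends most of its time in the latter, that would point at per-file metadata lookups rather than the directory listing itself.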