Re: Performance issue of find function in Gluster File System

Pierre Gaston Wed, 16 Aug 2017 22:52:54 -0700

On Wed, Aug 16, 2017 at 11:02 PM, Zhao Li <lizhao.informat...@gmail.com>
wrote:


> Hi,
>
> I found there is a big difference of time performance between "ls" function
> and "find" function in Gluster File System
> <https://gluster.readthedocs.io/en/latest/Administrator%20Guide/GlusterFS%20Introduction/>.
> Here is the minimal working example.
>
> mkdir tmp
> touch tmp/{000..300}.txt
>
> time find ./ -path '*tmp*' -name '*.txt'> /dev/null
> real 0m42.629s
> user 0m0.675s
> sys 0m1.438s
>
> time ls tmp/*.txt > /dev/null
> real 0m0.042s
> user 0m0.003s
> sys 0m0.003s
>
> So I am wondering what C code you use for "ls" and "find" and how you
> explain "*" in "ls" and "find" to lead to this big difference in Gluster
> File System.
>
> Thanks a lot.
> Zhao
>

There are several differences. First, note that "ls" is not the one finding
the files: the shell expands *.txt and then passes all the matching file
names to ls as arguments.
*.txt is not recursive, so only the files directly under tmp are listed.
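
To make the expansion step visible, here is a minimal sketch. The sandbox
directory and the file names are made up for illustration; only the
tmp/*.txt pattern comes from your example. It shows the words that the shell
would hand to ls, without running ls at all:

```shell
# Work in a throwaway directory (hypothetical layout for illustration).
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
mkdir -p tmp/sub
touch tmp/000.txt tmp/001.txt tmp/sub/deep.txt

# The shell expands the pattern; "set --" stores the resulting words in "$@".
# tmp/sub/deep.txt is not among them because * does not cross a "/".
set -- tmp/*.txt
expanded_count=$#
first_arg=$1

printf 'ls would receive %d arguments:\n' "$expanded_count"
printf '  %s\n' "$@"
```

ls only has to stat the names it is given, it never walks the directory
tree itself.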

In your find command, -path matches against the whole path (slashes
included), and find descends into every directory below the starting point,
whether it matches tmp or not. So, depending on where you started the search,
it may walk your whole / partition.
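
A small sketch of that behaviour, using an invented directory layout (the
names other, deep and mytmpdata are assumptions for the demo, not anything
from your system). It counts how many entries find has to visit versus how
many it finally reports:

```shell
# Throwaway tree with one "tmp" directory and two unrelated ones.
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
mkdir -p tmp other/deep other/mytmpdata
touch tmp/000.txt other/deep/notes.txt other/mytmpdata/log.txt

# Every entry find walks when started from the current directory.
visited=$(find . | wc -l | tr -d ' ')

# Same tests as in the original command: '*tmp*' is compared with the
# whole path, so ./other/mytmpdata/log.txt matches as well.
matched=$(find . -path '*tmp*' -name '*.txt' | sort)
matched_count=$(printf '%s\n' "$matched" | wc -l | tr -d ' ')

printf 'visited %s entries, reported %s:\n%s\n' \
    "$visited" "$matched_count" "$matched"
```

The -path test only filters what is printed, it does not stop find from
reading every directory under the starting point.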

A more comparable command would be:

find /tmp -name tmp -o -prune -name '*.txt' -print

or if your find command supports it:

find /tmp -maxdepth 1 -name '*.txt'
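
As a rough check that these really do the same job as the glob, the sketch
below runs all three in a scratch directory. It uses a relative tmp directory
created for the test rather than /tmp, and the file names are invented. Note
that the -prune form relies on the starting point being named tmp, and
-maxdepth is assumed to be available (it is in GNU and BSD find):

```shell
# Scratch tree: two files directly under tmp, one in a subdirectory.
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
mkdir -p tmp/sub
touch tmp/000.txt tmp/001.txt tmp/sub/deep.txt

# What the shell glob produces (non-recursive).
via_glob=$(printf '%s\n' tmp/*.txt | sort)

# -prune form: the starting point matches "-name tmp" and is descended into;
# everything else is pruned, then tested against '*.txt'.
via_prune=$(find tmp -name tmp -o -prune -name '*.txt' -print | sort)

# -maxdepth form, where find supports it.
via_maxdepth=$(find tmp -maxdepth 1 -name '*.txt' | sort)

printf '%s\n' "$via_prune"
```

In all three cases tmp/sub/deep.txt is left out, and neither find command
reads the contents of tmp/sub.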

Note also that ls and find are separate tools that are not developed along
with bash.

For GNU find: https://www.gnu.org/software/findutils/
For GNU ls: https://www.gnu.org/software/coreutils/coreutils.html

But there are also other implementations for various systems.
