Re: [PATCH] find: doc: Fix -prune SCM example directories and make it more efficient

raf Wed, 23 Nov 2022 07:01:18 -0800

On Wed, Nov 23, 2022 at 07:07:14PM +1100, raf <r...@raf.org> wrote:

> On Tue, Nov 22, 2022 at 11:12:00PM +0100, Bernhard Voelker 
> <m...@bernhard-voelker.de> wrote:
> 
> > Hi raf,
> 
> Hi Berny,
> 
> > thanks for the patch.
> > Besides busy day job times here, I was actually waiting for the FSF legal
> > to add your entry into the 'copyright.list' file on the GNU fencepost 
> > server.
> > It's not yet in place though, so no need to hurry.
> 
> No worries. That'll take some time. I've emailed the
> copyright assignment request form to Craig Topham, but
> the paperwork has yet to arrive here for signing.
> 
> > On 11/18/22 23:48, raf wrote:
> > > * find/find.1 - Fix -prune SCM example directories and make it more 
> > > efficient
> > > * doc/find.texi - Make -prune SCM example more efficient (two ways)
> > > * NEWS - Mention the above
> > > ---
> > >   NEWS          |  3 +++
> > >   doc/find.texi | 25 +++++++++++++++++++++----
> > >   find/find.1   | 13 ++++++++++++-
> > >   3 files changed, 36 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/NEWS b/NEWS
> > > index 1fff34f8..7c6fcfb3 100644
> > > --- a/NEWS
> > > +++ b/NEWS
> > > @@ -18,6 +18,9 @@ GNU findutils NEWS - User visible changes.      -*- 
> > > outline -*- (allout)
> > >     When generating the Texinfo manual, `makeinfo` is invoked with the 
> > > --no-split
> > >     option for all output formats now; this avoids files like 
> > > find.info-[12].
> > > +  The find.1 manual's -prune SCM example directories/output have been 
> > > fixed, and
> > > +  the example itself has been made more efficient (find.1 and find.texi) 
> > > [#62259]
> > > +  Also fixed a typo in find.texi.
> > >   * Noteworthy changes in release 4.9.0 (2022-02-22) [stable]
> > > diff --git a/doc/find.texi b/doc/find.texi
> > > index 379fe646..1a36e243 100644
> > > --- a/doc/find.texi
> > > +++ b/doc/find.texi
> > > @@ -5138,13 +5138,15 @@ already found.
> > >   @smallexample
> > >   find repo/ \
> > > --exec test -d @{@}/.svn \; -or \
> > > --exec test -d @{@}/.git \; -or \
> > > --exec test -d @{@}/CVS \; -print -prune
> > > +-type d \
> > > +\( -exec test -d @{@}/.svn \; \
> > > +-or -exec test -d @{@}/.git \; \
> > > +-or -exec test -d @{@}/CVS \; \
> > > +\) -print -prune
> > >   @end smallexample
> > 
> > Indeed, the bug omitting the \( ... \) has been introduced in 2008:
> > 
> >   $ git diff v4.5.6b-28-gaca33f85^..v4.5.6b-28-gaca33f85 -- doc/find.texi | 
> > tail -n12
> >   @@ -4683,7 +4683,10 @@ searching subdirectories inside projects whose SCM 
> > directory we
> >    already found.
> > 
> >    @smallexample
> >   -find repo/ -exec test -d @{@}/.svn -o -d @{@}/.git -o -d @{@}/CVS \; 
> > -print -prune
> >   +find repo/ \
> >   +-exec test -d @{@}/.svn \; -or \
> >   +-exec test -d @{@}/.git \; -or \
> >   +-exec test -d @{@}/CVS \; -print -prune
> >    @end smallexample
> > 
> >    In this example, @command{test} is used to tell if we are currently
> > 
> > For find.1, this was already fixed in commit v4.6.0-55-g47d8fd38, but missed
> > to check the analog place in the texi file.
> > 
> > Your patch above aims at optimizing things by letting 'find -type d' 
> > pre-filter
> > directories in order to avoid -exec invocations on regular and other 
> > non-directory
> > files.
> > The '-type d' test is different from 'test -d ...' because the latter 
> > transparently
> > follows symbolic links while the former strictly checks on the type of the 
> > entry.
> > Therefore, the patch changes the result in the case the repo workspace is a 
> > symlink:
> > the new version would skip it.  Well, adding -L would help.
> 
> Ah, I hadn't thought of that. I can add -L.
> 
> > >   In this example, @command{test} is used to tell if we are currently
> > > -examining a directory which appears to the a project's root directory
> > > +examining a directory which appears to be a project's root directory
> > 
> > good catch!
> > 
> > >   (because it has an SCM subdirectory).  When we find a project root,
> > >   there is no need to search inside it, and @code{-prune} makes sure
> > >   that we descend no further.
> > > @@ -5153,6 +5155,21 @@ For large, complex trees like the Linux kernel, 
> > > this will prevent
> > >   searching a large portion of the structure, saving a good deal of
> > >   time.
> > > +The @samp{-type d} clause causes the three @samp{test} shell
> > > +processes to only be executed for directories. This can be made even
> > > +more efficient by combining the three @samp{test} shell processes
> > > +into a single process:
> > > +
> > > +@smallexample
> > > +find repo/ \
> > > +-type d \
> > > +-exec sh -c 'test -d "$1"/.svn || test -d "$1"/.git || test -d "$1"/CVS' 
> > > . {} \; \
> > > +-print -prune
> > > +@end smallexample
> > > +
> > > +Note that the @samp{.} argument is just a placeholder for the unused
> > > +@samp{$0} environment variable in the @samp{sh -c} command. The
> > > +@samp{@{@}} argument is the @samp{$1} environment variable.
> > _______________________________^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 
> > s/environment/shell/, and maybe s/variable/parameter/
> 
> Are you sure? The term "environment variable" is a
> fairly standard term. The find documentation (manual
> and info) already uses it at least 16 times. There are
> no appearances there of the term "shell parameter".
> 
> Should I change it anyway?


Ah, I get it. I think "shell positional parameter" would be
the most clear term.

> > >   @node Security Considerations
> > >   @chapter Security Considerations
> > 
> > I'm still unsure about efficiency here.  Summary:
> > 
> > 1) Originally it was:
> > 
> >   $ find repo/ -exec test -d '{}/.svn' -o -d '{}/.git' -o -d '{}/CVS' \; 
> > -print -prune
> > 
> > -> 1 process, doing the OR-ing itself.  Nice and neat, and efficient, but 
> > maybe a bit
> > confusing for the novice reader: it may not be obvious that test(1) is 
> > doing the OR-ing,
> > because -o is also a find(1) operator.
> > Besides readability, the test -o operator is discouraged.
> > 
> >   $ env '[' --help
> >   ...
> >   NOTE: Binary -a and -o are inherently ambiguous.  Use 'test EXPR1 && test
> >   EXPR2' or 'test EXPR1 || test EXPR2' instead.
> >   ...
> > 
> > 2) Now there's the 3x -exec, with fixed -or grouping (with or without 
> > '-type d'):
> > 
> >   $ find repo/ -type d \
> >       '(' -exec test -d '{}/.svn' \; -or \
> >           -exec test -d '{}/.git' \; -or \
> >           -exec test -d '{}/CVS' \; \
> >       ')' -print -prune
> > 
> > -> less efficient than 1) because it launches 3x test(1) via -exec.
> > OTOH good to read.
> > 
> > 3. Then there's 1x -exec to launch a shell process to do the OR-ing:
> > 
> >   $ find repo/ \
> >       -type d \
> >       -exec sh -c 'test -d "$1"/.svn || test -d "$1"/.git || test -d 
> > "$1"/CVS' . {} \; \
> >       -print -prune
> > 
> > While there's only 1x -exec per directory, I'm wondering if launching one
> > shell process with 3x test (which is usually the shell builtin) is really
> > more efficient than 3x launching the executable found by `which test`.
> > 
> > We don't have a measurement, and the results may vary on different systems,
> > but I could imagine the launching a shell is more expensive than launching
> > 3x /usr/bin/test.
> >
> > Anyhow, I like the discussion, and I believe it's good to have several 
> > alternatives
> > documented with their individual pros vs. cons considerations.
> > WDYT?
> 
> I just did a quick check (Debian vm on a mature macbookpro)
> and 1000 * 3 * /usr/bin/test is 3.8s and 1000 * /bin/sh is
> 1.3s (~3x faster). I'm not surprised. dash is only 2x the
> size of test (both small), and they load the same shared
> libraries.
> 
> The original version of this patch just had the single
> -exec version (my preference), but it was thought better to
> leave the three -exec version in place, and add the single
> -exec version as an alternative.
> 
> I think either is OK. The more important difference is
> adding -test d and not executing any sh/test processes
> for every non-directory.
> 
> > > diff --git a/find/find.1 b/find/find.1
> > > index 429aa2f0..4faaa23b 100644
> > > --- a/find/find.1
> > > +++ b/find/find.1
> > > @@ -2494,6 +2494,7 @@ projects' roots:
> > >   .in +4m
> > >   .B $ find repo/ \e
> > >   .in +4m
> > > +.B \-type d \e
> > >   .B \e( \-exec test \-d \(aq{}/.svn\(aq \e; \e
> > >   .B \-or \-exec test \-d \(aq{}/.git\(aq \e; \e
> > >   .B \-or \-exec test \-d \(aq{}/CVS\(aq \e; \e
> > > @@ -2502,7 +2503,7 @@ projects' roots:
> > >   .in -4m
> > >   \&
> > >   .fi
> > > -Sample output:
> > > +Sample directories:
> > >   .nf
> > >   \&
> > >   .in +4m
> > > @@ -2513,6 +2514,16 @@ Sample output:
> > >   .B repo/project4/.git
> > >   .in
> > >   \&
> > > +Sample output:
> > > +.nf
> > > +\&
> > > +.in +4m
> > > +.B repo/project1
> > > +.B repo/gnu/project2
> > > +.B repo/gnu/project3
> > > +.B repo/project4
> > > +.in
> > > +\&
> > >   .fi
> > >   In this example,
> > >   .B \-prune
> > 
> > +1 on this.
> > 
> > Have a nice day,
> > Berny
> 
> Thanks. You too.
> 
> cheers,
> raf
>

Re: [PATCH] find: doc: Fix -prune SCM example directories and make it more efficient

Reply via email to