On Wed, Nov 23, 2022 at 07:07:14PM +1100, raf <r...@raf.org> wrote: > On Tue, Nov 22, 2022 at 11:12:00PM +0100, Bernhard Voelker > <m...@bernhard-voelker.de> wrote: > > > Hi raf, > > Hi Berny, > > > thanks for the patch. > > Besides busy day job times here, I was actually waiting for the FSF legal > > to add your entry into the 'copyright.list' file on the GNU fencepost > > server. > > It's not yet in place though, so no need to hurry. > > No worries. That'll take some time. I've emailed the > copyright assignment request form to Craig Topham, but > the paperwork has yet to arrive here for signing. > > > On 11/18/22 23:48, raf wrote: > > > * find/find.1 - Fix -prune SCM example directories and make it more > > > efficient > > > * doc/find.texi - Make -prune SCM example more efficient (two ways) > > > * NEWS - Mention the above > > > --- > > > NEWS | 3 +++ > > > doc/find.texi | 25 +++++++++++++++++++++---- > > > find/find.1 | 13 ++++++++++++- > > > 3 files changed, 36 insertions(+), 5 deletions(-) > > > > > > diff --git a/NEWS b/NEWS > > > index 1fff34f8..7c6fcfb3 100644 > > > --- a/NEWS > > > +++ b/NEWS > > > @@ -18,6 +18,9 @@ GNU findutils NEWS - User visible changes. -*- > > > outline -*- (allout) > > > When generating the Texinfo manual, `makeinfo` is invoked with the > > > --no-split > > > option for all output formats now; this avoids files like > > > find.info-[12]. > > > + The find.1 manual's -prune SCM example directories/output have been > > > fixed, and > > > + the example itself has been made more efficient (find.1 and find.texi) > > > [#62259] > > > + Also fixed a typo in find.texi. > > > * Noteworthy changes in release 4.9.0 (2022-02-22) [stable] > > > diff --git a/doc/find.texi b/doc/find.texi > > > index 379fe646..1a36e243 100644 > > > --- a/doc/find.texi > > > +++ b/doc/find.texi > > > @@ -5138,13 +5138,15 @@ already found. > > > @smallexample > > > find repo/ \ > > > --exec test -d @{@}/.svn \; -or \ > > > --exec test -d @{@}/.git \; -or \ > > > --exec test -d @{@}/CVS \; -print -prune > > > +-type d \ > > > +\( -exec test -d @{@}/.svn \; \ > > > +-or -exec test -d @{@}/.git \; \ > > > +-or -exec test -d @{@}/CVS \; \ > > > +\) -print -prune > > > @end smallexample > > > > Indeed, the bug omitting the \( ... \) has been introduced in 2008: > > > > $ git diff v4.5.6b-28-gaca33f85^..v4.5.6b-28-gaca33f85 -- doc/find.texi | > > tail -n12 > > @@ -4683,7 +4683,10 @@ searching subdirectories inside projects whose SCM > > directory we > > already found. > > > > @smallexample > > -find repo/ -exec test -d @{@}/.svn -o -d @{@}/.git -o -d @{@}/CVS \; > > -print -prune > > +find repo/ \ > > +-exec test -d @{@}/.svn \; -or \ > > +-exec test -d @{@}/.git \; -or \ > > +-exec test -d @{@}/CVS \; -print -prune > > @end smallexample > > > > In this example, @command{test} is used to tell if we are currently > > > > For find.1, this was already fixed in commit v4.6.0-55-g47d8fd38, but missed > > to check the analog place in the texi file. > > > > Your patch above aims at optimizing things by letting 'find -type d' > > pre-filter > > directories in order to avoid -exec invocations on regular and other > > non-directory > > files. > > The '-type d' test is different from 'test -d ...' because the latter > > transparently > > follows symbolic links while the former strictly checks on the type of the > > entry. > > Therefore, the patch changes the result in the case the repo workspace is a > > symlink: > > the new version would skip it. Well, adding -L would help. > > Ah, I hadn't thought of that. I can add -L. > > > > In this example, @command{test} is used to tell if we are currently > > > -examining a directory which appears to the a project's root directory > > > +examining a directory which appears to be a project's root directory > > > > good catch! > > > > > (because it has an SCM subdirectory). When we find a project root, > > > there is no need to search inside it, and @code{-prune} makes sure > > > that we descend no further. > > > @@ -5153,6 +5155,21 @@ For large, complex trees like the Linux kernel, > > > this will prevent > > > searching a large portion of the structure, saving a good deal of > > > time. > > > +The @samp{-type d} clause causes the three @samp{test} shell > > > +processes to only be executed for directories. This can be made even > > > +more efficient by combining the three @samp{test} shell processes > > > +into a single process: > > > + > > > +@smallexample > > > +find repo/ \ > > > +-type d \ > > > +-exec sh -c 'test -d "$1"/.svn || test -d "$1"/.git || test -d "$1"/CVS' > > > . {} \; \ > > > +-print -prune > > > +@end smallexample > > > + > > > +Note that the @samp{.} argument is just a placeholder for the unused > > > +@samp{$0} environment variable in the @samp{sh -c} command. The > > > +@samp{@{@}} argument is the @samp{$1} environment variable. > > _______________________________^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > > s/environment/shell/, and maybe s/variable/parameter/ > > Are you sure? The term "environment variable" is a > fairly standard term. The find documentation (manual > and info) already uses it at least 16 times. There are > no appearances there of the term "shell parameter". > > Should I change it anyway?
Ah, I get it. I think "shell positional parameter" would be the most clear term. > > > @node Security Considerations > > > @chapter Security Considerations > > > > I'm still unsure about efficiency here. Summary: > > > > 1) Originally it was: > > > > $ find repo/ -exec test -d '{}/.svn' -o -d '{}/.git' -o -d '{}/CVS' \; > > -print -prune > > > > -> 1 process, doing the OR-ing itself. Nice and neat, and efficient, but > > maybe a bit > > confusing for the novice reader: it may not be obvious that test(1) is > > doing the OR-ing, > > because -o is also a find(1) operator. > > Besides readability, the test -o operator is discouraged. > > > > $ env '[' --help > > ... > > NOTE: Binary -a and -o are inherently ambiguous. Use 'test EXPR1 && test > > EXPR2' or 'test EXPR1 || test EXPR2' instead. > > ... > > > > 2) Now there's the 3x -exec, with fixed -or grouping (with or without > > '-type d'): > > > > $ find repo/ -type d \ > > '(' -exec test -d '{}/.svn' \; -or \ > > -exec test -d '{}/.git' \; -or \ > > -exec test -d '{}/CVS' \; \ > > ')' -print -prune > > > > -> less efficient than 1) because it launches 3x test(1) via -exec. > > OTOH good to read. > > > > 3. Then there's 1x -exec to launch a shell process to do the OR-ing: > > > > $ find repo/ \ > > -type d \ > > -exec sh -c 'test -d "$1"/.svn || test -d "$1"/.git || test -d > > "$1"/CVS' . {} \; \ > > -print -prune > > > > While there's only 1x -exec per directory, I'm wondering if launching one > > shell process with 3x test (which is usually the shell builtin) is really > > more efficient than 3x launching the executable found by `which test`. > > > > We don't have a measurement, and the results may vary on different systems, > > but I could imagine the launching a shell is more expensive than launching > > 3x /usr/bin/test. > > > > Anyhow, I like the discussion, and I believe it's good to have several > > alternatives > > documented with their individual pros vs. cons considerations. > > WDYT? > > I just did a quick check (Debian vm on a mature macbookpro) > and 1000 * 3 * /usr/bin/test is 3.8s and 1000 * /bin/sh is > 1.3s (~3x faster). I'm not surprised. dash is only 2x the > size of test (both small), and they load the same shared > libraries. > > The original version of this patch just had the single > -exec version (my preference), but it was thought better to > leave the three -exec version in place, and add the single > -exec version as an alternative. > > I think either is OK. The more important difference is > adding -test d and not executing any sh/test processes > for every non-directory. > > > > diff --git a/find/find.1 b/find/find.1 > > > index 429aa2f0..4faaa23b 100644 > > > --- a/find/find.1 > > > +++ b/find/find.1 > > > @@ -2494,6 +2494,7 @@ projects' roots: > > > .in +4m > > > .B $ find repo/ \e > > > .in +4m > > > +.B \-type d \e > > > .B \e( \-exec test \-d \(aq{}/.svn\(aq \e; \e > > > .B \-or \-exec test \-d \(aq{}/.git\(aq \e; \e > > > .B \-or \-exec test \-d \(aq{}/CVS\(aq \e; \e > > > @@ -2502,7 +2503,7 @@ projects' roots: > > > .in -4m > > > \& > > > .fi > > > -Sample output: > > > +Sample directories: > > > .nf > > > \& > > > .in +4m > > > @@ -2513,6 +2514,16 @@ Sample output: > > > .B repo/project4/.git > > > .in > > > \& > > > +Sample output: > > > +.nf > > > +\& > > > +.in +4m > > > +.B repo/project1 > > > +.B repo/gnu/project2 > > > +.B repo/gnu/project3 > > > +.B repo/project4 > > > +.in > > > +\& > > > .fi > > > In this example, > > > .B \-prune > > > > +1 on this. > > > > Have a nice day, > > Berny > > Thanks. You too. > > cheers, > raf >