Hi Russ,

At 2025-09-27T22:35:52-0700, Russ Allbery wrote:
> "G. Branden Robinson" <[email protected]> writes:
> > At this point I must ask that you direct me to a specific document
> > that you think will actually be adversely affected by this change.
> 
> Yeah, looking at this in more detail shows that way more authors even
> than I had realized have stopped using two spaces after sentences. If
> they use one space after a sentence in POD source, they're not
> affected by this change (or arguably even helped at line breaks)
> because they're already going to break *roff's detection of sentence
> boundaries. I found a bunch of examples in prose with a quick search,
> but nearly all of them were by authors that used one space after
> sentences and thus are unaffected by this proposed change.
> 
> Once one excludes those, and authors who use the style of putting the
> period outside the quotes (more common in Europe as I recall), I agree
> that there aren't a ton of examples.
> 
> Here are the ones that I found on my local system that I believe would
> be adversely affected:
> 
> Perl/Critic/DEVELOPER.pod:what's wrong."  The explanation can be either a string with further
> SQL/Translator/Schema/Constraint.pm:then returns "1."  The argument is evaluated by Perl for True or
> SQL/Translator/Manual.pod:with "sqlt."  Here are the scripts and a description of what they each
> Sub/Exporter.pm:"import."  In addition to the normal exporter configuration, a few named
> WWW/Mechanize.pm:A value of C<0> means "no history at all."  By default, the max stack depth
> 
> I didn't do a very sophisticated search (only Perl modules, a very simple
> search pattern), so I doubtless missed some. The search that seemed the
> most effective was:
> 
>     grep -r '^[^    ].*\."  [^ ]' /usr/share/perl5
> 
> (That first bracketed section contains a space and a literal tab.)
> 
> I found it easier to search the actual POD source, not the generated
> man pages, which confuse matters with other sorts of markup. Pod::Man
> will pass through those lines essentially verbatim, so these should be
> valid examples of man pages that I think would then be misformatted.
> 
> Note that this search looks specifically for prose examples, since
> that's what I was looking for, and will exclude some code examples
> that you may also be interested in. You can find the code as well by
> dropping the initial anchoring, but then you match a lot of actual
> source code instead of POD, and the search problem becomes harder than
> I have the brainpower to deal with this late in the evening.
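
To make the quoted grep pattern concrete, here is a minimal sketch that
runs it over three sample lines.  The sample lines and the function names
`sample` and `match_prose` are made up for illustration (the first line
borrows the wording of the Constraint.pm hit above); only the pattern
itself comes from the quoted message.

```shell
# Three illustrative lines: two spaces after the closing quote,
# one space after it, and an indented (code-like) line.
sample() {
    printf '%s\n' \
        'then returns "1."  The argument is evaluated' \
        'then returns "1." The argument is evaluated' \
        '    print "done."  # indented, so the anchor skips it'
}

# The quoted pattern, with the literal tab produced by printf so that
# it survives copy and paste.
match_prose() {
    tab=$(printf '\t')
    grep "^[^ ${tab}].*\\.\"  [^ ]"
}

sample | match_prose
```

Only the first line is printed: it starts in column one and has a period,
a closing quote, and exactly the two-space gap that *roff treats as a
sentence boundary.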

[much snippage]

I decided to try to get a more representative view of the situation by
measuring actual formatted output rather than meeting your predictions
with my own about how the proposed (and active in groff Git's master
branch) change would affect POD documents.

Here's my method.[1]

1.  Grab all the ".pod" files installed on my Debian bookworm(-ish) box.
2.  Run them through _current_ podlators pod2man,[2] _with_ groff Git
    HEAD, but with the `cflags 0 "` line in "tmac/an.tmac" commented
    out, disabling adjustment at the command line, and saving standard
    output and standard error.  This captures the "status quo ante".
3.  Uncomment the `cflags 0 "` line.
4.  Run the same files through _current_ podlators pod2man again, saving
    standard output and standard error.  This captures the effect of my
    reckless and dangerous change.  ;-)
5.  Diff 'em.

The first thing to note is the magnitude of the change's impact.

$ dlocate -S '*.pod$' | wc -l
557

I have 557 *.pod files on my box.

These produce 347,238 lines of nroff-formatted output.

$ wc -l /tmp/pod-out-[12].txt
  347238 /tmp/pod-out-1.txt
  347238 /tmp/pod-out-2.txt
  694476 total

The `cflags 0 "` change alters 126 lines, or 0.0363% of (this sample of)
all pod2man output.[3]
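
The message does not say exactly how the 126 was tallied.  As a sketch,
assuming it counts lines of the first output that differ in the second,
a helper along these lines would produce both the count and the
percentage.  The function name `changed_pct` is hypothetical.

```shell
# Report how many lines of the old file differ in the new file, and what
# share of the old file that is.
changed_pct() {
    total=$(wc -l < "$1")
    # In diff's default output, lines taken from the first file start
    # with "<".  grep -c exits nonzero on a count of 0, hence "|| true".
    changed=$(diff "$1" "$2" | grep -c '^<' || true)
    awk -v c="$changed" -v t="$total" \
        'BEGIN { printf "%d of %d lines (%.4f%%)\n", c, t, 100 * c / t }'
}
```

Invoked as `changed_pct /tmp/pod-out-1.txt /tmp/pod-out-2.txt`, it would
report the figures in the form used above; 126 of 347238 works out to
0.0363%.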

Are the changes for the better or the worse?  Let's see.  I won't
scrutinize all 126, but just the first 13, assuming that's a random
enough sample (in other words, authors' punctuation and quotation
practices are not meaningfully correlated with the collation order of
the Debian packages that ship them [that's how dlocate(1) sorts its
output]).

--- /tmp/pod-out-1.txt  2025-10-09 18:40:11.555940885 -0500
+++ /tmp/pod-out-2.txt  2025-10-09 18:45:03.518209913 -0500
@@ -9071,7 +9071,7 @@
      ter behaved ‐‐ i.e., that they will be just like English.
 
      But the Arabic translator is the next to write back.  First off, your code
-     for "I scanned %g directory." or "I scanned %g directories."  assumes
+     for "I scanned %g directory." or "I scanned %g directories." assumes
      there’s only singular or plural.  But, to use linguistic jargon again, Ara‐
      bic has grammatical number, like English (but unlike Chinese), but it’s a
      three‐term category: singular, dual, and plural.  In other words, the way

Looks correct.  The quoted sentences aren't sentential in context.

@@ -9692,7 +9692,7 @@

      ...emits the right text for this language.  If the object in $lh belongs to
      class "TkBocciBall::Localize::fr" and %TkBocciBall::Localize::fr::Lexicon
-     contains "("You won!"  => "Tu as gagne!")", then the above code happily
+     contains "("You won!" => "Tu as gagne!")", then the above code happily
      tells the user "Tu as gagne!".

 METHODS

Same.

@@ -16140,7 +16140,7 @@
          (W regexp)(F) A character class range must start and end at a literal
          character, not another character class like "\d" or "[:alpha:]".  The
          "-" in your false range is interpreted as a literal "-".  In a
-         "(?[...])"  construct, this is an error, rather than a warning.  Con‐
+         "(?[...])" construct, this is an error, rather than a warning.  Con‐
          sider quoting the "-", "\-".  The <-- HERE shows whereabouts in the
          regular expression the problem was discovered.  See perlre.

Same.

@@ -18449,7 +18449,7 @@
          ning with "[." and ending with ".]" is reserved for future extensions.
          If you need to represent those character sequences inside a regular ex‐
          pression character class, just quote the square brackets with the back‐
-         slash: "\[."  and ".\]".  The <-- HERE shows whereabouts in the regular
+         slash: "\[." and ".\]".  The <-- HERE shows whereabouts in the regular
          expression the problem was discovered.  See perlre.

      POSIX syntax [= =] is reserved for future extensions in regex; marked by

@@ -36039,7 +36039,7 @@
      time the event loop detects that the file descriptor given is readable
      and/or writable).

-     Each watcher type further has its own "ev_TYPE_set (watcher *, ...)"  macro
+     Each watcher type further has its own "ev_TYPE_set (watcher *, ...)" macro
      to configure it, with arguments specific to the watcher type. There is also
      a macro to combine initialisation and setting in one call: "ev_TYPE_init
      (watcher *, callback, ...)".

Same.

@@ -67681,7 +67681,7 @@
                             'command2' => [qw(sub1 sub2 sub3)]);

 DESCRIPTION
-     Creates "->commandSub(...)" as an alias for "->command('sub',...)"  e.g.
+     Creates "->commandSub(...)" as an alias for "->command('sub',...)" e.g.
      "->grabRelease" for "->grab('release')".

      For each command/subcommand pair this creates a closure with command and

Same.

@@ -82112,7 +82112,7 @@
      pragma.

      The -T option is also forbidden on the "#!" line of a script, unless it was
-     present on the Perl command line.  Due to the way "#!"  works, this usually
+     present on the Perl command line.  Due to the way "#!" works, this usually
      means that -T must be in the first argument.  Thus:

          #!/usr/bin/perl -T -w

Same.

@@ -90276,7 +90276,7 @@
      Uppercase X/B allowed in hexadecimal/binary literals

      Literals may now use either upper case "0X..." or "0B..." prefixes, in ad‐
-     dition to the already supported "0x..." and "0b..."  syntax [perl #76296].
+     dition to the already supported "0x..." and "0b..." syntax [perl #76296].

      C, Ruby, Python, and PHP already support this syntax, and it makes Perl
      more internally consistent: a round‐trip with "eval sprintf "%#X", 0x10"

Same.

@@ -90784,8 +90784,8 @@

      On Windows parent processes would not terminate until all forked children
      had terminated first.  However, "kill("KILL", ...)" is inherently unstable
-     on pseudo‐processes, and "kill("TERM", ...)"  might not get delivered if
-     the child is blocked in a system call.
+     on pseudo‐processes, and "kill("TERM", ...)" might not get delivered if the
+     child is blocked in a system call.

      To avoid the deadlock and still provide a safe mechanism to terminate the
      hosting process, Perl now no longer waits for children that have been sent

Same.

@@ -98278,7 +98278,7 @@
    Splitting the tokens "(?" and "(*" in regular expressions
      A deprecation warning is now raised if the "(" and "?" are separated by
      white space or comments in "(?...)" regular expression constructs.  Simi‐
-     larly, if the "(" and "*" are separated in "(*VERB...)"  constructs.
+     larly, if the "(" and "*" are separated in "(*VERB...)" constructs.

    Pre‐PerlIO IO implementations
      In theory, you can currently build perl without PerlIO.  Instead, you’d use

Same.

@@ -106774,7 +106774,7 @@
          the class is inverted or the sequence is specified as the beginning or
          end of a range.  In these cases, the only behavior change from before
          is a slight rewording of the fatal error message given when this class
-         is part of a "?[...])" construct.  When the "[...]"  stands alone, the
+         is part of a "?[...])" construct.  When the "[...]" stands alone, the
          same non‐fatal warning as before is raised, and only the first charac‐
          ter in the sequence is used, again just as before.

Same.

@@ -107244,7 +107244,7 @@
          a segfault instead of a proper error message.  [perl #126180]
          <https://rt.perl.org/Ticket/Display.html?id=126180>

-     •   Another problem with "(?[...])"  constructs has been fixed wherein
+     •   Another problem with "(?[...])" constructs has been fixed wherein
          things like "\c]" could cause panics.  [perl #126181]
          <https://rt.perl.org/Ticket/Display.html?id=126181>

Same.

@@ -109220,7 +109220,7 @@
          or a C-level assert. [perl #126615], [perl #126602], [perl #126193].

      •   There were places in regular expression patterns where comments
-         ("(?#...)")  weren’t allowed, but should have been.  This is now fixed.
+         ("(?#...)") weren’t allowed, but should have been.  This is now fixed.
          [perl #116639] <https://rt.perl.org/Ticket/Display.html?id=116639>

      •   Some regressions from Perl 5.20 have been fixed, in which some syntax

Same.

Thus, sampling just over 10.3% of the 0.0363% of all lines of
representative pod2man(1) output (when formatting for an 80-column
terminal), I find _no_ exhibits where `cflags 0 "` makes formatting
worse; in every exhibit, the formatting is improved.  Inter-sentence
space is not being injected at locations that are not sentence
boundaries.  It is also not being removed where it should not be.

If I knew how to calculate a p-value for this result, I would.
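
A p-value needs a null hypothesis, but a related back-of-the-envelope
figure is easy to compute: when n samples show zero failures, the largest
failure rate p still consistent with that at confidence 1 - alpha
satisfies (1 - p)^n = alpha.  The sketch below evaluates that bound.  It
assumes independent random samples, which "the first 13 hunks" only
approximates, and the function name `zero_failure_bound` is made up for
illustration.

```shell
# One-sided upper confidence bound on the failure rate after observing
# zero failures in n samples: solve (1 - p)^n = alpha for p.
# Assumption: samples are independent and random.
zero_failure_bound() {
    awk -v n="$1" -v alpha="${2:-0.05}" \
        'BEGIN { printf "%.3f\n", 1 - exp(log(alpha) / n) }'
}

zero_failure_bound 13
```

For n = 13 at the default 95% level this comes to roughly 0.2, so the
sample by itself rules out a high rate of regressions rather than
proving there are none among the remaining changed lines.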

How do these empirical data influence your view?

Regards,
Branden

[1]

$ vi ~/groff-HEAD/share/groff/1.23.0/tmac/an.tmac
$ time for F in $(dlocate -S '*.pod$'); do ./scripts/pod2man "$F"; done | nroff -dAD=l -man -P -cbou >| /tmp/pod-out-1.txt 2>| /tmp/pod-err.1.txt
$ vi ~/groff-HEAD/share/groff/1.23.0/tmac/an.tmac
$ time for F in $(dlocate -S '*.pod$'); do ./scripts/pod2man "$F"; done | nroff -dAD=l -man -P -cbou >| /tmp/pod-out-2.txt 2>| /tmp/pod-err.2.txt

The timed commands took about 90 seconds on my machine.

[2]

$ pwd
/home/branden/src/GIT/podlators
$ git describe
release/v6.0.2-28-g002a6ba

[3]

$ bc -l
bc 1.07.1
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006, 2008, 2012-2017 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
126/347238
.00036286351148203825
