Hi Russ,

At 2025-09-27T22:35:52-0700, Russ Allbery wrote:
> "G. Branden Robinson" <[email protected]> writes:
> > At this point I must ask that you direct me to a specific document
> > that you think will actually be adversely affected by this change.
>
> Yeah, looking at this in more detail shows that way more authors even
> than I had realized have stopped using two spaces after sentences. If
> they use one space after a sentence in POD source, they're not
> affected by this change (or arguably even helped at line breaks)
> because they're already going to break *roff's detection of sentence
> boundaries. I found a bunch of examples in prose with a quick search,
> but nearly all of them were by authors that used one space after
> sentences and thus are unaffected by this proposed change.
>
> Once one excludes those, and authors who use the style of putting the
> period outside the quotes (more common in Europe as I recall), I agree
> that there aren't a ton of examples.
>
> Here are the ones that I found on my local system that I believe would
> be adversely affected:
>
> Perl/Critic/DEVELOPER.pod:what's wrong." The explanation can be either a
> string with further
> SQL/Translator/Schema/Constraint.pm:then returns "1." The argument is
> evaluated by Perl for True or
> SQL/Translator/Manual.pod:with "sqlt." Here are the scripts and a
> description of what they each
> Sub/Exporter.pm:"import." In addition to the normal exporter configuration,
> a few named
> WWW/Mechanize.pm:A value of C<0> means "no history at all." By default, the
> max stack depth
>
> I didn't do a very sophisticated search (only Perl modules, a very simple
> search pattern), so I doubtless missed some. The search that seemed the
> most effective was:
>
> grep -r '^[^ ].*\." [^ ]' /usr/share/perl5
>
> (That first bracketed section contains a space and a literal tab.)
>
> I found it easier to search the actual POD source, not the generated
> man pages, which confuse matters with other sorts of markup. Pod::Man
> will pass through those lines essentially verbatim, so these should be
> valid examples of man pages that I think would then be misformatted.
>
> Note that this search looks specifically for prose examples, since
> that's what I was looking for, and will exclude some code examples
> that you may also be interested in. You can find the code as well by
> dropping the initial anchoring, but then you match a lot of actual
> source code instead of POD, and the search problem becomes harder than
> I have the brainpower to deal with this late in the evening.
[much snippage]
I decided to try to get a more representative view of the situation by
measuring actual formatted output rather than meeting your predictions
with my own about how the proposed (and active in groff Git's master
branch) change would affect POD documents.
Here's my method.[1]
1. Grab all the ".pod" files installed on my Debian bookworm(-ish) box.
2. Run them through _current_ podlators pod2man,[2] _with_ groff Git
HEAD, but with the `cflags 0 "` line in "tmac/an.tmac" commented
out, disabling adjustment at the command line, and saving standard
output and standard error. This captures the "status quo ante".
3. Uncomment the `cflags 0 "` line.
4. Run the same files through _current_ podlators pod2man again, saving
standard output and standard error. This captures the effect of my
reckless and dangerous change. ;-)
5. Diff 'em.
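The "diff 'em" step boils down to counting how many lines of the first output no longer match the second. A minimal sketch of that count follows, assuming the two formatted outputs are already available as lists of lines. The function name and the two sample lines are illustrative only: the placement of the doubled space in `before` is an assumption, and the actual comparison in this message was done with diff on the /tmp/pod-out-[12].txt files.

```python
import difflib


def count_changed_lines(old_lines, new_lines):
    """Count lines of old_lines that a line diff marks as replaced or deleted."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changed = 0
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            changed += i2 - i1
    return changed


# Hypothetical before/after pair: only the space after the closing quote
# differs (inter-sentence space before, ordinary word space after).
before = [
    'contains "("You won!" => "Tu as gagne!")",  then the above code happily',
    'tells the user "Tu as gagne!".',
]
after = [
    'contains "("You won!" => "Tu as gagne!")", then the above code happily',
    'tells the user "Tu as gagne!".',
]

print(count_changed_lines(before, after))
```

SequenceMatcher is convenient for a sketch but slow on inputs of several hundred thousand lines; for files of that size diff(1) is the practical tool.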
The first thing to note is the magnitude of the change's impact.
$ dlocate -S '*.pod$' | wc -l
557
I have 557 *.pod files on my box.
These produce 347,238 lines of nroff-formatted output.
$ wc -l /tmp/pod-out-[12].txt
347238 /tmp/pod-out-1.txt
347238 /tmp/pod-out-2.txt
694476 total
The `cflags 0 "` change alters 126 lines, or 0.0363% of (this sample of)
all pod2man output.[3]
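The percentage can be rechecked with a couple of lines of arithmetic. This sketch uses only the two counts reported above; the variable names are illustrative.

```python
changed_lines = 126      # lines altered by the cflags change
total_lines = 347238     # lines of nroff-formatted output per run

fraction = changed_lines / total_lines
percent = fraction * 100

# Four decimal places matches the precision quoted in the text.
print(f"{percent:.4f}%")
```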
Are the changes for the better or the worse? Let's see. I won't
scrutinize all 126, but just the first 13, assuming that's a random
enough sample (in other words, authors' punctuation and quotation
practices are not meaningfully correlated with the collation order of
the Debian packages that ship them [that's how dlocate(1) sorts its
output]).
--- /tmp/pod-out-1.txt 2025-10-09 18:40:11.555940885 -0500
+++ /tmp/pod-out-2.txt 2025-10-09 18:45:03.518209913 -0500
@@ -9071,7 +9071,7 @@
ter behaved ‐‐ i.e., that they will be just like English.
But the Arabic translator is the next to write back. First off, your code
- for "I scanned %g directory." or "I scanned %g directories." assumes
+ for "I scanned %g directory." or "I scanned %g directories." assumes
there’s only singular or plural. But, to use linguistic jargon again, Ara‐
bic has grammatical number, like English (but unlike Chinese), but it’s a
three‐term category: singular, dual, and plural. In other words, the way
Looks correct. The quoted sentences aren't sentential in context.
@@ -9692,7 +9692,7 @@
...emits the right text for this language. If the object in $lh belongs to
class "TkBocciBall::Localize::fr" and %TkBocciBall::Localize::fr::Lexicon
- contains "("You won!" => "Tu as gagne!")", then the above code happily
+ contains "("You won!" => "Tu as gagne!")", then the above code happily
tells the user "Tu as gagne!".
METHODS
Same.
@@ -16140,7 +16140,7 @@
(W regexp)(F) A character class range must start and end at a literal
character, not another character class like "\d" or "[:alpha:]". The
"-" in your false range is interpreted as a literal "-". In a
- "(?[...])" construct, this is an error, rather than a warning. Con‐
+ "(?[...])" construct, this is an error, rather than a warning. Con‐
sider quoting the "-", "\-". The <-- HERE shows whereabouts in the
regular expression the problem was discovered. See perlre.
Same.
@@ -18449,7 +18449,7 @@
ning with "[." and ending with ".]" is reserved for future extensions.
If you need to represent those character sequences inside a regular ex‐
pression character class, just quote the square brackets with the back‐
- slash: "\[." and ".\]". The <-- HERE shows whereabouts in the regular
+ slash: "\[." and ".\]". The <-- HERE shows whereabouts in the regular
expression the problem was discovered. See perlre.
POSIX syntax [= =] is reserved for future extensions in regex; marked by
@@ -36039,7 +36039,7 @@
time the event loop detects that the file descriptor given is readable
and/or writable).
- Each watcher type further has its own "ev_TYPE_set (watcher *, ...)" macro
+ Each watcher type further has its own "ev_TYPE_set (watcher *, ...)" macro
to configure it, with arguments specific to the watcher type. There is also
a macro to combine initialisation and setting in one call: "ev_TYPE_init
(watcher *, callback, ...)".
Same.
@@ -67681,7 +67681,7 @@
'command2' => [qw(sub1 sub2 sub3)]);
DESCRIPTION
- Creates "->commandSub(...)" as an alias for "->command('sub',...)" e.g.
+ Creates "->commandSub(...)" as an alias for "->command('sub',...)" e.g.
"->grabRelease" for "->grab('release')".
For each command/subcommand pair this creates a closure with command and
Same.
@@ -82112,7 +82112,7 @@
pragma.
The -T option is also forbidden on the "#!" line of a script, unless it was
- present on the Perl command line. Due to the way "#!" works, this usually
+ present on the Perl command line. Due to the way "#!" works, this usually
means that -T must be in the first argument. Thus:
#!/usr/bin/perl -T -w
Same.
@@ -90276,7 +90276,7 @@
Uppercase X/B allowed in hexadecimal/binary literals
Literals may now use either upper case "0X..." or "0B..." prefixes, in ad‐
- dition to the already supported "0x..." and "0b..." syntax [perl #76296].
+ dition to the already supported "0x..." and "0b..." syntax [perl #76296].
C, Ruby, Python, and PHP already support this syntax, and it makes Perl
more internally consistent: a round‐trip with "eval sprintf "%#X", 0x10"
Same.
@@ -90784,8 +90784,8 @@
On Windows parent processes would not terminate until all forked children
had terminated first. However, "kill("KILL", ...)" is inherently unstable
- on pseudo‐processes, and "kill("TERM", ...)" might not get delivered if
- the child is blocked in a system call.
+ on pseudo‐processes, and "kill("TERM", ...)" might not get delivered if the
+ child is blocked in a system call.
To avoid the deadlock and still provide a safe mechanism to terminate the
hosting process, Perl now no longer waits for children that have been sent
Same.
@@ -98278,7 +98278,7 @@
Splitting the tokens "(?" and "(*" in regular expressions
A deprecation warning is now raised if the "(" and "?" are separated by
white space or comments in "(?...)" regular expression constructs. Simi‐
- larly, if the "(" and "*" are separated in "(*VERB...)" constructs.
+ larly, if the "(" and "*" are separated in "(*VERB...)" constructs.
Pre‐PerlIO IO implementations
In theory, you can currently build perl without PerlIO. Instead, you’d use
Same.
@@ -106774,7 +106774,7 @@
the class is inverted or the sequence is specified as the beginning or
end of a range. In these cases, the only behavior change from before
is a slight rewording of the fatal error message given when this class
- is part of a "?[...])" construct. When the "[...]" stands alone, the
+ is part of a "?[...])" construct. When the "[...]" stands alone, the
same non‐fatal warning as before is raised, and only the first charac‐
ter in the sequence is used, again just as before.
Same.
@@ -107244,7 +107244,7 @@
a segfault instead of a proper error message. [perl #126180]
<https://rt.perl.org/Ticket/Display.html?id=126180>
- • Another problem with "(?[...])" constructs has been fixed wherein
+ • Another problem with "(?[...])" constructs has been fixed wherein
things like "\c]" could cause panics. [perl #126181]
<https://rt.perl.org/Ticket/Display.html?id=126181>
Same.
@@ -109220,7 +109220,7 @@
or a C-level assert. [perl #126615], [perl #126602], [perl #126193].
• There were places in regular expression patterns where comments
- ("(?#...)") weren’t allowed, but should have been. This is now fixed.
+ ("(?#...)") weren’t allowed, but should have been. This is now fixed.
[perl #116639] <https://rt.perl.org/Ticket/Display.html?id=116639>
• Some regressions from Perl 5.20 have been fixed, in which some syntax
Same.
Thus, sampling just over 10.3% of the 0.0363% of all lines of
representative pod2man(1) output (when formatting for an 80-column
terminal), I find _no_ exhibits where `cflags 0 "` makes formatting
worse; in every exhibit, the formatting is improved. Inter-sentence
space is not being injected at locations that are not sentence
boundaries. It is also not being removed where it should not be.
If I knew how to calculate a p-value for this result, I would.
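Short of a true p-value, one related number is easy to compute: if some count of the 126 changed lines were in fact regressions, how likely is it that a sample of 13 would contain none of them? The sketch below uses the hypergeometric probability for that. It assumes the 13 exhibits behave like a simple random sample drawn without replacement and counts each exhibit as one line, neither of which is established here; the function names are illustrative.

```python
from math import comb

POPULATION = 126  # changed lines reported in the diff
SAMPLE = 13       # exhibits inspected, none of them worse


def prob_all_clean(bad, population=POPULATION, sample=SAMPLE):
    """Chance that a random sample contains none of the `bad` lines."""
    if not 0 <= bad <= population:
        raise ValueError("bad must lie between 0 and population")
    return comb(population - bad, sample) / comb(population, sample)


def largest_plausible_bad(alpha=0.05):
    """Largest count of bad lines that still yields an all-clean sample
    with probability of at least alpha."""
    bad = 0
    while bad < POPULATION and prob_all_clean(bad + 1) >= alpha:
        bad += 1
    return bad


print(f"sampled fraction: {SAMPLE / POPULATION:.1%}")
for bad in (1, 5, 10, 20):
    print(bad, round(prob_all_clean(bad), 3))
print("largest count not ruled out at 5%:", largest_plausible_bad())
```

Read this as an upper bound on how many regressions could be hiding in the uninspected lines, not as evidence that any exist.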
How do these empirical data influence your view?
Regards,
Branden
[1]
$ vi ~/groff-HEAD/share/groff/1.23.0/tmac/an.tmac
$ time for F in $(dlocate -S '*.pod$'); do ./scripts/pod2man "$F"; done | nroff -dAD=l -man -P -cbou >| /tmp/pod-out-1.txt 2>| /tmp/pod-err.1.txt
$ vi ~/groff-HEAD/share/groff/1.23.0/tmac/an.tmac
$ time for F in $(dlocate -S '*.pod$'); do ./scripts/pod2man "$F"; done | nroff -dAD=l -man -P -cbou >| /tmp/pod-out-2.txt 2>| /tmp/pod-err.2.txt
The timed commands took about 90 seconds on my machine.
[2]
$ pwd
/home/branden/src/GIT/podlators
$ git describe
release/v6.0.2-28-g002a6ba
[3]
$ bc -l
bc 1.07.1
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006, 2008, 2012-2017 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
126/347238
.00036286351148203825
