Hi Ingo,

Thanks for correcting me on mandoc(1) behavior; I have never
(consciously) caught it applying supplemental inter-sentence space, so I
had leapt to the conclusion that, like adjustment and automatic
hyphenation, you considered it an anti-goal for the project.

Minutiae below.  :)

At 2025-08-24T19:17:57+0200, Ingo Schwarze wrote:
> G. Branden Robinson wrote on Sat, Aug 23, 2025 at 05:21:14PM -0500:
> 
> > However, several characters are treated
> > _transparently_ after the occurrence of an end-of-sentence character.
> [...]
> > The default set is '"', ''', ')', ']', '*', '\[dg]', '\[dd]', '\[rq]',
> > and '\[cq]'.
> 
> In mandoc, that set is slightly smaller: " ' ) ]
> These are missing in mandoc: * and the specials dg dd rq cq
> 
> The difference is not deliberate, but a minor bug in mandoc.
> I simply wasn't aware yet that there are more eos-transparent
> characters in groff.

Acknowledged.  Dagger and double-dagger signs seem vanishingly rare in
man pages.  As are typographical quotation marks of any kind, because
their special character forms are GNU innovations and, seemingly, even
in ordinary character form almost no one outside of the Bell Labs CSRC
ever understood how to use them in a man page anyway.

> >     I therefore propose that _groff_'s man(7) and mdoc(7) packages
> >     clear the `cflags` bit for the `"` character.[2]
> 
> No objection.  Mandoc should eventually follow such a change, which
> isn't a drama.  There is obvious work to do in mandoc in this region
> anyway.

Cool.  I'll let the question percolate on the list a bit longer.

> >     We don't want inter-sentence space intruding into code examples.
> 
> Not sure how that could happen.  The code would have to contain
> a double-quote character, then a space character, then another
> character.

Not quite.  It would have to contain a sentence-ending character, then
zero or more EOS-transparent characters, then EITHER a newline (if
filling is enabled) or two or more ordinary spaces (regardless of
filling).
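
That rule can be approximated with a regular expression.  What follows
is only an illustrative Python sketch, not how groff or mandoc implement
it: the function name is made up, the sentence-ending set (".", "?",
"!") is an assumption not stated in this thread, and special characters
are recognized only in their bracketed \[xx] form.

```python
import re

# Assumed sentence-ending characters: ".", "?", "!".
# Transparent set: groff's default as quoted earlier in this message.
TRANSPARENT = r"""(?:["')\]*]|\\\[(?:dg|dd|rq|cq)\])"""

EOS_RE = re.compile(
    r"[.?!]"                     # a sentence-ending character
    + TRANSPARENT + r"*"         # zero or more EOS-transparent characters
    + r"(?P<trigger>\n| {2,})"   # a newline, or two or more spaces
)


def supplemental_space_sites(text, filling=True):
    """Return offsets in `text` where the rule above would fire.

    Hypothetical helper: a newline counts only when filling is
    enabled; a run of two or more spaces counts regardless.
    """
    sites = []
    for match in EOS_RE.finditer(text):
        if match.group("trigger") == "\n" and not filling:
            continue
        sites.append(match.start("trigger"))
    return sites
```

Under these assumptions, the fragment `"yes."` followed by a newline
yields one site when filling is on and none when it is off, while
`"no.";` yields none either way because `;` is not transparent.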

Here's an example.

$ cat ATTIC/supplemental-space-in-code-example.man
.TH foo 1 2025-08-24 "groff test suite"
.SH Name
foo \- frobnicate a bar
.SH Description
To get a list of files ending in a dot,
which can cause problems on VFAT file systems, try
\[lq]find /a/long/directory/name/to/force/an/input/line/break \-name "*."
\-print\[rq].

And here's the output, with adjustment manually defeated for extra
demonstrative power, even though the line where supplementation occurs
wouldn't be automatically adjusted anyway.

$ nroff -d AD=l -man ATTIC/supplemental-space-in-code-example.man
foo(1)                       General Commands Manual                      foo(1)

Name
     foo - frobnicate a bar

Description
     To get a list of files ending in a dot, which can cause problems on VFAT
     file systems, try “find /a/long/directory/name/to/force/an/input/line/break
     -name "*."  -print”.

groff test suite                   2025‐08‐24                             foo(1)

When filling is disabled and a monospaced font is used, the problem is
harder to provoke observably, because two "literal" spaces after a
period look the same as an ordinary space followed by a supplemental
inter-sentence space.

*But*, if the supplemental inter-sentence space is not the formatter
(and English-localized) default, there *is* a difference.

In this example, I'll use a shorter line length just to make the email
more pleasant, and groff's "ascii" output device to conceal an incorrect
but popular glyph choice for the neutral apostrophe.

$ cat ATTIC/sentence-ending-detector.man
.TH foo 1 2025-08-24 "groff test suite"
.SH Name
foo \- frobnicate a bar
.SH Description
Let us feebly attempt to detect multiple sentences on an input line in
.I man
documents.
.RS
.P
.EX
.\" This example should use \[aq]--or \(aq--instead of `'`, but let's be
.\" "realistic" and simulate inexpertly composed man pages.
grep '[A\-Za\-z].  *' *.man
grep '[A\-Za\-z]. *' *.man
.EE
.RE
$ nroff -r LL=65n -man -Tascii ATTIC/sentence-ending-detector.man
foo(1)               General Commands Manual               foo(1)

Name
     foo - frobnicate a bar

Description
     Let us feebly attempt to detect multiple sentences on an in-
     put line in man documents.

            grep '[A-Za-z].  *' *.man
            grep '[A-Za-z]. *' *.man

groff test suite            2025-08-24                     foo(1)
$ nroff -r LL=65n -man -mfr -Tascii ATTIC/sentence-ending-detector.man
foo(1)            Manuel des commandes generales           foo(1)

Name
     foo - frobnicate a bar

Description
     Let us feebly attempt to detect multiple sentences on an in-
     put line in man documents.

            grep '[A-Za-z]. *' *.man
            grep '[A-Za-z]. *' *.man

groff test suite            2025-08-24                     foo(1)

Thus--BOOM--one's "literal code example" starts lying to the reader.
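
A toy model of what happened to the first grep line, as a Python sketch.
It is not groff code.  It assumes ".", "?", and "!" end sentences,
handles only the ordinary transparent characters, treats a terminal
cell as 12/12 of a space, and leaves any spaces beyond the second
untouched.  The function name and its parameter are hypothetical.

```python
import re

# A sentence-ending character, any transparent characters, then two
# spaces: the first is the word space, the second the candidate
# supplemental inter-sentence space.
_EOS_TWO_SPACES = re.compile(r"""([.?!]["')\]*]*) {2}""")


def render_nofill_line(line, sentence_space=12):
    """Model one unfilled line on a terminal (assumption-laden sketch).

    `sentence_space` mirrors the second argument of `.ss`, in twelfths
    of a space: 12 under en.tmac's `.ss 12`, 0 under fr.tmac's
    `.ss 12 0`.
    """
    cells = round(sentence_space / 12)
    return _EOS_TWO_SPACES.sub(
        lambda m: m.group(1) + " " + " " * cells, line
    )
```

With `sentence_space=12` both grep lines come back unchanged; with
`sentence_space=0` the first collapses into the second, matching the
-mfr output above.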

> ... So i guess you would have
> to have the code outside literal context,

"Literal context" is an mdoc(7) concept.  man(7)'s less rich
set of semantic designators, along with its more widespread use, may
combine to make the problem more likely in documents using that package.
(Though still rare enough that, unfortunately, few will anticipate it.)

> which is generally a dubious idea at best.  The following does result
> in sentence spacing after the "yes." with both groff and mandoc:
> 
>   .Bd -filled
>   const char *answer = flag ? "yes."
>   : "no.";
>   .Ed
> 
> But why would anyone use fill mode to input a code sample?

If it's of significant length, I would certainly advise against it.

> I'm not sure i'm able to construct a realistic example that
> would demonstrate any practical problem.
> 
> I probably wouldn't make " intransparent because changing
> existing behaviour usually requires making something better
> that matters in practice - if no clear benefit can be demonstrated,
> what's the point in risking a regression for some user out there?

My proposal would be stronger if I could point to a real-world man page
that misrenders due to `"`'s transparency, true, so I'll go looking for
one.

> Then again, i don't feel strongly either way, don't really object, and
> mandoc would likely eventually follow whatever is decided in this
> respect.

The change is easy to make, easy to revert, and easy for users/
distributors to hack back out in a "man.local" and/or "mdoc.local" file
(if we show them an annotated, commented-out example as we do other
concessions to ill-written man pages).

The closest analogy I can think of is one mentioned in our "PROBLEMS"
file.

--snip--
In man pages (only), groff maps the minus sign special character '\-' to
the Basic Latin hyphen-minus (U+002D) because man pages require this
glyph and there is no historically established *roff input character,
ordinary or special, for obtaining it when a hyphen and minus sign are
both separately available.  To obtain a true minus sign, use the special
character escape sequences '\(mi' or '\[mi]'.
--end snip--

The need for a literal `"` is more acute in man page formatting than in
general typesetting, and AT&T troff surrendered a hostage to fortune by
not offering special character escape sequences for quotation characters
back when doing so would have smoothed the path to the future.

> The following does result in sentence spacing with mandoc:
> 
>   Do I need cfree?"
>   Subsequent sentence.

Ah!  Thanks for correcting my misconception; I'll try not to repeat it.

> > I checked, and both commands rendered the document as I expected in
> > a Latin-1 terminal emulator as well.  (_mandoc_ never applies
> > supplemental inter-sentence space.)
> 
> Sounds like a slight over-generalization.

Indeed.  I "ass"umed and thereby made an ass of myself.  A likely
contributing factor was that I surmised that Kristaps and you are both
European, and in Europe, supplemental inter-sentence space in
typesetting seems to be applied less, or not at all.

$ git grep -w ss tmac/[a-z][a-z].tmac
tmac/cs.tmac:.ss 12 0
tmac/de.tmac:.ss 12 0
tmac/en.tmac:.ss 12
tmac/es.tmac:.ss 12 0
tmac/fr.tmac:.ss 12 0
tmac/it.tmac:.ss 12 0
tmac/ru.tmac:.ss 12 0
tmac/sv.tmac:.ss 12 0
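
For anyone who wants that summarized, here is a small Python sketch
that reads the lines above and lists the localizations that turn
supplemental inter-sentence space off.  The helper name is made up, and
treating an omitted second argument as "same as the word space size" is
my understanding of `.ss`, labeled here as an assumption.

```python
GREP_OUTPUT = """\
tmac/cs.tmac:.ss 12 0
tmac/de.tmac:.ss 12 0
tmac/en.tmac:.ss 12
tmac/es.tmac:.ss 12 0
tmac/fr.tmac:.ss 12 0
tmac/it.tmac:.ss 12 0
tmac/ru.tmac:.ss 12 0
tmac/sv.tmac:.ss 12 0
"""


def parse_ss_line(line):
    """Split 'tmac/xx.tmac:.ss W [S]' into (language, word, sentence)."""
    path, request = line.split(":", 1)
    lang = path.rsplit("/", 1)[-1].split(".")[0]
    args = request.split()[1:]          # drop the ".ss" request name
    word = int(args[0])
    # Assumption: an omitted second argument means "same as the first".
    sentence = int(args[1]) if len(args) > 1 else word
    return lang, word, sentence


no_supplement = [
    lang
    for lang, _word, sentence in map(parse_ss_line, GREP_OUTPUT.splitlines())
    if sentence == 0
]
print(no_supplement)
```

Only "en" is absent from the resulting list.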

Thanks for shedding light into this dark corner!

Regards,
Branden
