Hi Alex,

At 2025-08-23T22:02:01+0200, Alejandro Colomar wrote:
> I was going to replace an unmatched double quote, used as an argument
> to a man(7) macro to produce a literal double quote in the output,
> with the more readable (less ambiguous in source code) \[dq].
> 
> However, I've realized that groff(1) seems to treat them slightly
> differently.  Is this intentional, or a bug?

I believe it's intentional.

> Here are the source-code diff, and the formatted diff:
> 
>       $ git diff;
>       diff --git i/man/man3/cfree.3 w/man/man3/cfree.3
>       index 55008e9a7..1698ab6e3 100644
>       --- i/man/man3/cfree.3
>       +++ w/man/man3/cfree.3
>       @@ -80,7 +80,7 @@ .SS 3-arg cfree
>        to free memory allocated with
>        .BR calloc (3),
>        or do I need
>       -.BR cfree ()?"
>       +.BR cfree ()?\[dq]
>        Answer: use
>        .BR free (3).
>        .P
> 
> 
>       $ MANWIDTH=64 diffman-git;
>       --- HEAD:man/man3/cfree.3
>       +++ ./man/man3/cfree.3
>       @@ -58,7 +58,7 @@ DESCRIPTION
>        
>               A frequently asked question is "Can I use free(3) to
>               free memory allocated with calloc(3), or do I need
>       -       cfree()?"  Answer: use free(3).
>       +       cfree()?" Answer: use free(3).
>        
>               An SCO manual writes: "The cfree routine is provided for
>               compliance to the iBCSe2 standard and simply calls free.
> 
> 
> I think the behavior with '"' makes more sense than with '\[dq]'.

I agree.  You're ending a sentence, so supplemental inter-sentence
space should be added after `cfree()?"`.

https://www.gnu.org/software/groff/manual/groff.html.node/Sentences.html

> Maybe some conditional within groff(1) checks for '"' but forgets to
> check for the synonymous '\[dq]'?

It's not a matter of forgetting, but of applying a deliberate policy.

Quoting the groff Git version of the aforementioned section:

---snip---
   Normally, the occurrence of a visible non-end-of-sentence character
(as opposed to a space or tab) immediately after an end-of-sentence
character cancels detection of the end of a sentence.  For example, it
would be incorrect for GNU 'troff' to infer the end of a sentence after
the dot in '3.14159'.  However, several characters are treated
_transparently_ after the occurrence of an end-of-sentence character.
That is, GNU 'troff' does not cancel end-of-sentence detection when it
processes them.  This is because such characters are often used as
footnote markers or to close quotations and parentheticals.  The default
set is '"', ''', ')', ']', '*', '\[dg]', '\[dd]', '\[rq]', and '\[cq]'.
The last four are examples of "special characters", escape sequences
whose purpose is to obtain glyphs that are not easily typed at the
keyboard, or which have special meaning to GNU 'troff' (like '\'
itself).
---end snip---

I have a recommendation for you, and a change to groff's man(7) and
mdoc(7) packages to propose.

A.  Don't use `\[dq]` here, but don't use `"` either.  Use paired
    typographer's quotation marks, thus:

    A frequently asked question is \[lq]Can I use
    .BR free (3)
    to free memory allocated with
    .BR calloc (3),
    or do I need
    .BR cfree ()?\[rq]

    Both _groff_ and _mandoc_ correctly degrade these quotation marks to
    an ASCII `"` on a device lacking typographer's quotes.[1]

    The rule here, as so often, is to _say what you mean_ and trust the
    formatting system to handle it as best it can.

B.  Why are `"` and `\[dq]` treated differently?  Historical reasons.
    AT&T troff never really embraced the notion of a neutral double-
    quote special character.  If you wanted typographer's quotes, single
    or double, you input ` or ' or `` or '', just like in TeX.  However,
    many users of AT&T troff were used to typewriters; the Teletype
    Model 37 _was_ a typewriter (with some accessory machinery).
    Therefore lots and lots of people didn't know typesetting and typed
    `"` for quotation, possibly without giving it much thought.  That, I
    think, is why the formatter made it transparent to sentence endings.

    However, that choice is seldom correct for man pages, where `"` is
    much more likely to be used literally to represent programming
    language syntax.

    I therefore propose that _groff_'s man(7) and mdoc(7) packages clear
    the `cflags` bit for the `"` character.[2]  We don't want inter-
    sentence space intruding into code examples.  We already advise
    using `\[dq]` there, but we could be more robust.

    Again, as observed in footnote [1], this doesn't matter for _mandoc_
    because it never applies supplemental inter-sentence space.

Regards,
Branden

[1]

$ cat ATTIC/dq.man
.TH foo 1 2025-08-23 "groff test suite"
.SH Name
foo \- frobnicate a bar
.SH Description
A frequently asked question is \[lq]Can I use
.BR free (3)
to free memory allocated with
.BR calloc (3),
or do I need
.BR cfree ()?\[rq]
Subsequent sentence.
$ nroff -m an -T ascii ATTIC/dq.man
foo(1)                       General Commands Manual                      foo(1)

Name
     foo - frobnicate a bar

Description
     A  frequently asked question is "Can I use free(3) to free memory allocated
     with calloc(3), or do I need cfree()?"  Subsequent sentence.

groff test suite                   2025-08-23                             foo(1)
$ mandoc -T ascii ATTIC/dq.man
foo(1)                      General Commands Manual                     foo(1)

Name
       foo - frobnicate a bar

Description
       A frequently asked question is "Can I use free(3) to free memory
       allocated with calloc(3), or do I need cfree()?" Subsequent sentence.

groff test suite                  2025-08-23                            foo(1)



I checked, and both commands rendered the document as I expected in a
Latin-1 terminal emulator as well.  (_mandoc_ never applies
supplemental inter-sentence space.)

[2]

_groff_'s Texinfo manual again:

 -- Request: .cflags n c1 c2 ...
     Assign properties encoded by the number N to characters C1, C2, and
     so on.

     Characters, whether ordinary, special, or indexed, have certain
     associated properties.  The first argument is the sum of the
     desired flags and the remaining arguments are the characters to be
     assigned those properties.  Spaces need not separate the CN
     arguments.  Any argument CN can be a character class defined with
     the 'class' request rather than an individual character.  *Note
     Character Classes::.

     The non-negative integer N is the sum of any of the following.
...
     '32'
          Mark the character as transparent for the purpose of
          end-of-sentence recognition.  In other words, an
          end-of-sentence character followed by any number of characters
          with this property is treated as the end of a sentence if
          followed by a newline or two spaces.  This is the same as
          having a zero space factor in TeX.  Initially, characters
          '"')]*\[dg]\[dd]\[rq]\[cq]' have this property.
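
To make the effect of the '32' bit concrete, here is a toy model in
Python.  It is only a sketch under stated assumptions, not how GNU troff
implements this: it assumes the default end-of-sentence set `.?!`,
considers only characters at the end of an input line (not the
two-spaces case), and recognizes only `\[xx]`-style special characters.
The names `tokenize`, `ends_sentence`, `EOS_CHARS`, and
`DEFAULT_TRANSPARENT` are hypothetical.

```python
import re

# Assumed end-of-sentence characters (cflags bit 1): . ? !
EOS_CHARS = frozenset({".", "?", "!"})

# Characters transparent to end-of-sentence detection (cflags bit 32),
# using the default set quoted from the groff manual above.
DEFAULT_TRANSPARENT = frozenset({'"', "'", ")", "]", "*",
                                 r"\[dg]", r"\[dd]", r"\[rq]", r"\[cq]"})

# A special character such as \[dq] or \[rq], or any single character.
TOKEN = re.compile(r"\\\[[^\]]+\]|.")


def tokenize(text):
    """Split roff text into ordinary and \\[xx] special characters."""
    return TOKEN.findall(text)


def ends_sentence(line, transparent=DEFAULT_TRANSPARENT):
    """Model whether text at the end of an input line ends a sentence.

    Trailing transparent characters are skipped; the sentence ends
    only if what remains ends with an end-of-sentence character.
    """
    tokens = tokenize(line)
    while tokens and tokens[-1] in transparent:
        tokens.pop()
    return bool(tokens) and tokens[-1] in EOS_CHARS


if __name__ == "__main__":
    print(ends_sentence('cfree()?"'))        # True: '"' is transparent
    print(ends_sentence(r"cfree()?\[dq]"))   # False: \[dq] is not
    print(ends_sentence(r"cfree()?\[rq]"))   # True: \[rq] is transparent
    # Model of the proposed change: clear bit 32 for '"'.
    print(ends_sentence('cfree()?"', DEFAULT_TRANSPARENT - {'"'}))  # False
```

Removing `"` from the transparent set, as in the last call, models the
proposed man(7)/mdoc(7) change: `cfree()?"` would then no longer end a
sentence, just as `cfree()?\[dq]` doesn't today.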
