Hi Branden,

G. Branden Robinson wrote on Wed, Oct 29, 2025 at 06:15:06PM -0500:
> At 2025-10-29T18:28:48+0100, Ingo Schwarze wrote:

>>  2. In mandoc, the roff *parsers* (roff.c, roff_escape.c)
>>     predefine both the .T string and the .T register (rather than
>>     somehow "simulating" them) because both are user-visible features
>>     of the roff(7) language.

> Fair.  I still don't think of mandoc as implementing a *roff "formatter"
> because so many features of a CSTR #54 *roff are missing.  And when I
> say "formatter", I refer to the *roff _program_ being used to translate
> the *roff language into output, so I include lexical analysis, parsing
> into a tree representation (or in *roff's case, a forest of trees), and
> "code generation": trout/grout generation or, historically, terminal
> output "directly".

Oh, i see, that's what results when two programs use very different
architectures: terminology evolves to mismatch and even clash.

Since the architecture of mandoc revolves around the (macro) syntax
trees (specifically, the mdoc(7) and man(7) ASTs, both of which
can contain tbl(7) and eqn(7) subtrees, and, in a limited number
of cases, nodes representing specific roff(7) requests), the
terminology used for describing mandoc program structure also
revolves around the function of any given module with respect
to the ASTs:

 * parsers:    input  = mdoc(7) or man(7) file optionally containing
                        tbl(7), eqn(7), and roff(7) bits
               output = AST
 * validators: input  = AST (unvalidated)
               output = AST (validated and often slightly transformed
                             or normalized)
 * formatters: input  = AST
               output = target format (ascii, utf8, ps, pdf, html, markdown)

For example, the "roff(7) formatters" are very small modules that
transform very small numbers of AST nodes that represent roff(7)
requests into text and HTML output.  The roff formatters are so
small and relatively unimportant because most roff(7) requests
neither produce output nor impact the structure of the output
document.  Here is the complete list of roff(7) requests handled
by the roff formatters in mandoc:

  produce output:                    .br .sp
  impact document structure:         .ce .fi .nf .rj
  change state that impacts output:  .ft .ll .mc .po .ta .ti


[...]
> Please suggest a recast and I can update the "ChangeLog" file entry.

I would say something like:

   mandoc(1) does not exhibit this problem because it does not use
   roff(7) registers to distinguish mdoc(7) child macros from
   plain-text macro arguments.

If you want to be even more explicit, you can say something like:

   plain-text macro arguments, so the .T register does not clash
   with the .T predefined string in the way it did in groff_mdoc(7).

> doc-get-arg-type (without the asterisk) seems to be called only by the
> internals of a one other macro, `doc-do-Bl-args`.

Not quite, i briefly looked at the code and my impression is
it is used for

  .Bd -offset .T
  .Bl -offset .T
  .Bl -width  .T
  .Bl -column ..T

As far as i read the code (confirmed by quick testing), the first three
of the above cases get interpreted as

  a 1u display indentation
  a 1u list indentation 
  a 2n+1u list item indentation

The (IMHO correct) results produced by mandoc are:

  a 2n display indentation
  a 2n list indentation
  a 2n+2n list item indentation

The fourth of the above cases is very weird.
In .Bl -column, a macro argument that is not an option keyword
(like -offset or -width) appears to get interpreted as follows:

Usually, the string length of the argument gets used as the
column width in units of n.  But there is one weird exception.
If the argument starts with a dot and doc-get-arg-type classifies
the rest of the argument as a macro name, then the width of that
rest (when set in a diversion) gets used as the column width instead.
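
A minimal input for exercising that exception might look as follows.
This is only a sketch: the manual name, the second column argument,
and the table contents are made up for illustration, and .Er is
chosen merely because Er is an mdoc(7) macro name.

```roff
.\" Hypothetical test page for a -column argument starting with a dot.
.Dd October 29, 2025
.Dt COLTEST 1
.Os
.Sh NAME
.Nm coltest
.Nd column width argument that starts with a dot
.Sh DESCRIPTION
.\" First column argument: a dot followed by a macro name.
.Bl -column .Er description
.It one Ta first row
.It two Ta second row
.El
```

Formatting it with both mandoc and groff -mdoc and comparing where
the second column starts should show whether the two programs treat
the first column argument differently.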

Mandoc does not implement that special case.
So with mandoc, ".Er" results in a column width of 3n because the
string length of ".Er" is 3, whereas groff appears to interpret it
as a width of 2u (not 2n!) because the length of the string "Er"
is 2.  I'm not sure i got this entirely right, but since the
apparent behaviour near the end of doc-do-Bl-args makes no sense
to me, i suspect some other behaviour was intended, and no one ever
noticed the bug because few - if any - real world manual pages
actually specify .Bl -column column widths by providing macro
arguments starting with dots.  For example, the intended behaviour
might be for "-column .macro" to use the so-called default width
of the macro as the column width, but that clearly doesn't work
in groff_mdoc(7) and isn't implemented in mandoc either.
In general, i think the whole concept of "default widths" isn't
a particularly useful feature.  Admittedly, it does see quite a
bit of real-world use in the form of ".Bl -tag -width Er"
in ".Sh ERRORS" sections, but i fear very few people understand
how that really works and it mostly gets carried forward by the
faithful like a cargo cult would (except that this incantation
actually works :-).
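
For reference, the incantation in question typically looks like the
following fragment.  The error names and descriptions are merely
illustrative and not copied from any particular manual page.

```roff
.Sh ERRORS
.\" "Er" is not meant literally here; it requests the default
.\" width associated with the Er macro as the tag width.
.Bl -tag -width Er
.It Bq Er ENOENT
The named file does not exist.
.It Bq Er EACCES
Search permission is denied for a component of the path prefix.
.El
```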

Whatever may be going on with -column, i do believe that doc-do-Bl-args
contains the same bug that doc-do-Bl-args* did, given how it results
in the presumably wrong behaviour described for -offset and -width above.

> I didn't check whether one can construct a `.Bl -whatever .T -foo` call
> that misbehaves.

Here you are.

> And I don't know why two different macros are needed here.

Well, there are two differences:

 * the star-variant operates on the string doc-arg\$1
   whereas the vanilla variant operates directly on \$1
 * the star-variant internally sets doc-width
   whereas the vanilla variant requires it to be set before the call

Could the calls to one be refactored to use the other?
Maybe, but watch out for side effects of setting versus not
setting the global string doc-arg\$1 and the global register doc-width.

> It seems tempting to unify the logic here but that means going down the
> mdoc rabbit hole and writing more unit tests, both of which will
> distract me from other work I need to get done to push groff 1.24.0.rc1.

Fair enough.  Given how complicated the groff_mdoc(7) implementation
is (as opposed to the user-visible logic which i deem fairly simple),
how obscure internal variable names tend to be, and how much the
groff_mdoc(7) code relies on side effects and global variables,
i'm sure there are tons of subtle bugs that should eventually be
addressed, but i do *not* claim working on that code is particularly
inviting, nor particularly urgent, given that it *mostly* works as
intended, that manual page misformatting in obscure edge cases
rarely results in security risks, and that more severe misformattings
that matter in practice tend to get fixed when they are found.

Yours,
  Ingo
