Hi,

onf wrote on Thu, Oct 30, 2025 at 11:23:14PM +0100:

> I am sure Branden is gonna be happy because I found another questionable
> behavior of groff's mdoc implementation. When using Bq in a columnated
> list (aka a table), Bq encloses everything until end of input line in
> square brackets, rather than just the stuff until next Ta (i.e. column
> separator). No diagnostics are produced. mandoc does not exhibit this
> behavior.
> 
> Example to reproduce:
>   .Dd October 30, 2025
>   .Dt EXAMPLE 1
>   .Os
>   .Sh DESCRIPTION
>   .Bl -column [One] [Two] [Three]
>   .It Bq One Ta Bq Two Ta Bq Three
>   .El

Arguably, the groff output

  [One     [Two     [Three]]]

is more correct than the mandoc output

  [One]    [Two]    [Three]

because mandoc mdoc(7) says:

   Block partial-implicit
     Like block full-implicit, but with single-line scope closed
     by the end of the line.

and groff_mdoc(7) says:

     The '.Op' macro places option brackets around any remaining
     arguments on the command line, and places any trailing
     punctuation outside the brackets.

There are two problems with the description in groff_mdoc(7).
First, the term "command line" is completely wrong here: this has
nothing to do with the groff(1), troff(1), or nroff(1) command line.
The intended meaning is "macro line" ("macro input line", "input
line", or "logical input line" would also be correct, but i prefer
the term "macro line" because it is concise and unambiguous).
Second, the manual page fails to describe the syntax of the other
block partial-implicit enclosure macros at all.  In quoting the
description of .Op here, i'm using the fact that i already know
that .Op and .Bq work the same way, which is not obvious a priori.
But both documentation bugs are unrelated to the potential problem
you report.

So, what are the advantages of the groff behaviour?

 * It matches the current documentation.
 * It is (slightly) easier to describe; saying "closed by the end
   of the input line, or if inside a .Bl -column list, the end of
   the current column, whichever comes earlier" would be possible
   and not terribly long, but it would be slightly more complicated.
 * It is easier to implement in groff.
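
The two scoping rules can be illustrated with a minimal Python sketch.
This is a toy model and not code from either formatter: the names
append_word, format_it_line, and per_column are made up for illustration,
and the function only understands Bq, Ta, and plain words.

```python
def append_word(cell, text):
    """Append text to a cell, space-separated unless it follows '['."""
    if cell and not cell.endswith("["):
        cell += " "
    return cell + text


def format_it_line(args, per_column):
    """Format the arguments of one .It line of a .Bl -column list.

    args:       the words of the macro line after ".It"
    per_column: True closes each Bq at the next Ta (the "whichever
                comes earlier" rule); False closes it only at the end
                of the macro line (the rule as currently documented)
    Returns one string per column.
    """
    cells = [""]
    open_scopes = 0              # Bq scopes opened but not yet closed
    for word in args:
        if word == "Ta":
            if per_column:
                cells[-1] += "]" * open_scopes
                open_scopes = 0
            cells.append("")     # start the next column
        elif word == "Bq":
            cells[-1] = append_word(cells[-1], "[")
            open_scopes += 1
        else:
            cells[-1] = append_word(cells[-1], word)
    cells[-1] += "]" * open_scopes   # end of line closes everything
    return cells


args = "Bq One Ta Bq Two Ta Bq Three".split()
print(format_it_line(args, per_column=True))   # ['[One]', '[Two]', '[Three]']
print(format_it_line(args, per_column=False))  # ['[One', '[Two', '[Three]]]']
```

With per_column=False the cells match the groff output shown above, and
with per_column=True they match the mandoc output.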

And what are the advantages of the mandoc behaviour?

 * It is arguably more useful in practice; why would anyone
   want an enclosure macro to span .Bl -column columns?
 * It is *much* easier to implement in mandoc, and to such an
   extreme degree that i would likely choose to not implement the
   groff behaviour even if everyone agreed that the groff behaviour
   would be more useful (which i doubt).  The problem is that the
   syntax tree for the groff behaviour would have to have the
   following structure:

   It (block)
     It (body)
       Bq (block)
         Bq (body)
           text
           It (body)
             Bq (block)
               Bq (body)
                 text
                 It (body)
                   Bq (block)
                     Bq (body)
                       text
   It (block)
     ...

   This structure breaks the very fundamental invariant that a
   body-type block can only be a child of a block-type block of
   the same macro, and in particular not a child of a body-type block
   of some other macro.  Trying to implement a violation of this
   invariant would run a high risk of triggering assertion failures
   or other severe misbehaviour in other parts of the code.

   The logic for handling so-called "badly nested blocks"
   (like .Ao Bo foo Ac bar Bc) is already very complicated and
   took a long time and extraordinary effort to get right,
   but even that does not help here.  Sure, in badly-nested
   block parlance, one can argue that the .Ta macros break
   the .Bq blocks, but the way that is implemented is by inserting
   a special "body-end" token for the breaking block at the place
   inside the broken block where the formatting of the breaking block
   ends, while structurally, the end of the breaking block is deferred
   until the end of the broken block.  For the It/Bq invariant
   violation, that wouldn't help at all because it would not mitigate
   the problem that the "It (body)" would have to be a child of
   the "Bq (body)", which it cannot be.

   So implementing the groff behaviour would require abandoning one
   of the most fundamental invariants (fundamental both with respect
   to language grammar, not just in the mdoc(7) language but in
   essentially any block-oriented language including HTML, LaTeX,
   C, Shell, Perl, Python, Pascal, FORTRAN etc., and with respect
   to the concrete implementation), and it would require implementing
   a completely new structuring concept in the syntax tree.
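
The invariant can be written down as a small check over a toy tree.
The following Python sketch is not mandoc code: the Node class, the
helper constructors, and invariant_violations are hypothetical names,
and the per-column tree assumes one "It (body)" per column as a
well-formed shape for comparison.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    macro: str                    # "It", "Bq", or "" for text
    kind: str                     # "block", "body", or "text"
    children: list = field(default_factory=list)


def block(macro, *children):
    return Node(macro, "block", list(children))


def body(macro, *children):
    return Node(macro, "body", list(children))


def text():
    return Node("", "text")


def invariant_violations(node, parent=None):
    """List body nodes whose parent is not a block of the same macro."""
    bad = []
    if node.kind == "body" and (
        parent is None
        or parent.kind != "block"
        or parent.macro != node.macro
    ):
        where = "nothing" if parent is None else f"{parent.macro} ({parent.kind})"
        bad.append(f"{node.macro} (body) under {where}")
    for child in node.children:
        bad.extend(invariant_violations(child, node))
    return bad


# Each Bq closed at the next Ta: every body sits under its own block.
per_column_tree = block(
    "It",
    body("It", block("Bq", body("Bq", text()))),
    body("It", block("Bq", body("Bq", text()))),
    body("It", block("Bq", body("Bq", text()))),
)

# Each Bq spanning to the end of the line: the structure shown above.
spanning_tree = block("It", body("It", block("Bq", body(
    "Bq", text(), body("It", block("Bq", body(
        "Bq", text(), body("It", block("Bq", body("Bq", text()))))))))))

print(invariant_violations(per_column_tree))  # []
print(invariant_violations(spanning_tree))
```

The second call reports "It (body) under Bq (body)" twice, once for
each column boundary that a Bq would have to span.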

By the way, exactly the same issue occurs in mandoc with .Bo,
just like it occurs with .Bq:

  input:           .It Bo One Ta Two Bc Ta Three
  groff output:    [One     Two]     Three
  mandoc output:   [One]    Two      Three
  mandoc message:  ERROR: skipping end of block that is not open: Bc

I consider this practically unfixable.

What can be done is to improve both manuals:
In groff_mdoc(7), the two bugs i pointed out above can be fixed,
and both manuals could point out that enclosures spanning .Bl -column
columns make no sense from a logical perspective and format
inconsistently with different formatters.  I would have to think
a bit more about it though as it seems likely this is not the
only case where blocks nested badly in unusual ways cause subtle
issues, and giving undue weight to one particular mini-issue while
glossing over others should probably be avoided.

Yours,
  Ingo
