Hi,
onf wrote on Thu, Oct 30, 2025 at 11:23:14PM +0100:
> I am sure Branden is gonna be happy because I found another questionable
> behavior of groff's mdoc implementation. When using Bq in a columnated
> list (aka a table), Bq encloses everything until end of input line in
> square brackets, rather than just the stuff until next Ta (i.e. column
> separator). No diagnostics are produced. mandoc does not exhibit this
> behavior.
>
> Example to reproduce:
> .Dd October 30, 2025
> .Dt EXAMPLE 1
> .Os
> .Sh DESCRIPTION
> .Bl -column [One] [Two] [Three]
> .It Bq One Ta Bq Two Ta Bq Three
> .El
Arguably, the groff output
[One [Two [Three]]]
is more correct than the mandoc output
[One] [Two] [Three]
because mandoc mdoc(7) says:
    Block partial-implicit
        Like block full-implicit, but with single-line scope closed
        by the end of the line.
and groff_mdoc(7) says:
    The '.Op' macro places option brackets around any remaining
    arguments on the command line, and places any trailing
    punctuation outside the brackets.
There are two problems with the description in groff_mdoc(7):
the term "command line" is completely wrong here, this has nothing
to do with the (groff(1) or troff(1) or nroff(1)) command line,
the intended meaning is "macro line" ("macro input line", "input
line", or "logical input line" would also be correct, but i prefer
the term "macro line" because it is concise and unambiguous);
and the manual page fails to describe the syntax of the other
block partial-implicit enclosure macros at all - in quoting the
description of .Op here, i'm using the fact that i already know
that .Op and .Bq work the same way, which is not obvious a priori.
But both documentation bugs are unrelated to the potential problem
you report.
So, which are the advantages of the groff behaviour?
* It matches the current documentation.
* It is (slightly) easier to describe; saying "closed by the end
of the input line, or if inside a .Bl -column list, the end of
the current column, whichever comes earlier" would be possible
and not terribly long, but it would be slightly more complicated.
* It is easier to implement in groff.
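To make the difference between the two scoping rules concrete, here
is a toy model in Python. It is only a sketch and not how either
groff or mandoc is implemented: the function name render_it_line and
its flag are made up for illustration, only Bq and Ta are handled, a
column boundary is rendered as a single space, and trailing
punctuation is ignored.

```python
def render_it_line(args, close_at_column_end):
    """Toy rendering of the arguments of one .It line in a -column list.

    close_at_column_end=False: a Bq scope runs to the end of the line.
    close_at_column_end=True:  a Bq scope also ends at the next Ta.
    """
    out = ""
    depth = 0  # number of Bq scopes currently open
    for word in args:
        if word == "Ta":
            if close_at_column_end:
                out += "]" * depth  # close every scope open in this column
                depth = 0
            out += " "
            continue
        # Separate words by a space, except right after "[" or a space.
        if out and not out.endswith(("[", " ")):
            out += " "
        if word == "Bq":
            out += "["
            depth += 1
        else:
            out += word
    return out + "]" * depth  # end of line closes whatever is still open


args = "Bq One Ta Bq Two Ta Bq Three".split()
print(render_it_line(args, close_at_column_end=False))  # [One [Two [Three]]]
print(render_it_line(args, close_at_column_end=True))   # [One] [Two] [Three]
```

With the flag off, the model reproduces the nested brackets quoted
above for groff; with it on, it reproduces the per-column brackets
quoted for mandoc.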
And which are the advantages of the mandoc behaviour?
* It is arguably more useful in practice; why would anyone
want an enclosure macro to span .Bl -column columns?
* It is *much* easier to implement in mandoc, and to such an
extreme degree that i would likely choose to not implement the
groff behaviour even if everyone agreed that the groff behaviour
would be more useful (which i doubt). The problem is that the
syntax tree for the groff behaviour would have to have the
following structure:
    It (block)
      It (body)
        Bq (block)
          Bq (body)
            text
            It (body)
              Bq (block)
                Bq (body)
                  text
                  It (body)
                    Bq (block)
                      Bq (body)
                        text
    It (block)
      ...
This structure breaks the very fundamental invariant that a
body-type block can only be a child of a block-type block of
the same macro, and in particular not a child of a body-type block
of some other macro. Trying to implement a violation of this
invariant would run a high risk of triggering assertion failures
or other severe misbehaviour in other parts of the code.
The logic for handling so-called "badly nested blocks"
(like .Ao Bo foo Ac bar Bc) is already very complicated and
took a long time and extraordinary effort to get right,
but even that does not help here. Sure, in badly-nested
block parlance, one can argue that the .Ta macros break
the .Bq blocks, but the way that is implemented is inserting
a special "body-end" token for the breaking block at the place
inside the broken block where the formatting of the breaking block
ends, while structurally, the end of the breaking block is deferred
until the end of the broken block. For the It/Bq invariant
violation, that wouldn't help at all because it would not mitigate
the problem that the "It (body)" would have to be a child of
the "Bq (body)", which it cannot be.
So implementing the groff behaviour would require abandoning one
of the most fundamental invariants (most fundamental both with
respect to language grammar, not just in the mdoc(7) language
but in essentially any block-oriented language including
HTML, LaTeX, C, Shell, Perl, Python, Pascal, FORTRAN etc., and
also most fundamental with respect to the concrete implementation),
and it would require implementing a completely new structuring
concept in the syntax tree.
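The invariant discussed above can be written down as a small check.
The following Python sketch is purely illustrative and is not taken
from the mandoc sources: the Node class, the helper names, and the
two example trees (cut down to two columns) are assumptions made
here. The "per_column" tree assumes that each column body is a
direct child of the It block.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    macro: str   # macro name, or the text itself for text nodes
    kind: str    # "block", "body", or "text"
    children: list = field(default_factory=list)


def violations(node, parent=None):
    """List every body node whose parent is not a block of the same macro."""
    found = []
    if node.kind == "body":
        ok = (parent is not None and parent.kind == "block"
              and parent.macro == node.macro)
        if not ok:
            where = f"{parent.macro} ({parent.kind})" if parent else "no parent"
            found.append(f"{node.macro} (body) under {where}")
    for child in node.children:
        found.extend(violations(child, node))
    return found


def text(s):
    return Node(s, "text")


def bq(*children):
    # A Bq block containing a single Bq body with the given children.
    return Node("Bq", "block", [Node("Bq", "body", list(children))])


# Enclosure spanning columns: the second It body sits inside the Bq body.
spanning = Node("It", "block", [
    Node("It", "body", [
        bq(text("One"), Node("It", "body", [bq(text("Two"))])),
    ]),
])

# Enclosure closed at the column boundary: both It bodies under the It block.
per_column = Node("It", "block", [
    Node("It", "body", [bq(text("One"))]),
    Node("It", "body", [bq(text("Two"))]),
])

print(violations(spanning))    # ['It (body) under Bq (body)']
print(violations(per_column))  # []
```

The only point of the sketch is that the spanning tree cannot be
built without placing an "It (body)" below a "Bq (body)", whereas
the per-column tree passes the check.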
By the way, exactly the same issue occurs in mandoc with .Bo,
just like it occurs with .Bq:
input: .It Bo One Ta Two Bc Ta Three
groff output: [One Two] Three
mandoc output: [One] Two Three
mandoc message: ERROR: skipping end of block that is not open: Bc
I consider this practically unfixable.
What can be done is that both manuals can be improved:
In groff_mdoc(7), the two bugs can be fixed that i pointed out above,
and both manuals could point out that enclosures spanning .Bl -column
columns make no sense from a logical perspective and format
inconsistently with different formatters. I would have to think
a bit more about it though as it seems likely this is not the
only case where blocks nested badly in unusual ways cause subtle
issues, and giving undue weight to one particular mini-issue while
glossing over others should probably be avoided.
Yours,
Ingo