Hi onf,

At 2025-01-20T01:48:19+0100, onf wrote:
> Actually, BSD mandoc does implement this, it's just documented at
> a poorly visible place in the docs. BSD mandoc's man(1):
>   MANPAGER
>       Any non-empty value of the environment variable MANPAGER is
>       used instead of the standard pagination program, less(1).  If
>       less(1) is used, the interactive :t command can be used to go
>       to the definitions of various terms, for example command line
>       options, command modifiers, internal commands, environment
>       variables, function names, preprocessor macros, errno(2)
>       values, and some other emphasized words.  Some terms may have
>       defining text at more than one place.  In that case, the
>       less(1) interactive commands t and T can be used to move to the
>       next and to the previous place providing information about the
>       term last searched for with :t.  The -O tag[=term] option
>       documented in the mandoc(1) manual opens a manual page at the
>       definition of a specific term rather than at the beginning.
> 
> And it works quite nicely, actually. The definitions are generated
> automatically, so all manpages written in mdoc benefit from it.
> I assume groff mdoc + man-db doesn't implement this?

I'm working on it.

[requoting]
> The definitions are generated automatically

That's the rub.  We need a design for automatic construction of
tag/anchor names from the user-specified names of the items to be
tagged.  In man(7) documents, those taggable items are probably going to
be:

1.  the identifier of the page itself, with "section" number;
2.  section heading text;
3.  subsection heading text; and
4.  the tag text of tagged paragraphs (`TP`).

Item #1 has already been done for several months and works fine; it can
be observed in any "groff-man-pages.pdf" document built from Git.
Cross-references between man(7) and mdoc(7) are supported.

There are a few remaining problems to be solved.

A.  Generation of _unique_ hyperlink tags from #2-#4 above.  There will
    be collisions galore under item #2 when multiple man pages are
    rendered.  A page can conceivably collide with itself with respect
    to items #3 and #4.  So we probably want a hierarchical
    tag representation: page-name/section/subsection/tag-item, where
    this structure is truncatable at any point after the first slash but
    is otherwise invariant.

B.  We need a predictable means of generating hyperlink tag identifiers
    that is also flexible enough to accommodate non-English languages
    and weird characters that people might populate their (sub)section
    titles or paragraph tags with.

    This requirement exacerbates a painful limitation in groff 1.23 and
    earlier.  It just wasn't going to happen without a change to the
    GNU troff output language specification that permitted non-ASCII
    code points _in parameters to device extension commands_ to be
    expressed.

    The good news is, that's sorted out now, and comes with a "NEWS"
    item.[1]  Deri was really helpful in sorting out the issues here.
    (As you're aware, there are knock-on issues not yet resolved to his
    satisfaction.[2])

    For those who feel their burn scars sizzling afresh, this is the
    root cause of the problem behind groff's most notorious diagnostics,
    because it applies just as much to output-format-specific document
    metadata.

    error("can't transparently output node at top level");

    error("can't translate %1 to special character '%2'"
          " in transparent throughput",
          input_char_description(cc),
          ci->nm.contents());

    The first happens when you get up to tomfoolery like this:

    .ds AUTHOR Frank \uand\d Estelle Costanza\"
    .pdfinfo /Author \*[AUTHOR]

    And the second when you commit the outrage of having a non-Basic
    Latin character in your name.

    .ds AUTHOR Luis Buñuel\"
    .pdfinfo /Author \*[AUTHOR]

    The exact same problems apply to document tags/anchors, and for
    exactly the same reason.  We didn't have a specification for
    encoding such things in device extension commands, also known as "x
    X" in "grout".  See groff_out(5).

    Okay, I have to go off on a rant here.[3]

C.  We then need a way to make references to these anchors/tags.  For
    man(7) the `MR` macro new to groff 1.23 was an obvious site to add
    the appropriate machinery for document-level links.  mdoc(7)'s `Xr`
    is closely analogous and has existed for many years.  In the
    forthcoming groff 1.24 (and in Git right now), they automatically
    supply hyperlink information for output devices that support such.
    (Just PDF and terminals.)

    But there remain two gaps.

    i.  No way to hyperlink in a more fine-grained way, that is to
        (sub)section headings or, conceivably, to paragraph tags.  This
        is a tougher problem because if these are not unique within a
        page, the location making the link has to know about the
        structure of the document.  Possibly, we'll just punt on the
        issue of "deep" cross-document links.

        mdoc(7) doesn't bother to support that; its `Sx` macro doesn't
        contemplate pointing into another document.[4]  I notice that it,
        too, doesn't address the problem of duplicate heading names and
        therefore ambiguous references.  Because mdoc(7) culture is
        rigidly prescriptive, its section headings are tightly
        controlled, and I expect that this problem only threatens when
        subsections are used (and referenced).

    ii. Hyperlinking macros need to be added to ms(7), me(7), and mm(7).
        Here, at least with mm, the problems of within-document linking
        may be solvable with less disruption (meaning: no new macros),
        because the package already supports an internal referencing
        system.  Also, there is likely much less demand for deep links
        across documents using these packages.

        If someone's wondering, I'm not a fan of groff_www(7) and don't
        anticipate using it.
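
To make the hierarchical scheme from item A concrete, here is a minimal
sketch in Python.  Nothing like this exists in groff; the function name
`make_tag`, the choice of "/" as separator, and the use of an empty slot
for a skipped level (say, a tagged paragraph in a section that has no
subsection) are all assumptions made for illustration.  It deliberately
ignores item B: it does nothing about non-ASCII text, or about a
heading that itself contains a slash.

```python
def make_tag(page, section=None, subsection=None, item=None):
    """Build a page-name/section/subsection/tag-item string.

    Hypothetical helper, not part of groff.  Trailing levels that are
    not supplied are dropped (the "truncatable" part); a skipped level
    in the middle is kept as an empty slot so that each position always
    means the same thing (the "otherwise invariant" part).  Whether a
    page-only tag should keep a trailing slash is left unsettled here.
    """
    levels = [section, subsection, item]
    # Truncate: discard unused levels from the right-hand end only.
    while levels and levels[-1] is None:
        levels.pop()
    # Any remaining unused level becomes an empty slot.
    return "/".join([page] + [level or "" for level in levels])


# The same heading text on two pages no longer collides.
print(make_tag("groff(1)", "Options"))
print(make_tag("troff(1)", "Options"))
# A tagged paragraph directly under a section, with no subsection.
print(make_tag("groff(1)", "Options", None, "-T"))
```

A design like this only addresses collisions between pages and between
sections.  Two identical subsection headings under one section, or two
identical paragraph tags under one subsection, would still collide and
would need something further, such as a counter.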

As I understand mandoc(1)'s less(1)-integrated tagging feature, none of
the problems above are mitigated by feeding the pager an auxiliary tags
file (less(1)'s `-T` option).  They have to be solved regardless.
Steffen Nurpmeso has campaigned repeatedly for extension of OSC 8
hyperlink syntax (or maybe just its semantics) to support anchor
placement in addition to linking.  I'm dubious of that suggestion.  OSC
8 wasn't developed with that in mind and had enough of a hill to climb.

Let's see, is that everything?  When I'm brain-dumping, sometimes it's
hard to tell whether I'm finished.  An affliction of age, maybe...

Regards,
Branden

[1] NEWS:

*  GNU troff now performs some limited processing/transformation of the
   argument to the `\X` escape sequence and its counterpart `device`
   request, to address the requirement that some documents have to pass
   metadata that must encode non-ASCII characters in device extension
   commands.  (For example, a document author may desire a document's
   section headings containing non-ASCII code points to appear correctly
   in PDF bookmarks.  Further, GNU troff encodes its output page
   description language only in ASCII.)  This change is expected to be
   of significance mainly to developers of output drivers for groff;
   groff_diff(7) describes the transformations.  If you have been using
   `\X` or `.device` to pass ASCII data to the output driver as a device
   extension command and require that it remain precisely as-is, use the
   `\!` escape sequence or `output` request, and prefix your data with
   "x X ", the device-independent troff means of expressing a device
   extension command (see groff_out(5)).

[2] https://lists.gnu.org/archive/html/groff/2024-12/msg00168.html

[3] "Transparent", along with "special", are my least favorite words in
    *roff nomenclature, and practically all of the blame can be laid at
    the Bell Labs CSRC in the 1970s.

    The Thompson-style naming convention of never using more than two
    letters to name anything except when compelled at gunpoint had the
    advantage that no one expected such identifiers to mean anything at
    all.  It's Unix, man.  You are not expected to understand it.

    (And when staring at the muzzle of a firearm, you can comply.  Just
    add the four letters "flag".  All done!)

[4] groff_mdoc(7):

(Sub)section cross references
     Use the ‘.Sx’ macro to cite a (sub)section heading within the given
     document.

           Usage: .Sx ⟨section‐reference⟩ ...

                    .Sx Files  → “Files”

     The default width is 16n.
