Hi onf,

At 2025-01-20T01:48:19+0100, onf wrote:
> Actually, BSD mandoc does implement this, it's just documented at
> a poorly visible place in the docs. BSD mandoc's man(1):
>   MANPAGER
>     Any non-empty value of the environment variable MANPAGER is
>     used instead of the standard pagination program, less(1). If
>     less(1) is used, the interactive :t command can be used to go
>     to the definitions of various terms, for example command line
>     options, command modifiers, internal commands, environment
>     variables, function names, preprocessor macros, errno(2)
>     values, and some other emphasized words. Some terms may have
>     defining text at more than one place. In that case, the
>     less(1) interactive commands t and T can be used to move to the
>     next and to the previous place providing information about the
>     term last searched for with :t. The -O tag[=term] option
>     documented in the mandoc(1) manual opens a manual page at the
>     definition of a specific term rather than at the beginning.
>
> And it works quite nicely, actually. The definitions are generated
> automatically, so all manpages written in mdoc benefit from it.
> I assume groff mdoc + man-db doesn't implement this?

I'm working on it.
[requoting]
> The definitions are generated automatically
That's the rub. We need a design for automatic construction of
tag/anchor names from the user-specified names of the items to be
tagged. In man(7) documents, those taggable items are probably going to
be:
1. the identifier of the page itself, with "section" number;
2. section heading text;
3. subsection heading text; and
4. the tag text of tagged paragraphs (`TP`).
Item #1 has already been done for several months and works fine; it can
be observed in any "groff-man-pages.pdf" document built from Git.
Cross-references between man(7) and mdoc(7) are supported.
There are a few remaining problems to be solved.
A. Generation of _unique_ hyperlink tags from #2-#4 above. There will
be collisions galore under item 2 when multiple man pages are
rendered. A page can conceivably collide with itself with respect
to items #3 and #4. So we probably want a hierarchical
tag representation: page-name/section/subsection/tag-item, where
this structure is truncatable at any point after the first slash but
is otherwise invariant.
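
To make the hierarchy under A concrete, here is a minimal Python sketch
of one way such tags could be assembled. Nothing in it is groff's
design: the function name `make_tag`, the page identifier "foo.1", and
the decision to keep an omitted middle level as an empty slot (so that
each component stays in a fixed position) are assumptions made purely
for illustration.

```python
def make_tag(page, section=None, subsection=None, item=None):
    """Build a slash-separated tag: page/section/subsection/item.

    Hypothetical helper, not part of groff.  Trailing levels that are
    not supplied are dropped (truncation); a level omitted in the
    middle keeps its slot as an empty string so positions stay fixed.
    """
    if not page:
        raise ValueError("the page identifier is mandatory")
    levels = [section, subsection, item]
    # Truncate: discard unsupplied levels from the right-hand end.
    while levels and levels[-1] is None:
        levels.pop()
    parts = [page] + [lv if lv is not None else "" for lv in levels]
    # Assumption: a literal slash inside a component is rejected here
    # rather than escaped; a real scheme would need an escaping rule.
    for part in parts:
        if "/" in part:
            raise ValueError("component contains a slash: %r" % part)
    return "/".join(parts)


print(make_tag("foo.1", "Options"))              # foo.1/Options
print(make_tag("bar.1", "Options"))              # bar.1/Options
print(make_tag("foo.1", "Options", item="-a"))   # foo.1/Options//-a
```

Prefixing the page identifier is what keeps the "Options" heading of
one page from colliding with the same heading in another.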
B. We need a predictable means of generating hyperlink tag identifiers
that is also flexible enough to accommodate non-English languages
and weird characters that people might populate their (sub)section
titles or paragraph tags with.
This requirement exacerbates a painful limitation in groff 1.23 and
earlier. It just wasn't going to happen without a change to the
GNU troff output language specification that permitted non-ASCII
code points _in parameters to device extension commands_ to be
expressed.
The good news is, that's sorted out now, and comes with a "NEWS"
item.[1] Deri was really helpful in sorting out the issues here.
(As you're aware, there are knock-on issues not yet resolved to his
satisfaction.[2])
For those who feel their burn scars sizzling afresh, this is the
root cause of the problem behind groff's most notorious diagnostics,
because it applies just as much to output-format-specific document
metadata.
    error("can't transparently output node at top level");

    error("can't translate %1 to special character '%2'"
          " in transparent throughput",
          input_char_description(cc),
          ci->nm.contents());
The first happens when you get up to tomfoolery like this:
    .ds AUTHOR Frank \uand\d Estelle Costanza\"
    .pdfinfo /Author \*[AUTHOR]
And the second when you commit the outrage of having a non-Basic
Latin character in your name.
    .ds AUTHOR Luis Buñuel\"
    .pdfinfo /Author \*[AUTHOR]
The exact same problems apply to document tags/anchors, and for
exactly the same reason. We didn't have a specification for
encoding such things in device extension commands, also known as "x
X" in "grout". See groff_out(5).
Okay, I have to go off on a rant here.[3]
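
Back to problem B for a moment. As an illustration of what "expressing
non-ASCII code points in ASCII" can look like, here is a small Python
sketch that rewrites every character outside printable ASCII as an
escape in the style of groff's `\[uXXXX]` special character notation.
This is an assumption on my part about a plausible encoding, not a
statement of the transformations that groff_diff(7) specifies; the
function names are hypothetical, and the backslash is escaped too so
that decoding stays unambiguous.

```python
import re


def to_ascii_escapes(text):
    """Return `text` with non-ASCII code points written as \\[uXXXX].

    Hypothetical encoder for illustration only.  Printable ASCII other
    than the backslash passes through unchanged.
    """
    pieces = []
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp <= 0x7E and ch != "\\":
            pieces.append(ch)
        else:
            # At least four uppercase hex digits, more if needed.
            pieces.append("\\[u%04X]" % cp)
    return "".join(pieces)


_ESCAPE = re.compile(r"\\\[u([0-9A-F]{4,6})\]")


def from_ascii_escapes(text):
    """Invert to_ascii_escapes()."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


encoded = to_ascii_escapes("Luis Bu\u00f1uel")
print(encoded)                      # Luis Bu\[u00F1]uel
print(from_ascii_escapes(encoded) == "Luis Bu\u00f1uel")  # True
```

The point is only that the result is pure ASCII and reversible, which
is the property a tag or metadata string needs to survive the trip
through an ASCII-only page description language.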
C. We then need a way to make references to these anchors/tags. For
man(7) the `MR` macro new to groff 1.23 was an obvious site to add
the appropriate machinery for document-level links. mdoc(7)'s `Xr`
is closely analogous and has existed for many years. In the
forthcoming groff 1.24 (and in Git right now), they automatically
supply hyperlink information for output devices that support such.
(Just PDF and terminals.)
But there remain two gaps.
i. No way to hyperlink in a more fine-grained way, that is to
(sub)section headings or, conceivably, to paragraph tags. This
is a tougher problem because if these are not unique within a
page, the location making the link has to know about the
structure of the document. Possibly, we'll just punt on the
issue of "deep" cross-document links.
mdoc(7) doesn't bother to support that; its `Sx` macro doesn't
contemplate pointing into another document.[4] I notice that it,
too, doesn't address the problem of duplicate heading names and
therefore ambiguous references. Because mdoc(7) culture is
rigidly prescriptive, its section headings are tightly
controlled, and I expect that this problem only threatens when
subsections are used (and referenced).
ii. Hyperlinking macros need to be added to ms(7), me(7), and mm(7).
Here, at least with mm, the problems of within-document linking
may be solvable with less disruption (meaning: no new macros),
because the package already supports an internal referencing
system. Also, there is likely much less demand for deep links
across documents using these packages.
If someone's wondering, I'm not a fan of groff_www(7) and don't
anticipate using it.
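
To illustrate gap i, the following Python sketch shows why a bare
heading name is not enough once names repeat within a page. It assumes
slash-separated tags of the hierarchical form proposed under A; the
function `resolve`, the page "foo.1", and its headings are all made up
for the example.

```python
def resolve(reference, tags):
    """Return every tag whose trailing components equal `reference`.

    Hypothetical lookup: one match means the reference is usable, more
    than one means the author must supply more of the hierarchy.
    """
    want = reference.split("/")
    return [t for t in tags if t.split("/")[-len(want):] == want]


# Invented tags for an invented page with a repeated subsection name.
tags = [
    "foo.1/Options",
    "foo.1/Options/Output",
    "foo.1/Files",
    "foo.1/Files/Output",
]

print(resolve("Options", tags))       # ['foo.1/Options']
print(len(resolve("Output", tags)))   # 2, so the reference is ambiguous
print(resolve("Files/Output", tags))  # ['foo.1/Files/Output']
```

Disambiguating "Output" requires the referring location to name the
enclosing section, which is exactly the knowledge of document structure
that makes fine-grained linking harder than page-level linking.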
As I understand mandoc(1)'s less(1)-integrated tagging feature, none of
the problems above are mitigated by feeding the pager an auxiliary tags
file (less(1)'s `-T` option). They have to be solved regardless.
Steffen Nurpmeso has campaigned repeatedly for extension of OSC 8
hyperlink syntax (or maybe just its semantics) to support anchor
placement in addition to linking. I'm dubious of that suggestion. OSC
8 wasn't developed with that in mind and had enough of a hill to climb.
Let's see, is that everything? When I'm brain-dumping, sometimes it's
hard to tell whether I'm finished. An affliction of age, maybe...
Regards,
Branden
[1] NEWS:
* GNU troff now performs some limited processing/transformation of the
argument to the `\X` escape sequence and its counterpart `device`
request, to address the requirement that some documents have to pass
metadata that must encode non-ASCII characters in device extension
commands. (For example, a document author may desire a document's
section headings containing non-ASCII code points to appear correctly
in PDF bookmarks. Further, GNU troff encodes its output page
description language only in ASCII.) This change is expected to be
of significance mainly to developers of output drivers for groff;
groff_diff(7) describes the transformations. If you have been using
`\X` or `.device` to pass ASCII data to the output driver as a device
extension command and require that it remain precisely as-is, use the
`\!` escape sequence or `output` request, and prefix your data with
"x X ", the device-independent troff means of expressing a device
extension command (see groff_out(5)).
[2] https://lists.gnu.org/archive/html/groff/2024-12/msg00168.html
[3] "Transparent" and "special" are my least favorite words in
*roff nomenclature, and practically all of the blame can be laid at
the feet of the Bell Labs CSRC in the 1970s.
The Thompson-style naming convention of never using more than two
letters to name anything except when compelled at gunpoint had the
advantage that no one expected such identifiers to mean anything at
all. It's Unix, man. You are not expected to understand it.
(And when staring at the muzzle of a firearm, you can comply. Just
add the four letters "flag". All done!)
[4] groff_mdoc(7):
(Sub)section cross references
Use the ‘.Sx’ macro to cite a (sub)section heading within the given
document.
Usage: .Sx ⟨section‐reference⟩ ...
.Sx Files → “Files”
The default width is 16n.
