Re: ripgrep author seems happy with groff_man_style(7)

G. Branden Robinson Tue, 21 Jan 2025 11:19:44 -0800

Hi onf,

At 2025-01-21T16:05:18+0100, onf wrote:
> On Tue Jan 21, 2025 at 6:59 AM CET, G. Branden Robinson wrote:
> > At 2025-01-20T01:48:19+0100, onf wrote:
> > > And it works quite nicely, actually. The definitions are generated
> > > automatically, so all manpages written in mdoc benefit from it.
> > > I assume groff mdoc + man-db doesn't implement this?
> >
> > I'm working on it.
> 
> [rearranging]
> 
> > There are a few remaining problems to be solved.
> >
> > A.  Generation of _unique_ hyperlink tags from #2-#4 above.  There
> >     will be collisions galore under item 2 when multiple man pages
> >     are rendered.  A page can conceivably collide with itself with
> >     respect to items #3 and #4.  So we probably want a hierarchical
> >     tag representation: page-name/section/subsection/tag-item, where
> >     this structure is truncatable at any point after the first slash
> >     but is otherwise invariant.
> >
> > B.  We need a predictable means of generating hyperlink tag
> >     identifiers that is also flexible enough to accommodate
> >     non-English languages and weird characters that people might
> >     populate their (sub)section titles or paragraph tags with. [...]
> 
> You seem to be talking about HTML or PDF links.


Not exactly; I'm talking about how to express an abstraction of linking
in the broad sense of hypertext, where one has anchors or destination
marks (possibly invisible ones) in a text, and a means of expressing
references to those marks that can be visited with specific user intent
but little effort in the document presentation system.
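
Problem B in the quoted text (predictable tag identifiers that survive
non-English text and odd characters) can be sketched as below.  This is
only an illustration under assumptions of my own: the function names,
the choice of Unicode NFC normalization, the hyphen for whitespace, and
the percent-escaping of "/" are all hypothetical and are not anything
groff or mandoc is known to do.

```python
import unicodedata


def tag_component(title):
    """Turn one title or paragraph tag into a tag path component.

    Assumed rules (hypothetical, not groff behavior):
      * normalize to NFC so equivalent spellings yield the same tag;
      * escape "%" and "/" so "/" stays free as the hierarchy separator;
      * collapse each run of whitespace into a single hyphen.
    """
    text = unicodedata.normalize("NFC", title)
    text = text.replace("%", "%25").replace("/", "%2F")
    return "-".join(text.split())


def tag_path(*titles):
    """Join components into page-name/section/subsection/tag-item form."""
    return "/".join(tag_component(t) for t in titles)


if __name__ == "__main__":
    print(tag_path("groff.7", "Read-only registers", ".j"))
```

Escaping rather than deleting awkward characters keeps the mapping
predictable: two different titles cannot collapse into the same
component merely because one of them contained a slash.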

> As a matter of fact, the only time I turn to HTML manpages is when I
> don't have one locally, and the only time I turn to a PDF one is when
> I want to print it, so PDF links in particular have little value to
> me. That's just my experience, though.
> 
> What I was talking about were less(1) tags, which are much more useful
> than HTML or PDF links,

Same thing.  Maybe I didn't express myself clearly.  The "tags file"
concept comes (albeit perhaps not in an ultimate sense) from a classic
Unix program called ctags(1), which is still around and with which vi
and emacs still integrate.  In ctags, the "marks" (anchors, whatever) of
hypertext are not intrinsic to the document, but externally annotated
via line number and regular expression.  But they achieve the same goal.
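
As a rough sketch of that external annotation, the classic tags file
holds one entry per line: the tag name, the file name, and an ex
address (a line number or a search pattern), separated by tabs.  The
helper names, file name, and line numbers below are invented for
illustration, and the pattern escaping is deliberately simplified.

```python
def ctags_line(tag, filename, address):
    """Format one entry in the classic layout: tag<TAB>file<TAB>address.

    `address` is either an int (a line number) or a str holding the
    full text of the line to search for.
    """
    if isinstance(address, int):
        ex_cmd = str(address)  # a bare line number is a valid ex address
    else:
        # Escape backslashes and slashes, then anchor the pattern at
        # both ends of the line.  (Simplified; an assumption.)
        escaped = address.replace("\\", "\\\\").replace("/", "\\/")
        ex_cmd = "/^" + escaped + "$/"
    return f"{tag}\t{filename}\t{ex_cmd}"


def dump_tags(entries):
    """Return tags-file text for (tag, filename, address) tuples,
    sorted by tag name and then file name."""
    ordered = sorted(entries, key=lambda e: (e[0], e[1]))
    return "".join(ctags_line(*e) + "\n" for e in ordered)


if __name__ == "__main__":
    # Hypothetical output line numbers in a formatted page.
    print(dump_tags([(".j", "groff.7.txt", 412),
                     (".R", "groff.7.txt", 455)]), end="")
```

A formatter that knows its current output line number would only need
the integer form; the pattern form is what lets a tag survive edits
that shift line numbers.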

> because they significantly ease navigation of manpages in the
> terminal, which is THE way manpages are read and should thus be the
> primary focus.

I agree with this; the man page experience on the terminal is the
dominant usage, and by a significant margin, as best we can tell without
rigorous measurement.

> You don't seem to mention plans to support them.

Okay.  I'll be more explicit.  Once we've figured out an automatic
naming/tagging nomenclature, I want to implement a means to run the
formatter in a mode that will dump all of the tags in a document--which
need not require anything but changes to "an.tmac"--in a format useful
for another tool to consume.  That tool will probably be less(1).  I
don't know what tag format it uses; even if it's ctags(1), that
shouldn't be a problem, because at the time the document is formatted,
it knows what the current output line number is.  This feature might be
dependent on a long-contemplated refactoring of continuous rendering
mode to stop using the technique it currently does, and instead adopt an
explicit page length of, metaphorically, INT_MAX.  More precisely, the
value of the `.R` register in the forthcoming groff 1.24.0 release.
That value is already implemented and documented in "NEWS"...

Er, no it's not.  Huh.  I forgot the NEWS part.  Well, it will be.
Several facilities have already been migrated to use it.  Find the
ChangeLog entries in a footnote.[1]

> Don't get me wrong, it's nice that one can link to a specific
> (sub)section of an HTML manpage, but that's completely missing the
> point of the feature I was actually talking about, which is the
> ability to jump to the definition of a term in the manpage.

https://knowyourmeme.com/memes/theyre-the-same-picture

> For instance, when I want to see the description of register .j in
> groff(7), I have to do:
>   /\\n\[\.j
> to locate it. If it was written in mdoc, I could simply do:
>   :t .j
> 
> Besides being shorter, I wouldn't know to search for \\n\[\.j had I not
> already known how groff(7) is written, whereas I would know to search
> for .j.

I agree.  Hence the value of automatic tagging.  There is the problem
that we don't want the human man page reader to have to key in a fully
qualified tag name all the time.  All they need is an unambiguous match.
I don't know what less(1)'s facilities are for supporting this.  I also
don't happen to know how ctags(1) got extended to support C++ name
spaces and other means of qualifying colliding identifier names.  But if
ctags (perhaps Exuberant ctags, given the original ctags format's
advanced age) got extended to cope with that problem, presumably less(1)
learned how to interpret the extension.
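
A minimal sketch of the "unambiguous match" idea, assuming the
hierarchical page-name/section/subsection/tag-item form discussed
earlier and assuming the reader types only trailing components.  The
function name and the sample tags are hypothetical; this is not how
less(1) or mandoc(1) is known to resolve tags.

```python
def resolve_tag(query, tags):
    """Return every fully qualified tag whose trailing components
    equal the components of `query`.

    One result means the query was unambiguous; several mean the
    reader must qualify it further (or step through the candidates).
    """
    wanted = query.strip("/").split("/")
    matches = []
    for tag in tags:
        parts = tag.split("/")
        # A slice longer than the list simply yields the whole list,
        # so an over-long query can never match a shorter tag.
        if parts[-len(wanted):] == wanted:
            matches.append(tag)
    return matches


if __name__ == "__main__":
    known = [
        "groff.7/Read-only-registers/.j",
        "ls.1/Options/-h",
        "df.1/Options/-h",
    ]
    print(resolve_tag(".j", known))               # one candidate
    print(resolve_tag("-h", known))               # two candidates
    print(resolve_tag("df.1/Options/-h", known))  # fully qualified
```

With a scheme like this, `.j` suffices when only one page defines it,
while `-h` across several pages returns multiple candidates that a
pager could offer in sequence.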

Also, presumably mandoc(1) has solved the problem when it renders
multiple pages and more than one supports, say, an `-h` option.

Maybe Ingo (or a mandoc(1) power user) would like to save me the
trouble of researching these points?

> I agree it would be nice if one could link to subsections and, more
> importantly, terms within other manpages. As a matter of fact though,
> man(7) can't even tag terms within the same page.

It's the same problem with the same solution, as I conceive it.  A
recurring theme in my contributions to groff's man(7)/mdoc(7) rendering
has been to solve problems when rendering N pages at a time, where N can
be 1 but might be greater.

> I've spent some time writing mdoc(7) lately while working on a
> reference for neatroff, and I guess I just don't get why anyone is
> using man(7) anymore.

I have some reasons.  I started this thread to point out someone saying
something nice about groff's man(7) documentation, and it predictably
turned into a locus for the airing of grievances about the macro
language.

Welcome to groff news--we report, you decide!

> I'm not saying mdoc is perfect; it certainly doesn't afford me
> the level of control I am used to from writing plain *roff, but it
> pays off in the language's descriptiveness, relative ease of use,

I think (1) its macro lexicon is too large; (2) its DWIM-ish
interpretation of isolated punctuation arguments, combined with its
unique in-package macro interpreter, make it a poor base camp from which
to make further forays into *roff document formatting; and (3) its
community has too many Kool-Aid drinkers.  You're unusual in that you
led a paragraph with an acknowledgement of its imperfection.

With man(7), that concession can be taken as read.  :)  Nobody's madly
in love with it, but one can learn--I think without much difficulty--how
to get things done.

> and especially the terms/tags, which are incredibly practical.

About that I have nothing bad to say (though I haven't actually
_exercised_ this feature aggressively).  It's a good feature and that's
why I want to recapitulate it in man(7).

> I can imagine the language being much more approachable for people
> lacking knowledge of *roff, too.

Yes, they can acquire a sort of anti-knowledge they'll have to forget if
they ever use any other *roff macro package.  Ah well.  That ship sailed
35 years ago.

[humongous digression follows]

It's important to understand the main reason (I surmise) mdoc was born.
It's because AT&T/USL was getting increasingly litigious and nasty about
the Unix copyrights.  That included the "tmac.an" file.  The Berkeley
CSRG stopped contributing changes to the ms package, for instance, after
4.2BSD (1983), a full seven years before mdoc(7) appeared in
4.3BSD-Reno.

mdoc(7) was born out of anger.  Not really at the difficulty of writing
good man pages in man(7), but at AT&T for being dicks (which they were).
The CSRG enlisted people, possibly on a paid basis (I'm not sure) to
design and implement a replacement for man(7), and, I suspect, to do so
in such a way that the language could be reimplemented atop something
that wasn't a *roff at all, because AT&T owned that too, hence the
bespoke macro processing system.  That did of course eventually happen.
Much, much later I think than anyone at CSRG contemplated.

The irony is that GNU, of which the BSD community was not yet a sworn
enemy, was developing a wholesale Free Software replacement for troff at
just about the same time.  (BSD people were not alone in recognizing
AT&T's dickish nature.)

So by the time BSD proudly introduced its all-singing, all-dancing
second-generation man page formatting package, GNU troff had already
come along and not only reimplemented man(7) under a Free Software
license, but the troff formatter, the preprocessors that some man pages
used (tbl(1) foremost), and output drivers for PostScript and terminals.
(This was 1990.  PDF and HTML didn't exist yet.)

The situation with AT&T/USL got progressively worse, and meanwhile
several CSRG luminaries founded BSDI.  Nobody's come right out and said
this, but I think BSDI's objective was to repeat the staggering fiscal
success Sun Microsystems had enjoyed in taking BSD code as a foundation,
adding proprietary secret sauce to it, and raking in money hand over
fist by extracting the monopoly rents inherent in copyright.

In yet another irony, at the same time Sun was serving as a shining
beacon for every Unix nerd in Silicon Valley to emulate, it found itself
insolvent and sold a 20% stake in itself, along with its *SOUL* (just
ask Larry McVoy), to AT&T, with part of the sale price being that Sun
would abandon, forswear, and forever repudiate further development of
anything BSD-related.  More or less.  But I'm not crystal clear on the
timeline of BSDI's founding relative to Sun's sale of a stake to
AT&T--it could be that BSDI's principals saw themselves as taking up the
market segment that Sun found itself commanded to abandon.  If that's
the case, I would certainly not say that was a stupid idea.  Plenty of
other companies bought themselves a few more years of life by Oppos[ing]
Sun Forever, hence "OSF".  Then Microsoft came along with Windows 95 and
NT 4, calmly devouring--with an assist from not a few of its tried-and-
true trade-restraining, monopolistic strategies--about 250% of the
market every commercial Unix vendor already had or was looking to break
into.  This is why it is said that the winner of the Unix wars was
Microsoft, despite it lacking a Unix product to offer.

With BSDI around to soak up a lot of CSRG talent, a whopping case of
(justified) butthurt over mistreatment by AT&T/USL, and--so my psychic
powers reveal--founders' visions (enthusiastically adopted by every
undergraduate EE in America) of retiring at age 40 with multi-
millionaire status and faces on the occasional cover of _Fortune_
magazine, the BSD community rapidly acquired religion in a way that it
had lacked previously.  All of a sudden, the copylefted GNU stuff that
BSD had happily, even excitedly, been absorbing as a means of achieving
independence from AT&T copyrights, became anathema.  GNU programs had
made inroads to BSD because they were (1) free of charge, (2) free to
hack on, and (3) tended to be actively developed with concern for their
quality, something that was never true of AT&T Unix programs absent a
specific case bolstering a "strategic" executive position.  The point
was to extract rent, remember?  Paying for development activity eats
into your net profits.  Charge the customer the highest price he can
tolerate while feeding him the largest diet of excrement he can
withstand.  Anything else you can buy isn't Unix, and if it's close
enough, we'll sue its vendor into ruin.  And USL had a court case to
point to, proving they weren't bluffing.

And that, as I understand it, is the origin of copyleft cooties.

Having been adopted, groff was slated to be thrown out again.  Except
that didn't come to pass, not for over a decade, because another
alternative to Unix troff took a long time to eventuate.  When it did,
some of the theologians of the now-fragmented BSD community decided that
copyleft per se was actually not heretical after all.[2]  Just the GPL
was.  The CDDL was fine.  Because that came from Sun.  And Sun was
great.  Even though it was the same Sun that so horribly betrayed them
by defecting to AT&T in the late 1980s.  At least Sun leadership still
knew something about getting rich.  And in America and above all in
Silicon Valley, the correlation coefficient of wealth and genius is
universally held to be 1.0.

So as with GCC, groff ended up sticking around, with its abhorrent
license making the BSD hacker's mind more sullen and embittered with
every passing year.  Also, the damnable Linux kernel stole all the glory
and more that should have redounded to the BSD kernel.  It was no
consolation at all that Linux kernel hackers were, in large proportion,
openly contemptuous of the FSF and the GNU Project.  The Linux
community's broad (albeit not exclusive) adoption of the hated GPL was a
further slap in the face.  I wonder if it adds insult to injury that the
Linux kernel came under the effective administration of the Linux
Foundation, an industry trade group (much like OSF) that insists equally
on retaining the GPL as a license and on refusing ever to enforce it.
"Why the hell not just put it under the BSD license?", they must ask.
I've asked the same thing.  The answer appears to be that the LF has a
good thing going and they're not about to screw it up.[3]  Money gets
paid and the important decisions get made at the proper levels, by MBAs
from Wharton and HBS.

It took about twenty years for that *roff-independent man page formatter
to eventuate, and Ingo has had to make a lot of compromises--meaning,
implement a lot more *roff features--than I think he or original author
Kristaps Dzonsons ever imagined.  I've corresponded with both of them
(the latter only about his "lowdown" project); they can correct me if
I'm wrong.

To sum up, I think mdoc(7) is a good cautionary tale illustrating the
sunk cost fallacy.  A lot of technical decisions in BSD in the early
1990s were made on bases unrelated to technical merit; they were
essentially political and, where political motivations were impolitic to
admit, religious principles arose to shield them from critique.  Anger,
greed, and resentment mixed into a toxic stew that distorted judgment,
pushing a lot of engineering concerns aside.  I don't claim that mdoc(7)
wasn't conscientiously designed--in fact, I find its consistency
admirable.  What I take issue with is that its charter was (or became)
too broad, and that charter had the size it did, or so I intuit, for
reasons unrelated to engineering objectives or the production of
high-quality documentation.  Rewriting every man page essentially from
scratch was an anti-goal that could only have been permitted to survive
due to non-technical factors.

(Maybe what this case illustrates even better is that one can recognize
the sunk cost fallacy in one place only to be completely blind to it in
another.)

How we got here, why mdoc looks the way it does in some respects--much
of that is not a happy story.  It would have been better for everyone,
in my opinion, for the BSDs to say, "hey, there's a basically free *roff
implementation that we _already distribute_ that has a man(7)
implementation.  AT&T/USL will not be able to take that away from us.
Let's make man(7) better, like Sun did.[4]"  But, that's an alternate
timeline.  And, given a choice between one where they went that
direction and one where Bernie Sanders won the U.S. presidency in 2016,
I'd take the latter.  This list wouldn't get to enjoy hearing me
complain less, but my friends would.  😅

Now then--all of that said, I have no particular beef with the mdoc(7)
language or the mandoc(1) formatter.  They are here, they exist, and,
quite fortunately, Ingo is someone I can work easily with and who seems
to reason about engineering decisions much in the same way I do.  So I
aim to continue maintaining groff's mdoc(7) implementation as well as I
am able--which, as with every other aspect of groff, may fall short of
adequacy in the eyes of some.  There's a lot of work to do and limited
time, notably once one subtracts that spent composing emails like this.

But no one put a gun to my head and made me write it.

> >         Because mdoc(7) culture is rigidly prescriptive, its section
> >         headings are tightly controlled, and I expect that this
> >         problem only threatens when subsections are used (and
> >         referenced).
> 
> Although mdoc(7) says something to the effect of:
>   For a list of conventional manual sections, see MANUAL STRUCTURE.
>   These sections should be used unless it's absolutely necessary
>   that custom sections be used.
> 
> ...in reality it itself uses non-standard section headings:
>   Name
>   Description
>   Manual Structure
>   Macro Overview
>   Macro Reference
>   Macro Syntax
>   Compatibility
>   See Also
>   History
>   Authors

Yes, some--not all--of those are unconventional.  I wouldn't say "not
standard" because we have no standard to which to point.  Just
conventions, some of which have been codified in style guides.

> I think the point is more about sticking to conventional section names
> if possible than about forbidding non-standard ones.

I think I have seen Ingo do the latter, but I could be mistaken.

> > As I understand mandoc(1)'s less(1)-integrated tagging feature, none
> > of the problems above are mitigated by feeding the pager an
> > auxiliary tags file (less(1)'s `-T` option). [...]
> 
> The tags file allows multiple tags with the same name, which can then
> be navigated using the t (next tag) and T (previous tag) commands.

This observation seems to tie in with my point about ambiguous tags,
noted (far) above.  There's nothing for it but for me to learn how the
sausage is made, and see how groff man(7) might produce a vegetable-
based substitute that only the most hypercarnivorous reader will detect.

Regards,
Branden

[1] ChangeLog (entries arranged in forward chronological order):

2024-08-17  G. Branden Robinson <g.branden.robin...@gmail.com>

        * src/roff/troff/input.cpp: Support construction of read-only
        registers from integers.
        (class readonly_text_register): Declare constructor taking `int`
        argument.
        (readonly_text_register::readonly_text_register): Add it.
        (main): Use it to initialize `.T` register.
        (init_registers): Use it to initialize `.A` register.
        (init_input_requests): Use it to initialize `.g` and `.R`
        registers.

2024-08-17  G. Branden Robinson <g.branden.robin...@gmail.com>

        * src/roff/troff/input.cpp (init_input_requests): Initialize
        `.R` read-only register to `INT_MAX` instead of 10000.

        * doc/groff.texi.in (Built-in Registers):
        * man/groff.7.man (Read-only registers):
        * man/groff_diff.7.man (Altered registers): Document it.

        * doc/groff.texi.in (Manipulating Filling and Adjustment)
        (End-of-input Traps): Apply it to examples.

        Fixes <https://savannah.gnu.org/bugs/?63587>.

2024-08-17  G. Branden Robinson <g.branden.robin...@gmail.com>

        * tmac/e.tmac (@M, $c): If the formatter is GNU troff,
        use value of its `.R` register instead of "10000" to indicate an
        arbitrary large integer.
        (EQ): Same, but in commented form; a formatter DoS attack is
        otherwise revealed.
        (TS): Similar; set line length to `.R` basic units minus 1n to
        avoid saturation warnings on output devices with a non-unit
        horizontal motion quantum.
        * tmac/html-end.tmac: Set page length to `.R` basic units minus
        1v to avoid saturation warnings on output devices with a
        non-unit vertical motion quantum (as the "html" device has).
        * tmac/man.ultrix (HB): Use value of `.R` register instead of
        "999" to indicate an arbitrary large integer.
        * tmac/psfig.tmac (F+):
        * tmac/s.tmac (cov*tl-au-print, ID, par@TL, par@AU, par@AI): Use
        value of `.R` register instead of "9999" to indicate an
        arbitrary large integer.

2024-10-04  G. Branden Robinson <g.branden.robin...@gmail.com>

        * src/preproc/tbl/table.cpp (table::init_output): Migrate to use
        of `.R` register for a huge value in generated groff language,
        instead of hard-coding (2^31)-1 as a numeric literal.

contrib/mm/ChangeLog:

2024-09-15  G. Branden Robinson <g.branden.robin...@gmail.com>

        * m.tmac: Use value of GNU troff `.R` register instead of "99"
        {an "arbitrary large number"} as `Ls` register default.

2024-08-17  G. Branden Robinson <g.branden.robin...@gmail.com>

        * m.tmac (ds@output-float, ds@end): Use value of GNU troff `.R`
        register instead of "9999" to indicate an arbitrary large
        integer.

[2] I credit OpenBSD with being perhaps the only BSD faction that seems
    to have articulated consistent principles about the copyright
    licenses they'll tolerate, and hewed to them in practice.  They're
    as allergic to copyleft as anyone, but, as far as I've observed,
    their allergy is not selective with respect to license vendor or
    promulgator.  They reject a copyleft from IBM (EPL), Sun (CDDL), or
    Marc Andreessen (MPL) with equal vigor.  I respect that.  But
    unprincipled exceptions might exist; I haven't done a license audit
    of their distro.  I used to get paid for that sort of thing, but
    it's not something I'd undertake in my free time just for fun.

[3] https://www.linuxfoundation.org/about/members

    Check out that L-curve.

[4] Sun did _not_ make man(7) better.  But I feel certain that's what
    the average BSD hacker in the street would have said in 1990.
