Hi onf,

At 2025-01-21T16:05:18+0100, onf wrote:
> On Tue Jan 21, 2025 at 6:59 AM CET, G. Branden Robinson wrote:
> > At 2025-01-20T01:48:19+0100, onf wrote:
> > > And it works quite nicely, actually. The definitions are generated
> > > automatically, so all manpages written in mdoc benefit from it.
> > > I assume groff mdoc + man-db doesn't implement this?
> >
> > I'm working on it.
>
> [rearranging]
>
> > There are a few remaining problems to be solved.
> >
> > A. Generation of _unique_ hyperlink tags from #2-#4 above. There
> >    will be collisions galore under item 2 when multiple man pages
> >    are rendered. A page can conceivably collide with itself with
> >    respect to items #3 and #4. So we probably want a hierarchical
> >    tag representation: page-name/section/subsection/tag-item, where
> >    this structure is truncatable at any point after the first slash
> >    but is otherwise invariant.
> >
> > B. We need a predictable means of generating hyperlink tag
> >    identifiers that is also flexible enough to accommodate
> >    non-English languages and weird characters that people might
> >    populate their (sub)section titles or paragraph tags with. [...]
>
> You seem to be talking about HTML or PDF links.

Not exactly; I'm talking about how to express an abstraction of linking
in the broad sense of hypertext, where one has anchors or destination
marks (possibly invisible ones) in a text, and a means of expressing
references to those marks that can be visited with specific user intent
but little effort in the document presentation system.

> As a matter of fact, the only time I turn to HTML manpages is when I
> don't have one locally, and the only time I turn to a PDF one is when
> I want to print it, so PDF links in particular have little value to
> me. That's just my experience, though.
>
> What I was talking about were less(1) tags, which are much more useful
> than HTML or PDF links,

Same thing. Maybe I didn't express myself clearly.

The "tags file" concept comes (albeit perhaps not in an ultimate sense)
from a classic Unix program called ctags(1), which is still around and
with which vi and emacs still integrate. In ctags, the "marks" (anchors,
whatever) of hypertext are not intrinsic to the document, but externally
annotated via line number and regular expression. But they achieve the
same goal.

> because they significantly ease navigation of manpages in the
> terminal, which is THE way manpages are read and should thus be the
> primary focus.

I agree with this; the man page experience on the terminal is the
dominant usage, and by a significant margin, as best we can tell without
rigorous measurement.

> You don't seem to mention plans to support them.

Okay. I'll be more explicit. Once we've figured out an automatic
naming/tagging nomenclature, I want to implement a means to run the
formatter in a mode that will dump all of the tags in a document--which
need not require anything but changes to "an.tmac"--in a format useful
for another tool to consume. That tool will probably be less(1).
I don't know what tag format it uses; even if it's ctags(1), that
shouldn't be a problem, because at the time the document is formatted,
it knows what the current output line number is.

This feature might be dependent on a long-contemplated refactoring of
continuous rendering mode to stop using the technique it currently does,
and instead adopt an explicit page length of, metaphorically, INT_MAX.
More precisely, the value of the `.R` register in the forthcoming groff
1.24.0 release. That value is already implemented and documented in
"NEWS"... Er, no it's not. Huh. I forgot the NEWS part. Well, it will
be. Several facilities have already been migrated to use it. Find the
ChangeLog entries in a footnote.[1]

> Don't get me wrong, it's nice that one can link to a specific
> (sub)section of an HTML manpage, but that's completely missing the
> point of the feature I was actually talking about, which is the
> ability to jump to the definition of a term in the manpage.

https://knowyourmeme.com/memes/theyre-the-same-picture

> For instance, when I want to see the description of register .j in
> groff(7), I have to do:
>   /\\n\[\.j
> to locate it. If it was written in mdoc, I could simply do:
>   :t .j
>
> Besides being shorter, I wouldn't know to search for \\n\[\.j had I not
> already known how groff(7) is written, whereas I would know to search
> for .j.

I agree. Hence the value of automatic tagging.

There is the problem that we don't want the human man page reader to
have to key in a fully qualified tag name all the time. All they need is
an unambiguous match. I don't know what less(1)'s facilities are for
supporting this. I also don't happen to know how ctags(1) got extended
to support C++ name spaces and other means of qualifying colliding
identifier names. But if ctags (perhaps Exuberant ctags, given the
original ctags format's advanced age) got extended to cope with that
problem, presumably less(1) learned how to interpret the extension.
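
To make the "unambiguous match" idea concrete, here is a minimal Python
sketch. It is not how groff, mandoc(1), or less(1) do anything; it only
illustrates the proposal quoted earlier. It assumes tags are spelled
page-name/section/tag-item, that no component contains a slash, and that
a short query matches a tag when it equals the tag's trailing
components. The function names (`resolve`, `ctags_line`), the tag
spellings, the file name, and the line numbers are all hypothetical. The
output layout is the traditional ctags one: tag name, file name, and an
address (here a line number), separated by tabs.

```python
def resolve(query, tags):
    """Return every tag whose trailing components equal the query's.

    Assumption: components are separated by "/" and never contain one.
    """
    want = query.split("/")
    return [t for t in tags if t.split("/")[-len(want):] == want]


def ctags_line(tag, filename, line_number):
    """Format one entry in the traditional ctags layout.

    The three fields are tag name, file name, and address, tab-separated.
    """
    return f"{tag}\t{filename}\t{line_number}"


# Hypothetical tags mapped to made-up output line numbers.
TAGS = {
    "groff.7/Registers/.j": 1510,
    "ls.1/Options/-h": 42,
    "df.1/Options/-h": 37,
}

if __name__ == "__main__":
    # Dump every tag in a form another tool could consume.
    for tag, line in TAGS.items():
        print(ctags_line(tag, "rendered.txt", line))

    print(resolve(".j", TAGS))               # one match: unambiguous
    print(resolve("-h", TAGS))               # two matches: ambiguous
    print(resolve("ls.1/Options/-h", TAGS))  # fully qualified: one match
```

With these made-up tags, `.j` resolves to a single entry, while `-h`
matches two pages and needs at least the page name to disambiguate.
Whether a pager should prompt, cycle through the candidates, or refuse
in the ambiguous case is left open here.
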
Also, presumably mandoc(1) has solved the problem when it renders
multiple pages and more than one supports, say, an `-h` option.

Maybe Ingo (or a mandoc(1) power user) would like to save me the trouble
of researching these points?

> I agree it would be nice if one could link to subsections and, more
> importantly, terms within other manpages. As a matter of fact though,
> man(7) can't even tag terms within the same page.

It's the same problem with the same solution, as I conceive it. A
recurring theme in my contributions to groff's man(7)/mdoc(7) rendering
has been to solve problems when rendering N pages at a time, where N can
be 1 but might be greater.

> I've spent some time writing mdoc(7) lately while working on a
> reference for neatroff, and I guess I just don't get why anyone is
> using man(7) anymore.

I have some reasons. I started this thread to point out someone saying
something nice about groff's man(7) documentation, and it predictably
turned into a locus for the airing of grievances about the macro
language.

Welcome to groff news--we report, you decide!

> I'm not saying mdoc is perfect; it certainly doesn't afford me
> the level of control I am used to from writing plain *roff, but it
> pays off in the language's descriptiveness, relative ease of use,

I think (1) its macro lexicon is too large; (2) its DWIM-ish
interpretation of isolated punctuation arguments, combined with its
unique in-package macro interpreter, make it a poor base camp from which
to make further forays into *roff document formatting; and (3) its
community has too many Kool-Aid drinkers. You're unusual in that you led
a paragraph with an acknowledgement of its imperfection.

With man(7), that concession can be taken as read. :) Nobody's madly in
love with it, but one can learn--I think without much difficulty--how to
get things done.

> and especially the terms/tags, which are incredibly practical.

About that I have nothing bad to say (though I haven't actually
_exercised_ this feature aggressively). It's a good feature and that's
why I want to recapitulate it in man(7).

> I can imagine the language being much more approachable for people
> lacking knowledge of *roff, too.

Yes, they can acquire a sort of anti-knowledge they'll have to forget if
they ever use any other *roff macro package. Ah well. That ship sailed
35 years ago.

[humongous digression follows]

It's important to understand the main reason (I surmise) mdoc was born.
It's because AT&T/USL was getting increasingly litigious and nasty about
the Unix copyrights. That included the "tmac.an" file. The Berkeley CSRG
stopped contributing changes to the ms package, for instance, after
4.2BSD (1983), a full seven years before mdoc(7) appeared in
4.3BSD-Reno.

mdoc(7) was born out of anger. Not really at the difficulty of writing
good man pages in man(7), but at AT&T for being dicks (which they were).
The CSRG enlisted people, possibly on a paid basis (I'm not sure) to
design and implement a replacement for man(7), and, I suspect, to do so
in such a way that the language could be reimplemented atop something
that wasn't a *roff at all, because AT&T owned that too, hence the
bespoke macro processing system.

That did of course eventually happen. Much, much later I think than
anyone at CSRG contemplated.

The irony is that GNU, of which the BSD community was not yet a sworn
enemy, was developing a wholesale Free Software replacement for troff at
just about the same time. (BSD people were not alone in recognizing
AT&T's dickish nature.) So by the time BSD proudly introduced its
all-singing, all-dancing second-generation man page formatting package,
GNU troff had already come along and not only reimplemented man(7) under
a Free Software license, but the troff formatter, the preprocessors that
some man pages used (tbl(1) foremost), and output drivers for PostScript
and terminals. (This was 1990.
PDF and HTML didn't exist yet.)

The situation with AT&T/USL got progressively worse, and meanwhile
several CSRG luminaries founded BSDI. Nobody's come right out and said
this, but I think BSDI's objective was to repeat the staggering fiscal
success Sun Microsystems had enjoyed in taking BSD code as a foundation,
adding proprietary secret sauce to it, and raking in money hand over
fist by extracting the monopoly rents inherent in copyright.

In yet another irony, at the same time Sun was serving as a shining
beacon for every Unix nerd in Silicon Valley to emulate, it found itself
insolvent and sold a 20% stake in itself, along with its *SOUL* (just
ask Larry McVoy), to AT&T, with part of the sale price being that Sun
would abandon, forswear, and forever repudiate further development of
anything BSD-related. More or less.

But I'm not crystal clear on the timeline of BSDI's founding relative to
Sun's sale of a stake to AT&T--it could be that BSDI's principals saw
themselves as taking up the market segment that Sun found itself
commanded to abandon. If that's the case, I would certainly not say that
was a stupid idea. Plenty of other companies bought themselves a few
more years of life by Oppos[ing] Sun Forever, hence "OSF".

Then Microsoft came along with Windows 95 and NT 4, calmly
devouring--with an assist from not a few of its tried-and-true
trade-restraining, monopolistic strategies--about 250% of the market
every commercial Unix vendor already had or was looking to break into.
This is why it is said that the winner of the Unix wars was Microsoft,
despite it lacking a Unix product to offer.

With BSDI around to soak up a lot of CSRG talent, a whopping case of
(justified) butthurt over mistreatment by AT&T USL, and--so my psychic
powers reveal--founders' visions (enthusiastically adopted by every
undergraduate EE in America) of retiring at age 40 with
multi-millionaire status and faces on the occasional cover of _Fortune_
magazine, the BSD community rapidly acquired religion in a way that it
had lacked previously. All of a sudden, the copylefted GNU stuff that
BSD had happily, even excitedly, been absorbing as a means of achieving
independence from AT&T copyrights, became anathema.

GNU programs had made inroads to BSD because they were (1) free of
charge, (2) free to hack on, and (3) tended to be actively developed
with concern for their quality, something that was never true of AT&T
Unix programs absent a specific case bolstering a "strategic" executive
position. The point was to extract rent, remember? Paying for
development activity eats into your net profits. Charge the customer the
highest price he can tolerate while feeding him the largest diet of
excrement he can withstand. Anything else you can buy isn't Unix, and if
it's close enough, we'll sue its vendor into ruin. And USL had a court
case to point to, proving they weren't bluffing.

And that, as I understand it, is the origin of copyleft cooties.

Having been adopted, groff was slated to be thrown out again. Except
that didn't come to pass, not for over a decade, because another
alternative to Unix troff took a long time to eventuate.

When it did, some of the theologians of the now-fragmented BSD community
decided that copyleft per se was actually not heretical after all.[2]
Just the GPL was. The CDDL was fine. Because that came from Sun. And Sun
was great. Even though it was the same Sun that so horribly betrayed
them by defecting to AT&T in the late 1980s. At least Sun leadership
still knew something about getting rich.
And in America and above all in Silicon Valley, the correlation
coefficient of wealth and genius is universally held to be 1.0.

So as with GCC, groff ended up sticking around, with its abhorrent
license making the BSD hacker's mind more sullen and embittered with
every passing year. Also, the damnable Linux kernel stole all the glory
and more that should have redounded to the BSD kernel. It was no
consolation at all that Linux kernel hackers were, in large proportion,
openly contemptuous of the FSF and the GNU Project. The Linux
community's broad (albeit not exclusive) adoption of the hated GPL was a
further slap in the face.

I wonder if it adds insult to injury that the Linux kernel came under
the effective administration of the Linux Foundation, an industry trade
group (much like OSF) that insists equally on retaining the GPL as a
license but also refusing to ever enforce it. "Why the hell not just put
it under the BSD license?", they must ask. I've asked the same thing.
The answer appears to be that the LF has a good thing going and they're
not about to screw it up.[3] Money gets paid and the important decisions
get made at the proper levels, by MBAs from Wharton and HBS.

It took about twenty years for that *roff-independent man page formatter
to eventuate, and Ingo has had to make a lot of compromises--meaning,
implement a lot more *roff features--than I think he or original author
Kristaps Dzonsons ever imagined. I've corresponded with both of them
(the latter only about his "lowdown" project); they can correct me if
I'm wrong.

To sum up, I think mdoc(7) is a good cautionary tale illustrating the
sunk cost fallacy. A lot of technical decisions in BSD in the early
1990s were made on bases unrelated to technical merit, but essentially
political struggles and, where political motivations were impolitic to
admit, religious principles arose to shield them from critique.
Anger, greed, and resentment mixed into a toxic stew that distorted
judgment, pushing a lot of engineering concerns aside.

I don't claim that mdoc(7) wasn't conscientiously designed--in fact, I
find its consistency admirable. What I take issue with is that its
charter was (or became) too broad, and that charter had the size it did,
or so I intuit, for reasons unrelated to engineering objectives or the
production of high-quality documentation. Rewriting every man page
essentially from scratch was an anti-goal that could only have been
permitted to survive due to non-technical factors.

(Maybe what this case illustrates even better is that one can recognize
the sunk cost fallacy in one place only to be completely blind to it in
another.)

How we got here, why mdoc looks the way it does in some respects--much
of that is not a happy story. It would have been better for everyone, in
my opinion, for the BSDs to say, "hey, there's a basically free *roff
implementation that we _already distribute_ that has a man(7)
implementation. AT&T/USL will not be able to take that away from us.
Let's make man(7) better, like Sun did.[4]"

But, that's an alternate timeline. And, given a choice between one where
they went that direction and one where Bernie Sanders won the U.S.
presidency in 2016, I'd take the latter. This list wouldn't get to enjoy
hearing me complain less, but my friends would. 😅

Now then--all of that said, I have no particular beef with the mdoc(7)
language or the mandoc(1) formatter. They are here, they exist, and,
quite fortunately, Ingo is someone I can work easily with and who seems
to reason about engineering decisions much in the same way I do. So I
aim to continue maintaining groff's mdoc(7) implementation as well as I
am able--which, as with every other aspect of groff--may fall short of
adequacy in the eyes of some.

There's a lot of work to do and limited time, notably once one subtracts
that spent composing emails like this.
But no one put a gun to my head and made me write it.

> > Because mdoc(7) culture is rigidly prescriptive, its section
> > headings are tightly controlled, and I expect that this
> > problem only threatens when subsections are used (and
> > referenced).
>
> Although mdoc(7) says something to the effect of:
>   For a list of conventional manual sections, see MANUAL STRUCTURE.
>   These sections should be used unless it's absolutely necessary
>   that custom sections be used.
>
> ...in reality it itself uses non-standard section headings:
>   Name
>   Description
>   Manual Structure
>   Macro Overview
>   Macro Reference
>   Macro Syntax
>   Compatibility
>   See Also
>   History
>   Authors

Yes, some--not all--of those are unconventional. I wouldn't say "not
standard" because we have no standard to which to point. Just
conventions, some of which have been codified in style guides.

> I think the point is more about sticking to conventional section names
> if possible than about forbidding non-standard ones.

I think I have seen Ingo do the latter, but I could be mistaken.

> > As I understand mandoc(1)'s less(1)-integrated tagging feature, none
> > of the problems above are mitigated by feeding the pager an
> > auxiliary tags file (less(1)'s `-T` option). [...]
>
> The tags file allows multiple tags with the same name, which can then
> be navigated using the t (next tag) and T (previous tag) commands.

This observation seems to tie in with my point about ambiguous tags,
noted (far) above.

There's nothing for it but for me to learn how the sausage is made, and
see how groff man(7) might produce a vegetable-based substitute that
only the most hypercarnivorous reader will detect.

Regards,
Branden

[1] ChangeLog (entries arranged in forward chronological order):

2024-08-17  G. Branden Robinson <g.branden.robin...@gmail.com>

        * src/roff/troff/input.cpp: Support construction of read-only
        registers from integers.
        (class readonly_text_register): Declare constructor taking `int`
        argument.
        (readonly_text_register::readonly_text_register): Add it.
        (main): Use it to initialize `.T` register.
        (init_registers): Use it to initialize `.A` register.
        (init_input_requests): Use it to initialize `.g` and `.R`
        registers.

2024-08-17  G. Branden Robinson <g.branden.robin...@gmail.com>

        * src/roff/troff/input.cpp (init_input_requests): Initialize
        `.R` read-only register to `INT_MAX` instead of 10000.
        * doc/groff.texi.in (Built-in Registers):
        * man/groff.7.man (Read-only registers):
        * man/groff_diff.7.man (Altered registers): Document it.
        * doc/groff.texi.in (Manipulating Filling and Adjustment)
        (End-of-input Traps): Apply it to examples.

        Fixes <https://savannah.gnu.org/bugs/?63587>.

2024-08-17  G. Branden Robinson <g.branden.robin...@gmail.com>

        * tmac/e.tmac (@M, $c): If the formatter is GNU troff, use value
        of its `.R` register instead of "10000" to indicate an arbitrary
        large integer.
        (EQ): Same, but in commented form; a formatter DoS attack is
        otherwise revealed.
        (TS): Similar; set line length to `.R` basic units minus 1n to
        avoid saturation warnings on output devices with a non-unit
        horizontal motion quantum.
        * tmac/html-end.tmac: Set page length to `.R` basic units minus
        1v to avoid saturation warnings on output devices with a
        non-unit vertical motion quantum (as the "html" device has).
        * tmac/man.ultrix (HB): Use value of `.R` register instead of
        "999" to indicate an arbitrary large integer.
        * tmac/psfig.tmac (F+):
        * tmac/s.tmac (cov*tl-au-print, ID, par@TL, par@AU, par@AI): Use
        value of `.R` register instead of "9999" to indicate an
        arbitrary large integer.

2024-10-04  G. Branden Robinson <g.branden.robin...@gmail.com>

        * src/preproc/tbl/table.cpp (table::init_output): Migrate to use
        of `.R` register for a huge value in generated groff language,
        instead of hard-coding (2^31)-1 as a numeric literal.

contrib/mm/ChangeLog:

2024-09-15  G. Branden Robinson <g.branden.robin...@gmail.com>

        * m.tmac: Use value of GNU troff `.R` register instead of "99"
        {an "arbitrary large number"} as `Ls` register default.

2024-08-17  G. Branden Robinson <g.branden.robin...@gmail.com>

        * m.tmac (ds@output-float, ds@end): Use value of GNU troff `.R`
        register instead of "9999" to indicate an arbitrary large
        integer.

[2] I credit OpenBSD with being perhaps the only BSD faction that seems
    to have articulated consistent principles about the copyright
    licenses they'll tolerate, and hewed to them in practice. They're as
    allergic to copyleft as anyone, but, as far as I've observed, their
    allergy is not selective with respect to license vendor or
    promulgator. They reject a copyleft from IBM (EPL), Sun (CDDL), or
    Marc Andreessen (MPL) with equal vigor. I respect that. But
    unprincipled exceptions might exist; I haven't done a license audit
    of their distro. I used to get paid for that sort of thing, but it's
    not something I'd undertake in my free time just for fun.

[3] https://www.linuxfoundation.org/about/members

    Check out that L-curve.

[4] Sun did _not_ make man(7) better. But I feel certain that's what the
    average BSD hacker in the street would have said in 1990.