Hi Branden,

G. Branden Robinson wrote on Wed, Jan 22, 2025 at 08:47:20PM -0600:
> At 2025-01-22T07:10:54+0100, Ingo Schwarze wrote:
>> It does not require the user to provide any new configuration or
>> options.  It happens completely automatically and transparently
>> whenever the user types "man pagename" or "apropos -a
>> search_expression".

> Right.  I'll need to expose a mechanism via the man(7) interface (albeit
> **NOT** that used for authoring documents) so that the man(1) program
> can exercise it.  (Again, see [1].)  mandoc(1) controls both ends of
> that pipeline; groff does not.

Correct.  That doesn't mean that mandoc is monolithic, though.
It does have internal structure, albeit all inside the same
single-threaded, completely linear process.

 1. The steering program parses command line options and
    configuration files, consults zero or more mandoc.db(5) databases,
    produces a list of input files to process, and opens the
    required temporary files for writing, if any are needed.
 2. For each top-level input file, the following steps happen in turn.
    First, the parsers generate the syntax tree.
 3. After that, the appropriate validator for the input language
    used in the file validates and in some respects normalizes
    and transforms the syntax tree and collects tagging information.
 4. After that, the appropriate formatter for the desired
    (input language, output format) pair adds the formatted output
    to one of the temporary files and the tags to the other
    temporary file.
 5. Continue with step 2 until all input files have been processed.
 6. The steering program frees all parser resources.
 7. The steering program spawns the pager and waitpid(3)s for it.
 8. The steering program unlink(2)s the temporary files.

Items 1 and 5-8 (done by the mandoc steering program, main.c)
would probably be the job for man-db.
Items 2 and 3 clearly need to be done by troff(1) and the macro
packages, maybe even including appending to the tags file.
Item 4 is the job for grotty(1).
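The steering part of that sequence (items 1 and 5-8) can be sketched roughly as follows.  This is only an illustration in Python, not mandoc's main.c: the function name show_pages, the shape of the "pages" argument (a stand-in for the results of items 2-4), and the pager_cmd callback are all assumptions made for the example.

```python
import os
import subprocess
import sys
import tempfile


def show_pages(pages, pager_cmd):
    """Write formatted pages and tags to temp files, page them, clean up.

    pages: list of (formatted_text, tag_lines) pairs; a hypothetical
        stand-in for what the parser, validator and formatter produce.
    pager_cmd: hypothetical callback mapping (output_path, tags_path)
        to the argv list of the pager to run.
    Returns the exit status of the pager.
    """
    # Item 1: create both temporary files up front, like mkstemp(3).
    out_fd, out_path = tempfile.mkstemp(prefix="man.out.")
    tag_fd, tag_path = tempfile.mkstemp(prefix="man.tags.")
    try:
        with os.fdopen(out_fd, "w") as out, os.fdopen(tag_fd, "w") as tags:
            # Items 2-5: append each page's output and tags in turn.
            for text, tag_lines in pages:
                out.write(text)
                tags.writelines(line + "\n" for line in tag_lines)
        # Item 7: spawn the pager and wait for it to exit.
        return subprocess.run(pager_cmd(out_path, tag_path)).returncode
    finally:
        # Item 8: remove both temporary files, even after an error.
        os.unlink(out_path)
        os.unlink(tag_path)
```

With less(1), the callback might be something like "lambda out, tags: ["less", "-T", tags, out]"; whether man-db would want exactly that invocation is an open question, so treat it as an assumption.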
So it appears man-db will likely have to generate the two filenames with mkstemp(3), passing them both to both troff and less and cleaning them up after the pager exits. [...] > Agreed; less(1) has no notion of addressing within a line (that I > know of), so being character-precise would be a waste of effort. Character-precise positioning would also be silly from a human perspective. Humans (at least when they are fluent readers, which can probably be assumed for most of the audience of manual pages) do not parse words character-by-character but use complex pattern recognition to instantly recognize whole words or even many words at once. So a line of text is hardly longer than what a human reader instantly recognizes anyway (except when using insane line lengths like 100-200 characters; i never understood how people who use such wide terminals manage to keep track of which line they are currently reading in the first place). [...] > But your mandoc(1) > has "man -T html"...how do you solve the problem of ambiguous tags > _there_? I'm going for a general solution; I'm trying to solve the > tagging/linking problem for HTML, PDF, and the terminal all at once. Trivially: if a tag occurs for the first time, just use it as-is. When a tag occurs the second time, append the suffix "~2". The third time, append "~3", and so on. [...] >>> I also don't happen to know how ctags(1) got extended to support >>> C++ name spaces and other means of qualifying colliding identifier >>> names. But if ctags (perhaps Exuberant ctags, given the original >>> ctags format's advanced age) got extended to cope with that problem, >>> presumably less(1) learned how to interpret the extension. >> Not that i'm aware of anything like that, no. Also, given that ctags(1) >> is mandated by POSIX and less(1) aims for portability (AFAIK), >> relying on extensions might be a bad idea. > Exuberant ctags's extensions are ubiquitous. Is that so? 
Personally, i have implemented a generator for ctags(1) files
(in mandoc) and i'm using that frequently.  But i don't recall
ever having heard the word "Exuberant" as a name of a person
or organization.

Also,

   $ grep -RFi Exuberant /usr/src/usr.bin/ctags/
   $ grep -RFi Exuberant /usr/src/usr.bin/less/
   $

and the *.c files below /usr/src/usr.bin/ctags/ all have pure
"Regents of the University of California" Copyright notices,
no other contributors being named, and this is from the ctags(1)
manual page:

  HISTORY
     The ctags command appeared in 2BSD.

  STANDARDS
     The ctags utility is compliant with the IEEE Std 1003.1-2008
     (“POSIX.1”) specification, though its presence is optional.

     The flags [-BdFuvw] are extensions to that specification.

     Support for Pascal, YACC, lex, and Lisp source files is an
     IEEE Std 1003.1-2008 (“POSIX.1”) extension.  The standard notes
     that ctags is "not required to accommodate these languages,
     although implementors are encouraged to do so".

I don't see anything about non-BSD authors there.  I even checked
the complete commit log for the directory and see medium amounts
of maintenance work being done there, but no indication of non-BSD
contributions or feature additions for compatibility.

[...]

>>> Also, presumably mandoc(1) has solved the problem when it renders
>>> multiple pages and more than one supports, say, an `-h` option.

>>   $ man -ak Fl=h

> How man-db man implements support for tag-based searches is an open
> question.  That's Colin Watson's project, and while IIRC I have a commit
> bit, my status in that project is as junior as one can get.  :)
>
> My objective is to design the macro package side of the feature so that
> Colin's job as designer (recognizing that he might foist implementation
> onto me) is as easy as possible.
>
> man-db man(1) already sits on `-a` and `-k` so some other option letter
> selection will have to be made.

???
I admit that mandoc supports several options that are POSIX
extensions, but -k is mandated by POSIX:

  https://pubs.opengroup.org/onlinepubs/9799919799/utilities/man.html

  NAME
    man — display system documentation
  SYNOPSIS
    [UP] [Option Start] man [-k] name... [Option End]
  [...]
    -k  Interpret name operands as keywords to be used
        in searching a utilities summary database [...]

And then:

   $ cat /etc/debian_version
  11.11
   $ dpkg-query -l man
  [...]
  ii  man-db  2.9.4-2  amd64  tools for reading manual pages

With that man(1), "man -a crontab" works similarly to what i would
expect.  It first displays crontab(1), then crontab(5), then exits.

The string "-a" is hard to find in man(1) because there are so many
false positives, but it is documented:

  -a, --all
     By default, man will exit after displaying the most suitable
     manual page it finds.  Using this option forces man to display
     all the manual pages with names that match the search criteria.

man-db is quite inconvenient in this respect because "man -a crontab"
first opens only crontab(1), so you cannot compare to crontab(5)
because that is not yet displayed.  When you are done reading
crontab(1), you have to press 'q' and then you need another (!!)
keystroke (RETURN) to get to crontab(5).  From there, you have no
way at all to get back to crontab(1): when you press 'q' there,
the whole program terminates and drops you back to the shell.

With mandoc man(1), "man -a crontab" shows both manual pages at the
same time, crontab(5) below crontab(1), such that you can scroll
back and forth to your heart's content with *no* keystrokes wasted
for switching from one to the other.

Also very useful when viewing hundreds of manual pages at the same
time:

  /^---<ENTER> n n n n n n n n n ...

moves forward by pages, one page per keystroke, until you find one
that interests you.  The commands :t/t/T search for a tag,
forwards and backwards, across all the displayed pages.
Also, in man-db man(1), i just noticed that the normally very useful combination -ak is useless, even though both are supported and documented with the same meanings as in mandoc: $ man -ak cron cron (8) - daemon to execute scheduled commands (Vixie Cron) crontab (1) - maintain crontab files for individual users (Vixie Cron) crontab (5) - tables for driving cron That's what i would expect for just -k, not for -ak. For -ak, mandoc man(1) displays a concatenation of these pages: crontab(1), jmb(4), jme(4), jmphy(4), crontab(5), cron(8). (The jm* pages document JMicron hardware support, notice the "cron" in there.) So fortunately, man-db and mandoc agree about the basic meaning of both -a and -k. That's hardly a coincidence though: https://mandoc.bsd.lv/man/man.options.1.html -a display all matching manual pages man: 4.3BSD-Tahoe (June 1988), Eaton (before July 7, 1993; 1990/91?); OpenBSD, FreeBSD, NetBSD, man-db, man-1.6, illumos, Solaris 9-11 apropos, whatis, mandoc: OpenBSD 5.7 (August 27, 2014) only display items that match all keywords apropos: man-db (Aug 29, 2007) use all directories and files for mandoc.db(5) makewhatis: OpenBSD 5.6 (April 18, 2014) [superseded by -T ascii] ASCII output mode troff: Version 7 AT&T UNIX (January 1979) groff: probably before groff-0.4 (before July 14, 1990) So, while man-db clashes *with itself* in so far -a has two conflicting meanings in man(1) and apropos(1), it actually agrees with mandoc (as far as that is still possible despite its internal conflict). Please avoid adding options to man(1) if you can, and if you cannot avoid it, then *please* do consult https://mandoc.bsd.lv/man/man.options.1.html before doing so in order to minimize the risk of clashes. [...] > If one wants to select a subset of tags of interest, I guess that's > where the analogue of mandoc(1)'s "-ak" comes in. Hmmm, no. I'm not even sure what "subset of tags of interest" is supposed to mean. 
The meaning of -ak is simply "display all pages matching the
following search expression".

[...]

>>> A recurring theme in my contributions to groff's man(7)/mdoc(7)
>>> rendering has been to solve problems when rendering N pages at a
>>> time, where N can be 1 but might be greater.

>> Actually, the mandoc program demonstrates that from a pure user
>> perspective, a very straightforward solution is feasible that
>> will look totally trivial to the user.
>>
>> Then again, programs like makewhatis(8) in the mandoc package that run
>> over lots of manual pages in sequence have been quite good at exposing
>> memory leaks and bugs caused by leaking context data from one manual
>> page to the next, and i rarely enjoyed the ensuing crashes and
>> debugging efforts.

> This sounds like the "fun" I enjoyed when getting "groff-man-pages.pdf"
> (and its UTF-8 text counterpart) ship-shape.  Many odd corner cases and
> oversights.

When formatting few pages at a time, say up to several hundred at
the same time, i don't recall ever having seen such problems; it
seems the likelihood is just too low at small scales.  I *could*
sometimes reproduce such problems by formatting just two or three
pages in a specific order, but actually hitting such problems in
practice seems to usually require thousands of pages.  With thousands
of pages, it has happened to me once every few years, which is
annoying enough that i remember and fear it.  =:c/

>> Still, running commands like
>>   $ man -ak ~.
>> can be useful for finding bugs.  By the way:
>>   $ man -ak ~. | col -b | grep -c ^NAME
>>  8801
>>   $ time man -ak ~.
>>     0m17.16s real     0m15.07s user     0m01.18s system
>>   $ time doas makewhatis
>>     0m10.58s real     0m08.96s user     0m01.35s system

> How many separate Unix processes are involved in doing that, though? ;-)

Exactly one, no parallelization or multithreading whatsoever.

> nroff(1) and man(1), going back to early editions of CSRC Unix, are
> built around the filter model and pipe(2).
> It's long been known that
> context switches are expensive, mode switches more expensive still

Admitted.

   $ time for in in $(jot 100); do man -c true; done
    0m00.83s real     0m00.01s user     0m00.02s system

Since the size of the true(1) manual is almost zero, that - eight
milliseconds for one page - is almost entirely process setup
overhead, and that's about four times the CPU time required to
format an average-sized (i.e. much larger) manual page.

[...]

> I monitor mandoc(1)'s cvsweb closely and can't remember when
> I've seen a change that caused me concern.  If I ever do, I'll
> let you know.

Appreciated!  :)

If you want to receive the mandoc commits automatically via mail,
i can subscribe you to the source@ mailing list, just let me know.
Some information about the lists is at

  https://mandoc.bsd.lv/contact.html

They are much less active than the groff@ list.

>> Updating the groff port in OpenBSD causes a horrendous amount of work
>> - so much so that i still didn't manage to update to groff-1.23 in the
>> OpenBSD ports tree, mostly due to large amounts of subtle changes in
>> behaviour between 1.22.4 and 1.23.  Before being able to update, i
>> have to disentangle and classify all those changes as follows:
>>
>>  1. Desirable effects of bugfixes and of changes making groff
>>     behaviour saner or more consistent; some of these require
>>     adjustments in mandoc, too, and/or maybe in the test suite.

> I indeed hope that the great preponderance of problems are of this kind.

Not really.  By far the largest class is class 3 (build system
issues).  Classes 1 and 2 (good vs. harmless gratuitous changes)
might be about on par - though i'm not completely sure yet, as i'm
still struggling to classify the large amount of issues.

>>  2. Trivial changes that aren't necessarily intentional, but
>>     only affect unimportant edge cases such that they can stay.
>>     A few of these may require adjusting mandoc, too, and
>>     several require adjusting the test suite.
> I'm curious to know about these if you'd care to record them. They > might not be things I want to change or revert, but they might bring my > attention to consequences I hadn't considered, and maybe I can > anticipate similar impacts in the future. Some of these are already fixed in mandoc or its test suite, and i did not keep a list about those. But i think i can produce such a list after committing the groff port, because i certainly note in the commit message when a tweak is motivated by a change in groff. >> 3. Build system hiccups. These typically require setting >> configure or make(1) variables in the ports Makefile, but >> diagnosing what exactly is needed is typically hard. >> A few extreme cases might require patches. >> Due to extensive use of GNU autotools and gnulib, this >> class of problems is particularly large and annoying. > I can't claim credit/blame for restructuring groff around Autotools and > Gnulib, but I'll accept responsibility for not changing it. For non-BSD > systems they deliver remarkably little hassle. > > I should bring your attention to: > > https://savannah.gnu.org/bugs/index.php?66518 Yes, i'm aware of that, and that will likely result in further degradation of portability. There are three reasons why i didn't mention my doubts about that earlier: (1) In theory, the idea of a standardized approach to portability sounds good, so it's hard to argue with that. The problem is not the basic idea, but that gnulib is overzealous *and* overengineered to such an extent that the end result is nothing short of catastrophic. (2) libgroff isn't exactly code of stellar quality either, so it's equally hard to argue that getting rid of it would be a bad idea. I think the low code quality in libgroff (IIRC) also caused a very small number of build failures in the (remote) past, but that was mostly before your time, and both much rarer and much less severe and much easier to fix than gnulib issues. 
IIRC there may have been on the order of one to three issues with
libgroff, grand total, ever - in groff-1.23.0, there are at least
a dozen issues right now, maybe more.

(3) I spent less time on mandoc and groff lately than in some
previous years and hence missed some stuff - and was less eager
to make a fuss.

[...]

> I see commits from Bruno with respect to FreeBSD 11, NetBSD 7.1,
> and OpenBSD 6.0 within the past month (in time to make gnulib's 2025-01
> stable tag).

OpenBSD 6.0?  That's... hilarious.

   $ uname -a
  OpenBSD isnote.usta.de 7.6 GENERIC.MP#496 amd64

OpenBSD 6.0 is more than 8 years old and has been EOL and unsupported
for more than seven years now.

Supported FreeBSD releases currently are 13.4, 14.1, and 14.2.
FreeBSD 11 was released in 2016 and EOL in 2021.

Supported NetBSD releases currently are 9.4 and 10.1.
NetBSD 7.x was released in 2017 and EOL 2020.

So if your observation summarizes the situation adequately and if
i'm not missing something, that would mean that only bugs are getting
fixed that were reported at least seven years ago and affect operating
system versions that have not been supported for at least seven years.

Not sure what to make of that, i didn't try to work with the gnulib
folks and i'm not particularly eager to try, either, given how their
code looks.  In any case, getting groff to build at all is clearly
more important than trying to help fix gnulib - which might well
be a lost cause anyway, even if the maintainers are friendly and
well-intentioned and work hard.  I suspect the trouble stems more
from basic design principles and development goals than from
individual bugs or oversights.

> I'd happily try {Free,Net,Open}BSD builds myself, but the FSF France's
> compiler farm hosts for these OSes are down/unresponsive every time I
> try them.

I have no access to any FreeBSD or NetBSD machines.
I do have some OpenBSD machines, but none of those can be used
for testing purposes.
I have sent a crashing build log in private mail to you, though.

>>  4. Regressions.  These require pushing bugfixes to groff,
>>     plus patches in the ports tree until they get released.

> I definitely want to know about these ASAP.  Or do you mean the ones
> I've already noted and committed, kicking them up to "Important"
> severity in Savannah?

Some may already be fixed, not sure i have reported all yet.
I will definitely show you a complete list of patches we use
once the port is ready, and that includes patches for all bugs
fixed in the port.

I'm also working on reducing patches and instead adding lines
to groff site configuration files, as intended by groff developers,
where possible.

>>  5. Intentional changes that we do not want to have.
>>     These require patches in the ports tree that may have
>>     to stay even in the longer term.

> I think I know one or two of the ones you have in mind, but I'd like to
> know where I can look to stay apprised of these.  Even if I don't think
> a distributor patch belongs upstream, it tells me important information
> about the needs of the user community.  (And, regrettably, sometimes its
> cluelessness.[5]

Yes, those patches will be included in the final list.

[...]

> I'm not happy to hear that.  Apart from not using Gnulib or the GNU
> build system, both of which I prefer to retain because I perceive them
> of _relieving_ me of maintenance overheads (which is their reason for
> existing in the first place), I'm keen to hear your suggestions for
> things I can do that will reduce the pain.

Yes, the plan is definitely to aim for providing feedback in a more
specific and more constructive manner.  I wasn't really expecting
that suggesting to ditch gnulib would be met with enthusiasm.
Even though i suspect that the *real* portability issues groff needs to deal with require less than 1% of the behemoth of code included from gnulib, simply because i suspect that groff portability needs are very modest, given that groff doesn't exactly use all the most modern features in its code base, even though i'm quite sure that going for a simple, straightforward scheme like the one used by mandoc, with one static configure script that is written fully by hand and never regenerated, which produces a grand total of 52 lines of output, and which take the replacement implementations from OpenBSD instead of from glibc, such that they are typically a fraction of the size and don't typically contain any preprocessor directives, would significantly improve groff portability *and* significantly reduce your workload - i still fear that might not very well fit into GNU philosophy and hence not be all that welcome, even if it would be undeniably efficient. I mean, just look at the list of systems on which mandoc runs: https://mandoc.bsd.lv/ports.html >> Don't take that as a complaint, though; i still appreciate your work. > Likewise! Because mandoc(1) is conscientiously maintained, I make > frequent reference to it when advising man page authors/maintainers > regarding portability. Thanks. > By contrast, specimens like Illumos troff are > seldom worth mentioning. It's just Solaris 10 troff under a different > name, and its development community seems to have its focus utterly on > other aspects of the system. Is troff in Illumos still relevant for what we are discussing here? Illumos folks told me in 2014 that they started using mandoc(1) for manual page formating at that point instead of *roff, even though, last time i checked, they still used Solaris man(1) as the viewer. So like many Linux systems use groff+man-db, for all i know, Illumos uses mandoc+Solaris-man, and by the way, MacOS now uses mandoc+FreeBSD-man. [...] > 1. 
GNU and Linux programs have a tendency to efflorescence You have a point, trying to make GNU and Linux programs lean such that their manuals become more manageable probably isn't the easiest or most rewarding project anyone could pick. Also, it doesn't hurt much when we agree to disagree on OPTIONS. > Heh, as you're probably aware, I use a "Notes" section myself, in 3 of > groff's pages. > > 1. In nroff(1), because of the SGR problem. A lot of the same people > who insist that groff not produce SGR escape sequences will not ever > think to look in grotty(1). (Incidentally, this same page uniquely > _lacks_ an "Options" section; a piloted your approach to see what I > thought of it.) Hmmm, that's indeed a tough case because that information hardly belongs into the nroff(1) page, and consequently there probably isn't any good place at all - but i still see why you want it there. Shallow wrapper programs (like nroff) are often hard to document properly because most of the information users need for using them does not belong in the manual page of the wrapper but into the various other programs that the wrapper wraps. It's a dilemma resulting from dubious API design involving too many abstractions and layers. > 2. In groff_me(1), to explain the origin of the package's name. The > page is otherwise pretty terse and businesslike, and more an > aide-mémoire than a true reference, so I saw no other good place to > put it. I tend to think the first sentence of those NOTES (the one about single-letter names) belongs into the HISTORY section of groff_tmac(5). Most of the second sentence belongs in the AUTHORS section, which should probably also say something like "reimplemented for groff by James Clark in 198x". A few bits near the end of the second sentence belongs in the HISTORY section, which should probably say something like The me macros first appeared in 2BSD. 
Since groff is not primarily used in BSD communities, many groff
users might not understand what the somewhat cryptic "2BSD" means,
so maybe something like

  ... in the second Berkeley Software Distribution (2BSD),
  released in May 1979.

So here, i really see no need for NOTES.

>   3.  In groff_man_style(1), as a kind of FAQ.  I won't apologize for
>       this; I need it to dispel myths and confusion.  I _could_ call it
>       something else; its present name seems good enough.

You got me there.  A style guide in manual page form is so unusual
that some unusual decisions might be called for.  Also, a style
guide, in particular when it is long, might prioritize pedagogy
over conciseness and rigour, in which case my argument that every
topic needs to be discussed in exactly one place breaks down and
afterthoughts may become legitimate.  Looks like you may have found
an example unusual enough that even a NOTES section can be defended.

[...]

> At 2025-01-22T22:04:59+0100, Ingo Schwarze wrote:
>> G. Branden Robinson wrote on Mon, Jan 20, 2025 at 11:59:41PM -0600:

>>> We need a design for automatic construction of
>>> tag/anchor names from the user-specified names of the items to be
>>> tagged.  In man(7) documents, those taggable items are probably going to
>>> be:
>>>
>>> 1.  the identifier of the page itself, with "section" number;
>>> 2.  section heading text;
>>> 3.  subsection heading text; and
>>> 4.  the tag text of tagged paragraphs (`TP`).

>> In addition to those, mandoc(1) also tags the tag text of .IP and .TQ.

> Ah!  I forgot to mention `TQ`.  Yes, it's completely my intention to
> treat `TQ` the same as `TP` in this respect.
>
> But not for `IP`.  That macro seems, somewhat preponderantly, to already
> be getting used as a non-semantic, or differently semantic, device, to
> mark lists with symbols or enumerators.  (It's more structural than
> semantic, we might say.)  That's good.  I want to encourage and
> reinforce that practice.

Maybe.
Given that both serve almost the same purpose - the main difference
only being that .TP supports macros in the tag and .IP does not -
some style guidance regarding when to use which one might make
sense.  Deprecating .IP outright doesn't seem like a good idea
because

  .TP
  \(bu
  text body

is very ugly, and bullet+numbered lists is a reasonable scope that
works well with the .IP syntax.

However, even in the OpenBSD tree, which does not contain particularly
many man(7) manuals, significant numbers of manual pages contain long
tag arguments after .IP macros.  Most of these are GNU manuals.
Even worse, pod2man(1) emits .IP, not .TP, for tagged lists.

So you would be punishing end users for something that was considered
OK in the past and that documentation maintainers and code generator
maintainers need to fix - possibly before knowing whether documentation
maintainers will even agree with your position.

Here is a list of some manual pages affected, where users won't
have tags (or at least fewer tags) because of your policy on .IP:

  addr2line(1) ar(1) as(1) c++filt(1) ld.bfd(1) objdump(1) readelf(1)
  objcopy(1) strings(1) readline(3) mkhybrid(8)
  and almost all Perl manual pages.

On top of that, all FVWM manual pages, editres(1), sessreg(1),
twm(1), xbacklight(1), xedit(1), xpr(1), xrandr(1), xsetroot(1),
XF86VM(3), Xsecurity(7), and almost all X11 section 3 library manuals.

From a random collection of a few ports i have currently installed:
arara(1), bib2gls(1), bzless(1), python3.12.1(1), practically all
FFmpeg manuals, afm2afm(1), albatross(1), autoinst(1), curl(1),
cvs2cl(1), dvipng(1), epstopdf(1), gslp(1), install-tl(1),
luafindfont(1), mk-ca-bundle(1), ofm2opl(1), ovf2ovp(1), pedigree(1),
ps2pk(1), repstopdf(1), thumbpdf(1), ttf2afm(1), unzip(1), updmap(1),
and all the GnuTLS manuals ...

So not providing support before deprecation takes effect may not
be the friendliest move.

[...]
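For anyone wanting to check their own pages for the pattern discussed above, here is a rough sketch of a scanner for .IP macros whose first argument looks like a word or an option rather than a bullet or an enumerator.  It is written in Python purely for illustration; the function name ip_word_tags and the heuristic itself (two consecutive letters, or a leading dash followed by a letter, after stripping a few common roff escapes) are assumptions made for this example, not anything groff or mandoc implement.

```python
import re


def ip_word_tags(source):
    """Return the .IP first arguments in a man(7) source that look like
    words or options rather than bullets or list enumerators.

    Hypothetical heuristic for illustration only.
    """
    tags = []
    for line in source.splitlines():
        # First argument of .IP, either quoted or a single bare word.
        m = re.match(r'\.\s*IP\s+(?:"([^"]*)"|(\S+))', line)
        if not m:
            continue
        arg = m.group(1) if m.group(1) is not None else m.group(2)
        arg = re.sub(r'\\-', '-', arg)                       # \- is a minus
        arg = re.sub(r'\\f(?:\[[^\]]*\]|\(..|.)', '', arg)   # font changes
        arg = re.sub(r'\\\(..|\\\[[^\]]*\]', '', arg)        # glyphs like \(bu
        arg = re.sub(r'\\&', '', arg)                        # zero-width \&
        # "Word-like": two letters in a row, or an option such as -h.
        if re.search(r'[A-Za-z]{2}|^-+[A-Za-z]', arg):
            tags.append(arg)
    return tags
```

For example, a page containing '.IP \(bu 2', '.IP 1. 4' and '.IP "\fB\-\-verbose\fR" 4' would report only the last one.  Real pages use many more escapes than the few handled here, so the result is at best a first approximation.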
> NEWS: > * The an (man) macro package now supports a `TS` register to configure > the minimum space required between the tag of a `TP` paragraph and > its body. (If the width of the tag's formatted text plus this space > exceeds the paragraph indentation, the line is broken after the tag.) > This parameter, formerly hard-coded as `1n`, now defaults to `2n`. > > * The an (man) macro package's `IP` macro no longer honors the formerly > hard-coded 1n tag separation noted in the previous item. This means > that the first argument to the `IP` macro can abut the text of the > paragraph with no intervening space. If you use a word instead of > punctuation or a list enumerator for `IP`'s first argument, consider > migrating to `TP`. Regression suite fun at the horizon for the 1.24.0 ports update. > I haven't brought this to your attention before now because I expected > you to: > > A. Ignore the `TS` register entirely; and either, Certainly, but the change of the default is likely to cause a few hours of work on the mandoc and test suite sides. > B1. Go along with my tweak to indentation, or Probably (because off the top of my head, i suspect it makes .TP more similar in style to .Bl -tag), but i'm not sure yet. > B2. Not, overriding it with a patch. Unlikely, i somewhat dislike patches for trivial tweaks like that. If it turns out it causes more serious trouble than just churn, there is a third possibility: B3. Harass you for a revert, even if i'm late to the party. [...] >>> For man(7) the `MR` macro new to groff 1.23 was an obvious site >>> to add the appropriate machinery for document-level links. >>> mdoc(7)'s `Xr` is closely analogous and has existed for many >>> years. >> Yes, both have almost identical semantics and are a likely candidate >> for extension, if we come to the conclusion an extension is needed. >> I didn't consider the details yet, though. > I think you were discussing `SX`/`Sx` here? 
> As I understand it, `MR`
> support is already in mandoc(1) CVS (but not released), as of course is
> `Xr`, for some years.

I guess you misunderstand.  I did not mean "add .MR as an extension" -
that has indeed been agreed and implemented already.  I meant
"extend the existing .MR and .Xr macros with another argument
or something like that".

> Also, you can't put a price on the pleasure of introducing a macro
> named `SX`.  It pleases the 13-year-old in everyone.

I scratched my head for some time what you even meant here before
finally getting it - even though it works in German exactly like
in English.  I think i wouldn't have noticed that even as a teenager,
neither as a 13-year-old nor as a 19-year-old.

And no, i wasn't talking about .Sx.  The .Sx macro is for local
links inside one page.  But here you want to design a macro to
link from one page to another.  That's what .MR/.Xr do, so those
are the first candidates that come to mind for extension.

Also, extending .Sx for that purpose doesn't seem easy, since it
already accepts an arbitrary number of arguments (e.g. .Sx SEE ALSO).
Number theorists may feel comfortable reasoning about the number
omega+2 - computers get very nervous when you ask them to handle it,
they don't even like omega itself all that much.

[...]

> While deep structural links risk encouraging stagnant man page
> structures, deep unstructured links will promote retention of features.
> And there are enough forces favoring the latter.  Our industry has a bad
> problem with not throwing old cruft away.

OpenBSD suffers from that problem less than other projects.
Ted Unangst (tedu@) has earned such a high reputation and so much
respect for excising large amounts of code from all over the tree
that "teduing" has become a neologistic synonym for "cleaning up".
LibreSSL is among the most active parts of the OpenBSD tree, and
currently, at least 80% of the work done there is teduing (during
the first two years of development, it felt more like 98%).
That still holds even though Ted U. is currently no longer involved
in LibreSSL (he was years ago and is among the founding members of
LibreSSL precisely because he wanted to tedu there, and indeed he
did so).

I'm not convinced we should let speculations about possible effects
on people's bad habits influence the deep linking design - that
design task is difficult enough without additional non-technical
constraints.  Besides, i doubt that the design of deep linking can
really improve people's willingness to tedu.

> Using English instead of machinery might remain our most robust
> technology.  "See the “-h” option in ksh(1)."

That's exactly what Jason McIntyre (jmc@) and myself have been doing
in OpenBSD for decades, and i'm not aware of any problems it may have
caused.

> For URLs (and email addresses), it's already in place.  We use OSC 8.

Yikes.  ANSI X3.64.  The nightmare.

Reminds me that we ought to disable ANSI X3.64 support in OpenBSD
xterm(1) by default.  It's just too dangerous due to the very large
number of escape codes that make it hard to secure and the fact
that many of them can wreak havoc.

A manual page viewer is a typical example of a program that must
be able to run securely, even as root, and that must not, under any
circumstances, make the terminal window unusable.  It might contain
the last available shell on a remote machine that is in trouble,
and reading the manual page may be needed to implement a fix for
whatever problem there is.  But manual pages are essentially
untrusted data, so allowing a manual page viewer processing manual
pages to send dangerous escape codes to a terminal is not acceptable.

Maybe you argue "well, the manual page viewer must not *pass through*
escape codes, but there is no risk in *creating* certain escape
codes from scratch" - but that violates the principle of multi-layer
security.  Sanitize input and reject ANSI X3.64 contained in the
manual page.  AND sanitize output, making sure that you never create
ANSI X3.64.
AND make sure that the pager always ignores ANSI X3.64, i.e. never
run it with -r or -R. AND make sure all potentially dangerous ANSI
X3.64 codes are disabled in the terminal emulator - and that certainly
includes OSC 8.

I would go so far as to say that not only should the "Operating
System Command" ANSI code (as a whole, not just OSC 8) be disabled by
default, but the possibility to enable it should be patched out of
xterm(1) lest users do that by accident when editing xterm(1) config
files or while blissfully ignorant of the risks - as in "oh, colour
sounds nice, let's do that".

Multi-layered security actually provides security. Artful arrangement
of slices of swiss cheese such that (hopefully?) the holes never align
is a recipe for disaster.

> If less(1) strips that (it doesn't) or the terminal doesn't support it,
> the user doesn't get hyperlinks. Okay, so they're no worse off than
> before.

In short, we have no solution for the task we set out to solve, right?

[...]

> I have no intention or desire to make man-db obsolete. I think the
> separation of its concerns from the formatter is sound, and I don't look
> to disrupt it.

OK, i get it, that part makes sense now.

[...]

> • What’s the difference between a man page topic and identifier?
>
> A single man page may document several related but distinct
> topics. For example, printf(3) and fprintf(3) are often
> presented together.

What you call "topic" here is called "name" in the mandoc(1)
documentation. Maybe not a huge problem because the mandoc
documentation does not define a term "topic" at all, so there is
no clash.

> Moreover, multiple programming languages
> have functions named “printf”, and may document these in a man
> page. The identifier is intended to (with the section) uniquely
> identify a page on the system; it may furthermore correspond
> closely to the file name of the document.

What you call "identifier" here is called "title" in the mandoc
documentation.
Mandoc treats the title as an additional name; for that reason, the
man(7) page in the mandoc package takes the shortcut of presenting
this synopsis:

  .TH name section date [source [volume]]

Of course, the title usually matches one of the other names, and
often the first one, though that is not necessary.

The stipulation that the "identifier is unique within the section"
is completely unrealistic in practice, on Linux even more than on *BSD.

The package manager *can* make sure that two packages do not install
a file into the same file system path, clobbering each other. And
indeed, package managers often do check that - sometimes already by
flagging such clashes in a centralized package database in the build
system infrastructure of the operating system developer, requiring
packages to be fixed or marked as conflicting when they clash, such
that both cannot be installed at the same time. Other package
managers also, or only, check at install time that they do not
overwrite existing files owned by other packages, and refuse to
install when they encounter conflicts.

But i have not heard of package managers attempting to parse manual
pages and trying to detect *logical* conflicts in the *content* of
manual pages. That would seem very hard to implement and in addition
rather fragile and inefficient. Even Marc Espie's pkg_add(1) package
manager in OpenBSD, which spends a lot of effort on handling
documentation with special care, does not do that - and it would also
be wrong to do that, because .TH/.Dt clashes cause no problem and are
simply legitimate.

> The man(1) librarian makes access to man pages convenient by
> resolving topics to man page identifiers.

That is not true and misleading. In your terminology, the correct
statement would be:

  The man(1) librarian makes access to man pages convenient by
  resolving each topic to one or more fully qualified file system
  paths to manual page files.
That is not only true for mandoc, but for all man(1) implementations
i'm aware of, in particular including man-db.

> Thus, you can type
> “man fprintf”, and other pages can refer to it, without knowing
> whether the installed document uses “printf”, “fprintf”, or even
> “c_printf” as an identifier.

The term "identifier" is badly misleading because multiple pages with
the same "identifier" (or "title" in mandoc terminology) can exist in
the same section of the same manual page tree, and all those files can
even be in the same directory. This does not prevent man(1) from
finding all these files anyway.

What really "identifies" a manual page is the fully qualified file
system path - because you cannot have two files with the same filename
in the same directory.

[...]

> While we don't want to put every tag into the "Name" section so that it
> will be indexed by the librarian--

Complete agreement, that would make the NAME section totally
unreadable.

> indexing all `-h` command line options would be worse than useless--

Not true. The makewhatis(8) program in the mandoc package does
exactly that: it indexes all -h command line options in all manual
pages, and you can search for them with man(1) -k, as i demonstrated
in an earlier mail. That's quite useful, too.

> I think every symbol in a C API should be.
> Yes, even macros and non-function objects.

Here, we can happily agree to disagree.

Treating macros that take arguments exactly like functions is fine
because the distinction is a technicality.

Treating constant macros like functions, however, is over the top.
In some pages that only mention one or two constants, it might not
cause much grief, but pages exist that document large numbers of
constants. Here is an extreme example that would massively bloat
the NAME section:

  https://man.openbsd.org/errno.2

Constant macros are like wolves, they often come in packs.
Often, one constant requires much less documentation than one function
because it has far fewer moving parts: no arguments, no return value,
no semantics, not even any syntax apart from the literal name itself.
(Exceptions exist where constants do require massive amounts of
documentation because they are essentially abused like functions.
Yes, i'm looking at you, EVP_PKEY_CTX_ctrl(3).)

And finally, significant numbers of constants are used by more than
one function, sometimes even by functions in several manual pages,
which makes the question of which manual page should have the constant
in the name section rather arbitrary.

In mandoc, the whole question is moot anyway because you can say

  $ man -k Er=EINVAL

and get all 221 manual pages referring to it listed, without needing
it in any NAME section.

Similarly for type names: each type is typically used in many manual
pages, even though some exceptions of specialized types that are only
relevant for a single page do exist.

> These are "first-class" concerns just like function calls.
> Part of your interface? Document it.

Absolutely! But not every public symbol needs to be a "topic", in
your parlance (or a manual page name according to mandoc).

[...]

> At 2025-01-22T23:04:33+0100, Ingo Schwarze wrote:
>>> Possibly I'll formally propose an `SX` macro for man(7) at some point.
>> You mean that like mdoc(7) .Tp, not like mdoc(7) .Sx, right?
> Uh, groff mdoc(7) doesn't have `Tp`.
>
> Uh. I don't see it in mdoc(7) from mandoc 1.14.6-1 on my Debian system
> either.
>
> What is it?

Sorry for the typo, i meant .Tg (manual tagging), not .Tp, which
indeed does not exist and is not planned.

[...]

> there may be a subtle difference between the ways we're using the
> word "node" here.

Not subtle at all, it's a drastic difference.

Here are examples of nodes in mandoc:

* The root node, which contains the syntax tree of the whole document.
* A section node, for example one that contains the whole DESCRIPTION
  section including its title.
* A full explicit node, for example a complete list or display
* A partial explicit node, for example all that results from
  a quotation macro like .Do that has a matching end macro, .Dc,
  including the content.
* A partial implicit node, for example all that results from
  a quotation macro like .Dq that extends to the end of the input
  line, including the content.
* An in-line node, for example an emphasis macro including its content
* An input text line, possibly including escape sequences
* A text string that is a single argument of a macro,
  possibly including escape sequences
* Certain low-level roff requests that directly produce output
  or change formatting state in a way similar to what macros do,
  in particular: .br .ce .fi .ft .in .ll .mc .nf .po .rj .sp .ta .ti
  Most roff requests are *not* nodes but get fully resolved at the
  pre-parser stage.
* Also, so far, no escape sequence can ever be a node - even though
  making escape sequences nodes would be beneficial for some purposes,
  so i might do that at some point

What roff calls a "node" would be called "escape sequence or character"
in mandoc. There are some artifacts that roff calls "node" that mandoc
does not represent at all, neither as a node nor as an escape sequence,
and that it never needs for anything, for example "line_start_node".

Furthermore, mandoc does not distinguish between the ordinary space
character (U+0020) and word_space_node. Instead, mandoc internally
represents blanks that are not word spaces and hence do not allow a
line break by a special ASCII code point, and similarly for some other
whitespace-related cases, see mandoc.h:

  #define ASCII_NBRSP      31  /* non-breaking space */
  #define ASCII_NBRZW      30  /* non-breaking zero-width space */
  #define ASCII_BREAK      29  /* breakable zero-width space */
  #define ASCII_HYPH       28  /* breakable hyphen */
  #define ASCII_TABREF     26  /* reset tab reference position */

> The long version? [about diversions]

Thanks, that was instructive.
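
To illustrate how such in-band placeholder code points can be handled
at output time, here is a minimal sketch. Only the five #define values
are taken from the mandoc.h excerpt above; the function name
render_placeholders() and the chosen mapping (visible blank, visible
hyphen, everything else dropped) are assumptions made for this
illustration and are not mandoc's actual formatter code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder code points as quoted from mandoc.h above. */
#define ASCII_NBRSP   31  /* non-breaking space */
#define ASCII_NBRZW   30  /* non-breaking zero-width space */
#define ASCII_BREAK   29  /* breakable zero-width space */
#define ASCII_HYPH    28  /* breakable hyphen */
#define ASCII_TABREF  26  /* reset tab reference position */

/*
 * Hypothetical helper, not part of mandoc: copy src to dst and
 * replace each placeholder by what a plain ASCII terminal would show.
 * dst must provide room for at least strlen(src) + 1 bytes.
 * Returns the number of bytes written, not counting the final NUL.
 */
static size_t
render_placeholders(const char *src, char *dst)
{
	size_t n = 0;

	assert(src != NULL && dst != NULL);
	for (; *src != '\0'; src++) {
		switch (*src) {
		case ASCII_NBRSP:
			dst[n++] = ' ';   /* visible blank, no break here */
			break;
		case ASCII_HYPH:
			dst[n++] = '-';   /* visible hyphen */
			break;
		case ASCII_NBRZW:
		case ASCII_BREAK:
		case ASCII_TABREF:
			break;            /* zero width: nothing to print */
		default:
			dst[n++] = *src;  /* ordinary character */
			break;
		}
	}
	dst[n] = '\0';
	return n;
}
```

In this sketch, the line-breaking logic would have to run before the
replacement, because after it a non-breaking blank can no longer be
told apart from an ordinary word space.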
> 5.29 Diversions
> ===============
> In 'roff' systems it is possible to format text as if for output, but
> instead of writing it immediately, one can "divert" the formatted text
> into a named storage area.

The reason why mandoc is quite unlikely to ever implement diversions
is that the most central design principle of mandoc is that the parse
tree is guaranteed to be independent of the output device, and
completely finalized before the program even looks at the question of
which output device the user selected. No part of the formatters can
ever be called before the parse tree is fully complete and immutable.
All substitutions (in particular, of user defined macros, strings and
number registers) must be completed before the parsers can even be
started. Whatever is stored in any user defined string or number
register after parsing has started is guaranteed to have no effect on
the output.

Hence, storing anything produced by the formatters (which can only be
invoked after parsing is complete) into any string or register (which
no longer have any effect once parsing has even started) is totally
out of the question. I would have to throw away the most fundamental
parts of the software architecture and start over completely from
scratch.

> That's why I want a string iterator and well-defined operations to
> identify nodes so that they can be stripped from strings, without
> bespoke formatter features and without hacks.

Yes, mandoc also contains string iterators in a number of places -
the iteration itself is so short and straightforward that there is
no abstraction for it even though it's needed in more than one place.
The tricky part is the handling of escape sequences (what groff would
call nodes), implemented in the recursive functions in the file
roff_escape.c and called whenever needed for iteration or processing.

> Will we ever need or want type-aware node operations in the groff
> language? Good grief, I hope not.
> But the mere existence of nodes in
> bona fide language objects has already created pain--esoteric pain that
> produced diagnostic messages that no one on this mailing list
> understood. (Or if they did, they chose not to share knowledge.)

Oh the joys of in-band meta messaging: escape sequences embedded in
plain text strings. Didn't i already get riled up about that very
subject earlier in this message?

Yours,
  Ingo