Hello Alejandro,

Alejandro Colomar wrote on Tue, Sep 23, 2025 at 07:19:15PM +0200:
> On Tue, Sep 23, 2025 at 06:48:51PM +0200, Ingo Schwarze wrote:
>> For example, with the mandoc implementation of man(1):
>>
>> $ man true false
>>
>> shows the true(1) manual page, followed by a separating line:
>> -----------------------------------------------------------------------
>> followed by the false(1) manual page. The entire output is shown in
>> a single less(1) instance as if it were a single output file, with no
>> need to type ":n" to get to the next manual page. Consequently, it
>> is trivial to search for strings across the whole output, across
>> all pages, with just the less "/" command, or to perform semantic
>> searches across all pages with just the less ":t" command.

> I prefer your approach over that of man-db, at least per how you
> describe it. I've never used mman(1) before with more than one page,
> and it seems to be broken at the moment in Debian Sid:
>
> alx@debian:~$ mman false true | cat
> mman: outdated mandoc.db lacks false(1) entry, run makewhatis /usr/share/man
> mman: outdated mandoc.db lacks true(1) entry, run makewhatis /usr/share/man
> FALSE(1) User Commands FALSE(1)
> NAME
> false - do nothing, unsuccessfully
[...]
> GNU coreutils 9.7 June 2025 FALSE(1)
>
> --------------------------------------------------------------------------
>
> () ()
>
> ?�????????�TÑnÓ0?}ÏW
[...]

Ouch. I'm able to reproduce that bug on OpenBSD-current.

This must be the umpteenth time that something is broken with
compressed manual pages - i keep saying that compressing manual
pages is pointless in the 21st century, not only because the space
savings are negligible compared to the size of modern function
libraries and programs, but also because it adds complexity and
hence fragility. I freely admit this bug was my fault, but all
the same, triggering it was a consequence of compressing manual pages.

I have committed the bugfix here (rev. 1.364):

  https://cvsweb.bsd.lv/mandoc/main.c

Thanks for the report!

That said, i really need to roll the next mandoc release,
to get all the bug fixes out to users.
Around November 2025 would probably be an ideal time.

[...]
> Another thing is when indenting stuff, there's so many levels of
> indentation (think of the old proc(5)), and each level might be hundreds
> of lines, that I have a really hard time tracking down where things
> start and end.

I agree.

> In general, catenating stuff is trivial, but undoing that operation
> is not.

Indeed, that is one of the many problems with catenating manual
page sources before formatting them.

Many manual pages, in particular autogenerated man(7) pages, have
a header of low-level roff(7) instructions preceding the .TH macro,
so finding the beginning of the next manual page is not quite as
easy as finding the next .TH macro. In particular, it would be
an extremely bad idea to let the .TH macro reset *any* parser state
because that would break many autogenerated man(7) pages - you could
argue that putting low-level roff(7) into a manual page is evil in
the first place, but just wiping it out at one particular, essentially
random place in the middle of the manual page, i.e. at the .TH macro,
is still quite harsh a punishment.

For mdoc(7), matters are not quite as bad in so far as mdoc(7)
autogenerators are virtually unheard of (well, pod2mdoc(1) exists,
but so far, that is only used for semi-automatic semi-manual format
conversions of old perlpod(1) manuals to new mdoc(7) manuals.
Nobody uses it (yet) for routinely rendering perlpod(1) manuals
with less(1) :t tagging support, though the idea certainly exists).
Also, low-level roff(7) preambles are far less common in mdoc(7)
than in man(7) pages.

On the other hand, for mdoc(7), the situation is much worse than
for man(7) in so far as the macro order .Dd .Dt .Os used to be mere
convention, and any other order of these three macros used to be
equally valid. Groff-1.23 utterly broke that and now always starts
a new manual page at .Dd, so every manual page with a different
macro order is now totally broken with groff.

[...]
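To illustrate the point above about .TH not marking the true start of
a page, here is a minimal sketch. The two page sources, the `.ds Aq`
preamble lines, and the helper name `split_at_th` are all made up for
the illustration and are not taken from any real manual page or tool.
The helper labels each input line with the number of .TH macros seen
so far, which is what a naive "new page at every .TH" splitter would
effectively do:

```shell
#!/bin/sh
# Hypothetical input: two tiny man(7) sources, each with a low-level
# roff(7) preamble line before its .TH macro.
tmp=$(mktemp -d)
printf '%s\n' '.ds Aq one' '.TH FIRST 1' '.SH NAME' 'first' > "$tmp/first.1"
printf '%s\n' '.ds Aq two' '.TH SECOND 1' '.SH NAME' 'second' > "$tmp/second.1"

# Naive splitter (hypothetical helper): count .TH lines and prefix
# every line with the index of the page it would be assigned to.
split_at_th() {
    awk '/^\.TH / { n++ } { print n+0 ": " $0 }' "$@"
}

# Catenate the sources first, then try to undo the catenation.
result=$(cat "$tmp/first.1" "$tmp/second.1" | split_at_th)
printf '%s\n' "$result"
rm -rf "$tmp"
```

In the output, the line `.ds Aq two` is labeled as part of page 1
although it belongs to the second page, so the preamble ends up on the
wrong side of the boundary.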
>> You could simply add FD_CLOEXEC as a name to the NAME section that you
>> consider canonical for defining FD_CLOEXEC, such that users can simply
>> type "man FD_CLOEXEC". We don't do that in OpenBSD because when
>> semantic search is available, "man FD_CLOEXEC" provides little benefit
>> over "man -ak Dv=FD_CLOEXEC" or "man -ak any=FD_CLOEXEC", so just
>> as you consider additional links in the file system excessively noisy,
>> we consider even (less noisy) additional name section entries too noisy.
>> Don't forget that defined constants are significantly more numerous
>> in some APIs than function names, so there is a real danger to cause
>> readers to miss the forest among all the additional trees.

> Yup; that's what has stopped me from doing that in the past, and I still
> don't think I'll do that. I prefer leaving it up to a trivial Unix
> pipe searching within /usr/share/man (for non-trivial needs), or man -K
> for trivial needs.
>
> This is quite easy:
>
> alx@debian:~$ man -awK FD_CLOEXEC
> /usr/local/man/man3/popen.3
> /usr/local/man/man3/posix_spawn.3
[...]
> /usr/share/man/man7/systemd.directives.7.gz
> /usr/share/man/man7/fcntl.h.7posix.gz
>
> And when I need more complex stuff, I can do just anything with pipes.
> It requires knowing where the source code is located, but people with
> those needs will most likely know where the manual pages are installed,
> and that they might be compressed, so I'm not too worried.

Glad to hear that.

I use grep(1) -R as a last resort, too, but even though just like
you, i'm probably a manual page power user to a very unusual degree,
using man(1) dozens of times every day, sometimes possibly hundreds
of times, i need grep(1) -R over manuals very rarely, probably about
once every few weeks or months.

>>> My idea is having a proc(7) page that would essentially be built as:
>>> $ find man5/ | grep proc | sort | sed 's/^/.so /' > man7/proc.7;

>> I'd very strongly advise against that, for more than one reason.
>> Neither of the two manual page formats is well-suited for
>> concatenating input files and formatting them in a single run
>> of the formatter. Doing that tends to cause lots of unexpected
>> and hard to diagnose issues. Instead, such a job should be done
>> by man(1): let the formatter format each page individually, then
>> concatenate the results, *never* concatenate the source code.

> I find recent groff(1) being quite able to handle multi-.TH pages

Branden has invested massive effort into making it kind-of work,
in fact so massive that i have totally lost track of what is going
on. If i remember correctly, he has invented lots of new registers
along with lots of novel rules how to use them to make it work,
wrapping himself into elaborate nets of overengineering and resulting
in long discussions in various bug tracker tickets about how it is
all supposed to work. I refrained from reading most of that - too
hard to understand and not really relevant for any practical purpose
that i care about.

> I am going to agree to not do this for users, but I do this often for
> myself. I often want to see all the SYNOPSIS or STANDARDS (or whatever)
> sections of *all* manual pages under man2 and man3,

Actually, for SYNOPSIS, there is a dedicated option -h:

  $ man -h -s 3 -k . | less

For STANDARDS, i typically run

  $ man -s 3 -ak .

and then type

  /^STANDARDS

and repeatedly press n and N as needed, one advantage being that
when needed, i can look at the surrounding text with no hassle.

> and what I do is
> cat(1) them together, extract the right sections (plus the TH lines)
> with sed(1) (actually, I first do this, then catenate), and then pipe to
> 'man /dev/stdin'. It works quite nicely (with recent groff(1)).

Sure, that would likely work with mandoc, too, but seems to imply
more work than is really needed unless i'm missing the point.

>> Also, this would result in massive multiplication of installed
>> text (wasting space)

> .so pages don't duplicate text, do they?
> Or you mean in indices?

Uuh, sorry, i was too inattentive and misread your line essentially as

  $ find man5/ | grep proc | sort | xargs cat > man7/proc.7

Using .so feels even worse due to the notorious fragility of .so.
Then again, since you are doing this within a single manpath and
only after chdir(2)ing to the best directory available for the
purpose, maybe the worst of the fragility won't bite here, but who
knows. While .so can be useful for general typesetting needs, it
is best avoided when doing anything with manual pages.

No, i wouldn't be too worried about indexes. Even a full semantic
search index is quite small compared to the pages themselves, and
that's by design because we want searches to be fast and we don't
want the mandoc.db to block too much of the buffer cache:

  $ du -sh /usr/share/man /usr/share/man/mandoc.db
  44.9M /usr/share/man
  2.4M /usr/share/man/mandoc.db

A non-semantic search index is even smaller, though surprisingly
enough, not by all that much:

  $ lsb_release -d
  No LSB modules are available.
  Description: Debian GNU/Linux 12 (bookworm)
  $ dpkg-query -s man-db | grep -e Status: -e Version:
  Status: install ok installed
  Version: 2.11.2-2
  $ du -sh /usr/share/man /var/cache/man/index.*
  41M /usr/share/man
  1.1M /var/cache/man/index.db

It appears Oracle Solaris has switched its apropos(1) to support
indexed full text search:

  schwarze@unstable11s [unstable11s]:~ > uname -a
  SunOS unstable11s 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
   > du -sh /usr/share/man /usr/share/man/man-index
  90M /usr/share/man
  30M /usr/share/man/man-index

At first, i didn't feel sure whether that's a particularly wise
choice considering the massive database size...
But it turns out search times aren't all that bad:

  schwarze@unstable11s [unstable11s]:~ > time man -K editor | wc
  850 5377 41859

  real 0m0.128s
  user 0m0.081s
  sys 0m0.048s

  # that's with a 2008-era SPARC64 VII quad-core processor,
  # each of the 4 cores capable of running two threads in parallel

And the output really includes stuff like this, among other things:

  36. bashbug(1) DESCRIPTION /usr/man/man1/bashbug.1
  attempts to locate a number of alternative editors, including

  37. libgconf-2(3) SEE ALSO /usr/man/man3/libgconf-2.3
  gconf-editor(1),

  180. git-pull(1) OPTIONS /usr/man/man1/git-pull.1
  Invoke an editor before committing successful mechanical merge to
  further edit the auto-generated merge message, so that the user can
  explain and justify the merge\&. The

  191. c++(1) OPTIONS /usr/man/man1/c++.1
  about any unresolved references (unless overridden by the link editor

[...]
>>> What do you think?

>> I expect you will be very surprised in how many different ways such
>> a scheme will bite you if, God forbid, you ever try it for real.

> Nah, I've been convinced of not trying it. Thanks! :)

That's a relief. I was already becoming afraid of fallout down
the road. I mean, if you do something in the Linux Manual Pages
Project, due to the considerable importance of the project, it often
has wider effects far beyond the project itself.

Yours,
  Ingo
