Re: mini-book manual pages through multi-.so pages (i.e., the old proc(5) page)

Alejandro Colomar Sat, 18 Oct 2025 05:34:47 -0700

Hi Ingo,

On Thu, Sep 25, 2025 at 02:02:24AM +0200, Ingo Schwarze wrote:
> > alx@debian:~$ mman false true | cat
> > mman: outdated mandoc.db lacks false(1) entry, run makewhatis /usr/share/man
> > mman: outdated mandoc.db lacks true(1) entry, run makewhatis /usr/share/man
> > FALSE(1)                       User Commands                      FALSE(1)
> > NAME
> >       false - do nothing, unsuccessfully
> [...]
> > GNU coreutils 9.7                June 2025                        FALSE(1)
> > 
> > --------------------------------------------------------------------------
> > 
> > ()                                                                      ()
> > 
> > ?�????????�TÑnÓ0?}ÏW [...]
> 
> Ouch.  I'm able to reproduce that bug on OpenBSD-current.  This must be
> the umpteenth time that something is broken with compressed manual pages -
> i keep saying that compressing manual pages is pointless in the 21st
> century, not only because the space savings are negligible compared
> to the size of modern function libraries and programs, but also because
> it adds complexity and hence fragility.


Oh, for some reason my brain didn't connect the dots.  I didn't imagine
it would be due to a compressed page.  I'm so used to installing the
manual pages uncompressed that I sometimes forget that the system pages
are compressed.

>  I freely admit this bug was
> my fault, but all the same, triggering it was a consequence of
> compressing manual pages.

I agree compressed manual pages make no sense.  Storage is cheap these
days.  And reading compressed pages requires specialized tools.

        alx@debian:~$ du -h /usr/share/man/ | tail -n1
        57M     /usr/share/man/
        alx@debian:~$ find /usr/share/man/ -type f | wc -l
        10133

Those 10k manual pages, if uncompressed, would only take around 150M,
I suspect.  Oh, I can actually measure:

        $ find /usr/share/man/ -type f | xargs zcat | wc
        2972468 14291606 98751029

98M compared to 57M isn't even 2x.  The needed complexity isn't worth
it.
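
A minimal sketch of the same measurement as a reusable function, in case
someone wants to repeat it on another system.  The name man_sizes is
hypothetical, and the sketch assumes the pages are gzip-compressed files
ending in .gz; it counts file contents in bytes rather than disk blocks,
so the numbers will differ a bit from what du(1) reports.

```shell
# Hypothetical helper, not part of any package: print the compressed
# size, the uncompressed size (both in bytes), and their ratio for all
# *.gz files under a directory (default: /usr/share/man).
man_sizes() {
	dir=${1:-/usr/share/man}
	# Bytes as stored on disk (compressed file contents).
	packed=$(find "$dir" -type f -name '*.gz' -exec cat {} + | wc -c)
	# Bytes after decompression.
	plain=$(find "$dir" -type f -name '*.gz' -exec gzip -dc {} + | wc -c)
	# Guard against division by zero when no *.gz file is found.
	awk -v p="$packed" -v u="$plain" \
		'BEGIN { printf "%d %d %.2f\n", p, u, (p ? u / p : 0) }'
}
```

Calling it as "man_sizes /usr/share/man" prints three fields: compressed
bytes, uncompressed bytes, and the ratio between them.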

I'll remember to show these results to the appropriate Debian
maintainers when asking that manual pages no longer be compressed.

> I have committed the bugfix here (rev. 1.364):
> 
>   https://cvsweb.bsd.lv/mandoc/main.c
> 
> Thanks for the report!

It's a pleasure to contribute to mandoc(1)!  :-)

> That said, i really need to roll the next mandoc release,
> to get all the bug fixes out to users.
> Around November 2025 would probably be an ideal time.

Thanks!

BTW, a reminder: please check the groff-1.24 feature that allows
using a second token in .SY, which allows using it in function
prototypes.

[...]
> > In general, catenating stuff is trivial, but undoing that operation
> > is not.
> 
> Indeed, that is one of the many problems with catenating manual page
> sources before formatting them.  Many manual pages, in particular
> autogenerated man(7) pages, have a header of low-level roff(7)
> instructions preceding the .TH macro, so finding the beginning of
> the next manual page is not quite as easy as finding the next .TH
> macro.  In particular, it would be an extremely bad idea to let the .TH
> macro reset *any* parser state because that would break many
> autogenerated man(7) pages - you could argue that putting low-level
> roff(7) into a manual page is evil in the first place, but just
> wiping it out at one particular, essentially random place in the
> middle of the manual page, i.e. at the .TH macro, is still quite
> harsh a punishment.

Oh, yep.  I sometimes forget that most manual pages out there are not
even close to the quality of the ones I maintain.  :)

[...]
> On the other hand, for mdoc(7), the situation is much worse than
> for man(7) in so far as the macro order .Dd .Dt .Os used to be
> mere convention, and any other order of these three macros used
> to be equally valid.  Groff-1.23 utterly broke that and now always
> starts a new manual page at .Dd, so every manual page with a different
> macro order is now totally broken with groff.

Hmmm.

> [...]
> >> You could simply add FD_CLOEXEC as a name to the NAME section that you
> >> consider canonical for defining FD_CLOEXEC, such that users can simply
> >> type "man FD_CLOEXEC".  We don't do that in OpenBSD because when
> >> semantic search is available, "man FD_CLOEXEC" provides little benefit
> >> over "man -ak Dv=FD_CLOEXEC" or "man -ak any=FD_CLOEXEC", so just
> >> as you consider additional links in the file system excessively noisy,
> >> we consider even (less noisy) additional name section entries too noisy.
> >> Don't forget that defined constants are significantly more numerous
> >> in some APIs than function names, so there is a real danger to cause
> >> readers to miss the forest among all the additional trees.
> 
> > Yup; that's what has stopped me from doing that in the past, and I still
> > don't think I'll do that.  I prefer leaving it up to a trivial Unix
> > pipe searching within /usr/share/man (for non-trivial needs), or man -K
> > for trivial needs.
> > 
> > This is quite easy:
> > 
> >     alx@debian:~$ man -awK FD_CLOEXEC
> >     /usr/local/man/man3/popen.3
> >     /usr/local/man/man3/posix_spawn.3
> [...]
> >     /usr/share/man/man7/systemd.directives.7.gz
> >     /usr/share/man/man7/fcntl.h.7posix.gz
> > 
> > And when I need more complex stuff, I can do just anything with pipes.
> > It requires knowing where the source code is located, but people with
> > those needs will most likely know where the manual pages are installed,
> > and that they might be compressed, so I'm not too worried.
> 
> Glad to hear that.  I use grep(1) -R as a last resort, too, but even
> though just like you, i'm probably a manual page power user to a very
> unusual degree, using man(1) dozens of times every day, sometimes
> possibly hundreds of times, i need grep(1) -R over manuals very rarely,
> probably about once every few weeks or months.

One could say once every few weeks or months is relatively often.  :)

My needs are in the same order of magnitude, BTW.

> >>> My idea is having a proc(7) page that would essentially be built as:
> >>>   $ find man5/ | grep proc | sort | sed 's/^/.so /' > man7/proc.7;
> 
> >> I'd very strongly advise against that, for more than one reason.
> >> Neither of the two manual page formats is well-suited for
> >> concatenating input files and formatting them in a single run
> >> of the formatter.  Doing that tends to cause lots of unexpected
> >> and hard to diagnose issues.  Instead, such a job should be done
> >> by man(1): let the formatter format each page individually, then
> >> concatenate the results, *never* concatenate the source code.
> 
> > I find recent groff(1) being quite able to handle multi-.TH pages
> 
> Branden has invested massive effort into making it kind-of work,
> in fact so massive that i have totally lost track of what is going on.
> 
> If i remember correctly, he has invented lots of new registers
> along with lots of novel rules how to use them to make it work,
> wrapping himself into elaborate nets of overengineering and
> resulting in long discussions in various bug tracker tickets
> about how it is all supposed to work.  I refrained from reading
> most of that - too hard to understand and not really relevant for
> any practical purpose that i care about.
> 
> > I am going to agree to not do this for users, but I do this often for
> > myself.  I often want to see all the SYNOPSIS or STANDARDS (or whatever)
> > sections of *all* manual pages under man2 and man3,
> 
> Actually, for SYNOPSIS, there is a dedicated option -h:
> 
>    $ man -h -s 3 -k . | less

Oh!  That doesn't exist in man-db's man(1).  It's interesting, as the
SYNOPSIS is the most useful one for doing this.

BTW, why does -h imply -a?  I normally want to use it together with -a,
but it would be useful also without it in some cases.
(Same question about -c, but I'm less worried about that, since I can
 reverse the effects by piping to ul(1) and less(1) -R.)

> For STANDARDS, i typically run
> 
>    $ man -s 3 -ak .

Maybe you could accept an optional argument to -h that specifies the
name of the section.  I don't know if it's worth it for normal
users, though.

> 
> and then type
> 
>   /^STANDARDS
> 
> and repeatedly press n and N as needed, one advantage being that when
> needed, i can look at the surrounding text with no hassle.

In general, I use this when trying to make the text more uniform across
all the pages, so I don't care too much about other sections.

> > and what I do is
> > cat(1) them together, extract the right sections (plus the TH lines)
> > with sed(1) (actually, I first do this, then catenate), and then pipe to
> > 'man /dev/stdin'.  It works quite nicely (with recent groff(1)).
> 
> Sure, that would likely work with mandoc, too, but seems to imply
> more work than is really needed unless i'm missing the point.

TBH, it's not really needed.  I can do it perfectly well with

        $ export MAN_KEEP_FORMATTING=1;
        $ find -type f \
        | xargs grep -l TH \
        | while read -r f; do
                mansect STANDARDS "$f" \
                | man -Pcat /dev/stdin;
        done \
        | less -R;

So, I could agree with you that it's unnecessary to add this to
groff(1), at least for these purposes.  For generation of the PDF book,
I guess it's more necessary to support multiple TH lines.
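
For readers without mansect(1) at hand, the extraction step discussed
above (one section plus the TH line) can be sketched roughly as follows.
This is an assumption-laden illustration, not the real mansect(1): the
function name extract_section is hypothetical, it uses awk(1) rather than
sed(1), and it only recognizes unquoted, single-word headings written as
".SH NAME" in man(7) source.

```shell
# Hypothetical stand-in for the extraction step; NOT the real mansect(1).
# Prints the .TH line and the requested section of a man(7) source file.
extract_section() {
	section=$1
	file=$2
	awk -v sect="$section" '
		# Always keep the title line so the formatter has a header.
		/^\.TH / { print; next }
		# At every section heading, decide whether to keep what follows.
		/^\.SH / { keep = ($0 == ".SH " sect) }
		# Print the heading itself and the body of the wanted section.
		keep { print }
	' "$file"
}
```

The output could then be piped to "man /dev/stdin" in the same way as in
the loop above.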

> >> Also, this would result in massive multiplication of installed
> >> text (wasting space)
> 
> > .so pages don't duplicate text, do they?  Or you mean in indices?
> 
> Uuh, sorry, i was too inattentive and misread your line essentially as
> 
>    $ find man5/ | grep proc | sort | xargs cat > man7/proc.7
> 
> Using .so feels even worse due to the notorious fragility of .so.
> Then again, since you are doing this within a single manpath and only
> after chdir(2)ing to the best directory available for the purpose,
> maybe the worst of the fragility won't bite here, but who knows.
> 
> While .so can be useful for general typesetting needs, it is best
> avoided when doing anything with manual pages.
> 
> No, i wouldn't be too worried about indexes.  Even a full semantic
> search index is quite small compared to the pages themselves,
> and that's by design because we want searches to be fast and
> we don't want the mandoc.db to block too much of the buffer cache:
> 
>    $ du -sh /usr/share/man /usr/share/man/mandoc.db
>   44.9M   /usr/share/man
>   2.4M    /usr/share/man/mandoc.db
> 
> A non-semantic search index is even smaller, though surprisingly
> enough, not by all that much:
> 
>    $ lsb_release -d
>   No LSB modules are available.
>   Description:    Debian GNU/Linux 12 (bookworm)
>    $ dpkg-query -s man-db | grep -e Status: -e Version:
>   Status: install ok installed
>   Version: 2.11.2-2
>    $ du -sh /usr/share/man /var/cache/man/index.*
>   41M     /usr/share/man
>   1.1M    /var/cache/man/index.db
> 
> It appears Oracle Solaris has switched its apropos(1) to support
> indexed full text search:
> 
>   schwarze@unstable11s [unstable11s]:~ > uname -a
>   SunOS unstable11s 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
>    > du -sh /usr/share/man /usr/share/man/man-index
>   90M   /usr/share/man
>   30M   /usr/share/man/man-index
> 
> At first, i didn't feel sure whether that's a particularly wise choice
> considering the massive database size...  But it turns out search
> times aren't all that bad:
> 
>   schwarze@unstable11s [unstable11s]:~ > time man -K editor | wc
>      850    5377   41859
>   real    0m0.128s
>   user    0m0.081s
>   sys     0m0.048s
>    # that's with a 2008-era SPARC64 VII quad-core processor,
>    # each of the 4 cores capable of running two threads in parallel
> 
> And the output really includes stuff like this, among other things:
> 
>   36. bashbug(1)  DESCRIPTION  /usr/man/man1/bashbug.1
>   attempts to locate a number of alternative editors, including
> 
>   37. libgconf-2(3)  SEE ALSO  /usr/man/man3/libgconf-2.3
>   gconf-editor(1),
> 
>   180. git-pull(1)  OPTIONS  /usr/man/man1/git-pull.1
>   Invoke an editor before committing successful mechanical merge to further
>   edit the auto-generated merge message, so that the user can explain and
>   justify the merge\&. The
> 
>   191. c++(1)  OPTIONS  /usr/man/man1/c++.1
>   about any unresolved references (unless overridden by the link editor

Nice!


Have a lovely day!
Alex

-- 
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).
