Hello Alejandro,

Alejandro Colomar wrote on Tue, Sep 23, 2025 at 07:19:15PM +0200:
> On Tue, Sep 23, 2025 at 06:48:51PM +0200, Ingo Schwarze wrote:

>> For example, with the mandoc implementation of man(1):
>> 
>>    $ man true false
>> 
>> shows the true(1) manual page, followed by a separating line:
>> -----------------------------------------------------------------------
>> followed by the false(1) manual page.  The entire output is shown in
>> a single less(1) instance as if it were a single output file, with no
>> need to type ":n" to get to the next manual page.  Consequently, it
>> is trivial to search for strings across the whole output, across
>> all pages, with just the less "/" command, or to perform semantic
>> searches across all pages with just the less ":t" command.

> I prefer your approach over that of man-db, at least per how you
> describe it.  I've never used mman(1) before with more than one page,
> and it seems to be broken at the moment in Debian Sid:
> 
> alx@debian:~$ mman false true | cat
> mman: outdated mandoc.db lacks false(1) entry, run makewhatis /usr/share/man
> mman: outdated mandoc.db lacks true(1) entry, run makewhatis /usr/share/man
> FALSE(1)                       User Commands                      FALSE(1)
> NAME
>       false - do nothing, unsuccessfully
[...]
> GNU coreutils 9.7                June 2025                        FALSE(1)
> 
> --------------------------------------------------------------------------
> 
> ()                                                                      ()
> 
> ?�????????�TÑnÓ0?}ÏW [...]

Ouch.  I'm able to reproduce that bug on OpenBSD-current.  This must be
the umpteenth time that something is broken with compressed manual pages -
i keep saying that compressing manual pages is pointless in the 21st
century, not only because the space savings are negligible compared
to the size of modern function libraries and programs, but also because
it adds complexity and hence fragility.  I freely admit this bug was
my fault, but all the same, triggering it was a consequence of
compressing manual pages.

I have committed the bugfix here (rev. 1.364):

  https://cvsweb.bsd.lv/mandoc/main.c

Thanks for the report!

That said, i really need to roll the next mandoc release,
to get all the bug fixes out to users.
Around November 2025 would probably be an ideal time.

[...]
> Another thing is when indenting stuff, there's so many levels of
> indentation (think of the old proc(5)), and each level might be hundreds
> of lines, that I have a really hard time tracking down where things
> start and end.

I agree.

> In general, catenating stuff is trivial, but undoing that operation
> is not.

Indeed, that is one of the many problems with catenating manual page
sources before formatting them.  Many manual pages, in particular
autogenerated man(7) pages, have a header of low-level roff(7)
instructions preceding the .TH macro, so finding the beginning of
the next manual page is not quite as easy as finding the next .TH
macro.  In particular, it would be an extremely bad idea to let the .TH
macro reset *any* parser state because that would break many
autogenerated man(7) pages - you could argue that putting low-level
roff(7) into a manual page is evil in the first place, but just
wiping it out at one particular, essentially random place in the
middle of the manual page, i.e. at the .TH macro, is still quite
harsh a punishment.
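
To illustrate the boundary problem, here is a minimal sketch, not taken
from any real tool.  The helper name split_at_th and the two tiny pages
are made up for the demonstration.  It splits concatenated man(7) source
at every .TH line and shows that the preamble of the second page ends up
attached to the first page:

```shell
#!/bin/sh
# Hypothetical naive splitter: start a new chunk at every .TH line.
# Lines before the first .TH land in chunk 0.
split_at_th() {
    # $1 = concatenated source file, $2 = prefix for the chunk files
    awk -v prefix="$2" '
        /^\.TH([ \t]|$)/ { n++ }
        { print > (prefix "." (n + 0)) }
    ' "$1"
}

demo_dir=$(mktemp -d)
# Two made-up pages, each with a low-level roff line before .TH.
printf '%s\n' '.ds T1 preamble of page one' '.TH ONE 1' \
    '.SH NAME' 'one \- first page' > "$demo_dir/one.1"
printf '%s\n' '.ds T2 preamble of page two' '.TH TWO 1' \
    '.SH NAME' 'two \- second page' > "$demo_dir/two.1"
cat "$demo_dir/one.1" "$demo_dir/two.1" > "$demo_dir/both.1"

split_at_th "$demo_dir/both.1" "$demo_dir/chunk"
# The last line of chunk.1 is the preamble that belongs to page two,
# and chunk.2 starts at ".TH TWO 1" without its preamble.
tail -n 1 "$demo_dir/chunk.1"
head -n 1 "$demo_dir/chunk.2"
```

Splitting at .TH is therefore not a faithful inverse of cat(1), even in
this trivial case.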

For mdoc(7), matters are not quite as bad in so far as mdoc(7)
autogenerators are virtually unheard of (well, pod2mdoc(1) exists,
but so far, that is only used for semi-automatic semi-manual
format conversions of old perlpod(1) manuals to new mdoc(7) manuals.
Nobody uses it (yet) for routinely rendering perlpod(1) manuals
with less(1) :t tagging support, though the idea certainly exists).
Also, low-level roff(7) preambles are far less common in mdoc(7)
than in man(7) pages.
On the other hand, for mdoc(7), the situation is much worse than
for man(7) in so far as the macro order .Dd .Dt .Os used to be
mere convention, and any other order of these three macros used
to be equally valid.  Groff-1.23 utterly broke that and now always
starts a new manual page at .Dd, so every manual page with a different
macro order is now totally broken with groff.

[...]
>> You could simply add FD_CLOEXEC as a name to the NAME section that you
>> consider canonical for defining FD_CLOEXEC, such that users can simply
>> type "man FD_CLOEXEC".  We don't do that in OpenBSD because when
>> semantic search is available, "man FD_CLOEXEC" provides little benefit
>> over "man -ak Dv=FD_CLOEXEC" or "man -ak any=FD_CLOEXEC", so just
>> as you consider additional links in the file system excessively noisy,
>> we consider even (less noisy) additional name section entries too noisy.
>> Don't forget that defined constants are significantly more numerous
>> in some APIs than function names, so there is a real danger to cause
>> readers to miss the forest among all the additional trees.

> Yup; that's what has stopped me from doing that in the past, and I still
> don't think I'll do that.  I prefer leaving it up to a trivial Unix
> pipe searching within /usr/share/man (for non-trivial needs), or man -K
> for trivial needs.
> 
> This is quite easy:
> 
>       alx@debian:~$ man -awK FD_CLOEXEC
>       /usr/local/man/man3/popen.3
>       /usr/local/man/man3/posix_spawn.3
[...]
>       /usr/share/man/man7/systemd.directives.7.gz
>       /usr/share/man/man7/fcntl.h.7posix.gz
> 
> And when I need more complex stuff, I can do just anything with pipes.
> It requires knowing where the source code is located, but people with
> those needs will most likely know where the manual pages are installed,
> and that they might be compressed, so I'm not too worried.

Glad to hear that.  I use grep(1) -R as a last resort, too, but even
though just like you, i'm probably a manual page power user to a very
unusual degree, using man(1) dozens of times every day, sometimes
possibly hundreds of times, i need grep(1) -R over manuals very rarely,
probably about once every few weeks or months.

>>> My idea is having a proc(7) page that would essentially be built as:
>>>     $ find man5/ | grep proc | sort | sed 's/^/.so /' > man7/proc.7;

>> I'd very strongly advise against that, for more than one reason.
>> Neither of the two manual page formats is well-suited for
>> concatenating input files and formatting them in a single run
>> of the formatter.  Doing that tends to cause lots of unexpected
>> and hard to diagnose issues.  Instead, such a job should be done
>> by man(1): let the formatter format each page individually, then
>> concatenate the results, *never* concatenate the source code.

> I find recent groff(1) being quite able to handle multi-.TH pages

Branden has invested massive effort into making it kind-of work,
in fact so massive that i have totally lost track of what is going on.

If i remember correctly, he has invented lots of new registers
along with lots of novel rules how to use them to make it work,
wrapping himself into elaborate nets of overengineering and
resulting in long discussions in various bug tracker tickets
about how it is all supposed to work.  I refrained from reading
most of that - too hard to understand and not really relevant for
any practical purpose that i care about.

> I am going to agree to not do this for users, but I do this often for
> myself.  I often want to see all the SYNOPSIS or STANDARDS (or whatever)
> sections of *all* manual pages under man2 and man3,

Actually, for SYNOPSIS, there is a dedicated option -h:

   $ man -h -s 3 -k . | less

For STANDARDS, i typically run

   $ man -s 3 -ak .

and then type

  /^STANDARDS

and repeatedly press n and N as needed, one advantage being that when
needed, i can look at the surrounding text with no hassle.
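
If a non-interactive variant is wanted, something along the following
lines might do.  It is only a sketch: the function name print_section is
made up, and it assumes plain formatted text on standard input in which
section headings start in column one and body lines are indented (so any
overstrike sequences would have to be removed first, for example with
col -b):

```shell
#!/bin/sh
# Hypothetical helper: print every section called NAME from formatted
# manual page text read from standard input.
print_section() {
    awk -v name="$1" '
        # Any line starting in column one is treated as a heading
        # (or a page header or footer) and switches printing on or off.
        /^[^ \t]/ { in_section = ($0 == name) }
        in_section { print }
    '
}

# Possible usage, assuming the output contains no overstrike sequences
# after col -b:
#   man -s 3 -ak . | col -b | print_section STANDARDS
```

Unlike searching inside less(1), this loses the surrounding text, so it
is mostly useful for feeding other tools.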

> and what I do is
> cat(1) them together, extract the right sections (plus the TH lines)
> with sed(1) (actually, I first do this, then catenate), and then pipe to
> 'man /dev/stdin'.  It works quite nicely (with recent groff(1)).

Sure, that would likely work with mandoc, too, but seems to imply
more work than is really needed unless i'm missing the point.

>> Also, this would result in massive multiplication of installed
>> text (wasting space)

> .so pages don't duplicate text, do they?  Or you mean in indices?

Uuh, sorry, i was too inattentive and misread your line essentially as

   $ find man5/ | grep proc | sort | xargs cat > man7/proc.7

Using .so feels even worse due to the notorious fragility of .so.
Then again, since you are doing this within a single manpath and only
after chdir(2)ing to the best directory available for the purpose,
maybe the worst of the fragility won't bite here, but who knows.

While .so can be useful for general typesetting needs, it is best
avoided when doing anything with manual pages.
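
For comparison, the approach advocated earlier in this thread, formatting
each page on its own and concatenating only the results, can be sketched
as follows.  The function name format_each is made up, the separator line
is arbitrary, and this is not how man(1) is actually implemented; it
merely illustrates the idea:

```shell
#!/bin/sh
# Hypothetical helper: run FORMATTER once per source file and
# concatenate the formatted results, with a separating line in between.
# usage: format_each FORMATTER FILE...
format_each() {
    formatter=$1
    shift
    first=yes
    for page in "$@"; do
        if [ "$first" = no ]; then
            printf '%s\n' '----------------------------------------'
        fi
        first=no
        # Each page gets a fresh formatter process, so no parser state
        # can leak from one page into the next.
        "$formatter" "$page" || return
    done
}

# Possible usage, assuming mandoc(1) is installed and the files exist:
#   format_each mandoc true.1 false.1 | less
```

Because no source code is ever concatenated, neither .so nor preambles
before .TH or .Dd can confuse page boundaries.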

No, i wouldn't be too worried about indexes.  Even a full semantic
search index is quite small compared to the pages themselves,
and that's by design because we want searches to be fast and
we don't want the mandoc.db to block too much of the buffer cache:

   $ du -sh /usr/share/man /usr/share/man/mandoc.db
  44.9M   /usr/share/man
  2.4M    /usr/share/man/mandoc.db

A non-semantic search index is even smaller, though surprisingly
enough, not by all that much:

   $ lsb_release -d
  No LSB modules are available.
  Description:    Debian GNU/Linux 12 (bookworm)
   $ dpkg-query -s man-db | grep -e Status: -e Version:
  Status: install ok installed
  Version: 2.11.2-2
   $ du -sh /usr/share/man /var/cache/man/index.*
  41M     /usr/share/man
  1.1M    /var/cache/man/index.db

It appears Oracle Solaris has switched its apropos(1) to support
indexed full text search:

  schwarze@unstable11s [unstable11s]:~ > uname -a
  SunOS unstable11s 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
   > du -sh /usr/share/man /usr/share/man/man-index
  90M   /usr/share/man
  30M   /usr/share/man/man-index
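
To put the three measurements side by side, here is a rough
back-of-the-envelope calculation.  The helper name index_share is made
up, and the inputs are simply the rounded du(1) figures quoted above, so
the percentages are approximate:

```shell
#!/bin/sh
# Hypothetical helper: size of an index as a percentage of the size
# reported for the manual page tree (both given in the same unit).
index_share() {
    awk -v index_size="$1" -v tree_size="$2" \
        'BEGIN { printf "%.1f%%\n", 100 * index_size / tree_size }'
}

index_share 2.4 44.9   # mandoc.db on OpenBSD:        prints 5.3%
index_share 1.1 41     # man-db index.db on Debian:   prints 2.7%
index_share 30 90      # man-index on Solaris 11.3:   prints 33.3%
```
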

At first, i didn't feel sure whether that's a particularly wise choice
considering the massive database size...  But it turns out search
times aren't all that bad:

  schwarze@unstable11s [unstable11s]:~ > time man -K editor | wc
     850    5377   41859
  real    0m0.128s
  user    0m0.081s
  sys     0m0.048s
   # that's with a 2008-era SPARC64 VII quad-core processor,
   # each of the 4 cores capable of running two threads in parallel

And the output really includes stuff like this, among other things:

  36. bashbug(1)  DESCRIPTION  /usr/man/man1/bashbug.1
  attempts to locate a number of alternative editors, including

  37. libgconf-2(3)  SEE ALSO  /usr/man/man3/libgconf-2.3
  gconf-editor(1),

  180. git-pull(1)  OPTIONS  /usr/man/man1/git-pull.1
  Invoke an editor before committing successful mechanical merge to further
  edit the auto-generated merge message, so that the user can explain and
  justify the merge\&. The

  191. c++(1)  OPTIONS  /usr/man/man1/c++.1
  about any unresolved references (unless overridden by the link editor

[...]
>>> What do you think?

>> I expect you will be very surprised in how many different ways such
>> a scheme will bite you if, God forbid, you ever try it for real.

> Nah, I've been convinced of not trying it.  Thanks!  :)

That's a relief.  I was already becoming afraid of fallout down
the road.  I mean, if you do something in the Linux Manual Pages
Project, due to the considerable importance of the project,
it often has wider effects far beyond the project itself.

Yours,
  Ingo
