On Sun, 18 Feb 2024 17:14:37 -0600
"G. Branden Robinson" <g.branden.robin...@gmail.com> wrote:

> > For example, would a man(7) parser need to recognize groff(7)
> > requests?  
> 
> Ingo Schwarze is well placed to answer that, 

Looking at it more closely, perhaps I see what concerns you.  The hard
part isn't parsing the man page; it's defining and interpreting the macros.

Macro packages differ from preprocessors because they have ... macros.
A preprocessor like pic doesn't interpret troff macros; it reads its own
input language and emits troff as output.  A macro package like man is
just a bunch of predefined macros that troff itself expands.  In the
general case, to write a man preprocessor is to implement just enough of
groff to rummage around in the macro definitions, interpret them, and
interpolate them into the output.  

Looking at /usr/share/groff/1.22.4/tmac/an-old.tmac, for example, the
simple LP macro is defined as:

> .de1 LP
> .  sp \\n[PD]u
> .  ps \\n[PS]u
> .  vs \\n[VS]u
> .  ft R
> .  in \\n[an-margin]u
> .  nr an-prevailing-indent \\n[IN]
> .  ns
> ..

To render LP as groff does, a preprocessor would have to parse
several groff(7) requests, among them de, de1, and als, not to mention
do and mso.  It would have to implement copy mode and perhaps honor
compatibility mode. 
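Here is a minimal Python sketch of what the general case starts to look like.  It is not groff's algorithm.  The function names read_definitions and expand are hypothetical.  Only de, de1, and als are recognized, and "copy mode" is reduced to collapsing a doubled backslash.  The do and mso requests, macro arguments, conditionals, and compatibility mode are ignored entirely.  The .als line in the sample input is added for illustration.

```python
# Minimal sketch of a generalized roff macro preprocessor (hypothetical).
# It understands only .de, .de1 and .als; .do, .mso, conditionals, macro
# arguments, compatibility mode and most of copy mode are NOT handled.

def read_definitions(tmac_lines):
    """Collect macro bodies from .de/.de1 blocks and aliases from .als."""
    macros = {}
    lines = iter(tmac_lines)
    for line in lines:
        words = line.split()
        if not words:
            continue
        if words[0] in (".de", ".de1") and len(words) >= 2:
            body = []
            for body_line in lines:  # consumes the same iterator
                if body_line.strip() == "..":
                    break
                if body_line.startswith("."):
                    # ".  sp" and ".sp" mean the same thing; normalize.
                    body_line = "." + body_line[1:].lstrip()
                # Crude stand-in for copy mode: a doubled backslash in the
                # definition is stored as a single backslash.
                body.append(body_line.replace("\\\\", "\\"))
            macros[words[1]] = body
        elif words[0] == ".als" and len(words) >= 3 and words[2] in macros:
            macros[words[1]] = macros[words[2]]
    return macros


def expand(lines, macros):
    """Replace each call of a known macro with its (recursively expanded) body."""
    out = []
    for line in lines:
        words = line[1:].split() if line.startswith(".") else []
        if words and words[0] in macros:
            out.extend(expand(macros[words[0]], macros))
        else:
            out.append(line)
    return out


# The LP definition quoted above, plus an .als line added for illustration.
TMAC = r"""
.de1 LP
.  sp \\n[PD]u
.  ps \\n[PS]u
.  vs \\n[VS]u
.  ft R
.  in \\n[an-margin]u
.  nr an-prevailing-indent \\n[IN]
.  ns
..
.als PP LP
""".splitlines()

macros = read_definitions(TMAC)
for line in expand([".PP", "Some text."], macros):
    print(line)
```

Even this toy has to track definitions, aliases, and nesting, and a macro that calls itself would recurse forever here.  Everything it leaves out is where the real effort lies.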

OTOH, the general case isn't strictly necessary because man macros are
very stable.  We could just hard-code the macro text into the
preprocessor and call it a day.  Every time the parser reads

    .LP

it just spits out:

    .sp \n[PD]u
    .ps \n[PS]u
    .vs \n[VS]u
    .ft R
    .in \n[an-margin]u
    .nr an-prevailing-indent \n[IN]
    .ns

and job done.
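The hard-coded alternative amounts to a lookup table.  Here is a minimal sketch, again in Python.  The names LP_EXPANSION, HARD_CODED, and replace_macros are hypothetical, and only LP is filled in:

```python
# Minimal sketch of the "hard-code the macro text" approach (hypothetical).
# A fixed table maps each man macro to the text it expands to; only LP is
# filled in here, copied from the expansion shown above.
LP_EXPANSION = [
    r".sp \n[PD]u",
    r".ps \n[PS]u",
    r".vs \n[VS]u",
    r".ft R",
    r".in \n[an-margin]u",
    r".nr an-prevailing-indent \n[IN]",
    r".ns",
]

HARD_CODED = {"LP": LP_EXPANSION}


def replace_macros(lines, table=HARD_CODED):
    """Emit the canned expansion for each known macro call; pass the rest through."""
    out = []
    for line in lines:
        # ".LP" and ".  LP" both name the macro LP.
        words = line[1:].split() if line.startswith(".") else []
        if words and words[0] in table:
            out.extend(table[words[0]])
        else:
            out.append(line)
    return out


print("\n".join(replace_macros([".LP", "Hello."])))
```

Nothing here reads the macro files, so the table must be updated by hand whenever they change.  That is the synchronization cost of this approach.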

Which tack to take depends on the intended use.  Perfect fidelity to
what groff itself does requires, well, doing what groff does with the
macro definitions.  In principle, such a preprocessor would not be
restricted to man, but would be a generalized macro preprocessor.  

Equally good fidelity might be accomplished, though, without access to
the macro files, at the cost of keeping the preprocessor synchronized
with them.  The difficulty would depend on the extent to which the
preprocessor had to *interpret* the macro definitions, rather than just
act as a giant search-and-replace engine.

I got drawn into this thread because I saw what I thought was a shoddy
theoretical argument: that parsing macros was hard because LALR(1) or
something.  I don't think it's at all difficult to lex a man page and
produce distinct tokens for each man macro and its arguments.  Any
difficulty lies in what exactly the parser should produce, and how, on
encountering those tokens.
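To illustrate the claim that lexing is the easy part, here is a minimal Python sketch.  The names tokenize and split_args are hypothetical.  It turns each control line into a token holding the macro name and its arguments, and each other line into a text token.  It makes these assumptions:

- the control character is . or '
- spaces or tabs separate arguments
- a doubled "" inside a quoted argument stands for a literal quote, following groff's convention
- escape sequences pass through uninterpreted
- comment lines beginning with .\" are dropped

It deliberately does nothing with the tokens, because deciding what to produce from them is the open question.

```python
# Minimal sketch of a man(7) lexer (hypothetical; not groff's or mandoc's code).
# Assumptions: "." or "'" is the control character, arguments are separated
# by spaces or tabs, "" inside a quoted argument is a literal quote, and
# escape sequences are copied through without being interpreted.

def split_args(rest):
    """Split the text after a macro name into its arguments."""
    args, i, n = [], 0, len(rest)
    while i < n:
        if rest[i] in " \t":
            i += 1
        elif rest[i] == '"':
            i, buf = i + 1, []
            while i < n:
                if rest[i] == '"':
                    if i + 1 < n and rest[i + 1] == '"':
                        buf.append('"')  # doubled quote -> literal quote
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                buf.append(rest[i])
                i += 1
            args.append("".join(buf))
        else:
            j = i
            while j < n and rest[j] not in " \t":
                # Skip an escape such as "\ " as a unit so it cannot split.
                j += 2 if rest[j] == "\\" else 1
            args.append(rest[i:j])
            i = j
    return args


def tokenize(lines):
    """Return ("MACRO", name, *args) or ("TEXT", line) tuples, one per line."""
    tokens = []
    for line in lines:
        if line.startswith((".", "'")):
            body = line[1:].lstrip(" \t")
            if not body or body.startswith('\\"'):
                continue  # empty request line or comment
            parts = body.split(None, 1)
            rest = parts[1] if len(parts) > 1 else ""
            tokens.append(("MACRO", parts[0], *split_args(rest)))
        else:
            tokens.append(("TEXT", line))
    return tokens


page = [
    ".TH FOO 1 2024-02-18",
    '.SH "SEE ALSO"',
    '.\\" a comment',
    "Plain text.",
    '.B "say ""hi""" now',
]
for tok in tokenize(page):
    print(tok)
```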

--jkl
