On Sun, 18 Feb 2024 17:14:37 -0600 "G. Branden Robinson" <g.branden.robin...@gmail.com> wrote:
> > For example, would a man(7) parser need to recognize groff(7)
> > requests?
>
> Ingo Schwarze is well placed to answer that,

Looking at it more closely, perhaps I see what concerns you.  It's not
parsing the man page.  It's defining and interpreting the macros.

Macro packages differ from preprocessors because they have ... macros.
A preprocessor like pic doesn't interpret troff macros; it reads its own
input language and emits troff as output.  A macro package like man is
just a bunch of predefined macros that troff itself expands.

In the general case, to write a man preprocessor is to implement just
enough of groff to rummage around in the macro definitions, interpret
them, and interpolate them into the output.  Looking at
/usr/share/groff/1.22.4/tmac/an-old.tmac, for example, the simple LP
macro is defined as

> .de1 LP
> . sp \\n[PD]u
> . ps \\n[PS]u
> . vs \\n[VS]u
> . ft R
> . in \\n[an-margin]u
> . nr an-prevailing-indent \\n[IN]
> . ns
> ..

To render LP as groff does, a preprocessor would have to parse several
groff(7) requests, among them de, de1, and als, not to mention do and
mso.  It would have to implement copy mode and perhaps honor
compatibility mode.

OTOH, the general case isn't strictly necessary because man macros are
very stable.  We could just hard-code the macro text into the
preprocessor and call it a day.  Every time the parser reads .LP it just
spits out

	.sp \n[PD]u
	.ps \n[PS]u
	.vs \n[VS]u
	.ft R
	.in \n[an-margin]u
	.nr an-prevailing-indent \n[IN]
	.ns

Job done.

Which tack to take depends on the intended use.  Perfect fidelity to
what groff itself does requires, well, doing what groff does with the
macro definitions.  In principle, such a preprocessor would not be
restricted to man, but would be a generalized macro preprocessor.

Equally good fidelity might be accomplished, though, without access to
the macro files, at the cost of keeping the preprocessor synchronized
with them.
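
To make the hard-coded tack concrete, here is a minimal sketch in
Python.  The names EXPANSIONS and expand() are made up for illustration,
and the LP text is simply the 1.22.4 definition quoted above with copy
mode's backslash reduction already applied.  It assumes the default
control character and ignores macro arguments (LP takes none), so treat
it as an outline of the idea, not a working preprocessor.

```python
# Hypothetical sketch: expand man(7) macros from a hard-coded table
# instead of reading an-old.tmac at run time.

# Body of LP as it is interpolated, i.e. after copy mode has turned
# "\\n" in the definition into "\n". Raw strings keep the backslash.
LP_BODY = [
    r".sp \n[PD]u",
    r".ps \n[PS]u",
    r".vs \n[VS]u",
    r".ft R",
    r".in \n[an-margin]u",
    r".nr an-prevailing-indent \n[IN]",
    r".ns",
]

# One entry per macro the preprocessor knows about (only LP here).
EXPANSIONS = {"LP": LP_BODY}


def expand(lines):
    """Replace known macro calls with their hard-coded request text."""
    out = []
    for line in lines:
        # A control line starts with "."; spaces may follow the dot
        # before the macro name, as in ". LP".
        if line.startswith("."):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if name in EXPANSIONS:
                out.extend(EXPANSIONS[name])
                continue
        # Text lines and unknown control lines pass through unchanged.
        out.append(line)
    return out
```

Given the three input lines ".TH FOO 1", ".LP", and "Some text.",
expand() passes the first and last through untouched and puts the seven
request lines in place of .LP.  Every new groff release would mean
re-checking the table against the shipped macro file.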
Just how easy or hard that synchronization would be depends on the
extent to which the preprocessor had to *interpret* the macro
definitions, rather than just act as a giant search-and-replace engine.

I got drawn into this thread because I saw what I thought was a shoddy
theoretical argument: that parsing macros was hard because LALR(1) or
something.  I don't think it's at all difficult to lex a man page and
produce distinct tokens for each man macro and its arguments.  Any
difficulty would lie in what exactly the parser would produce, and how,
on encountering those tokens.

--jkl