Hi Larry,

At 2024-02-22T21:37:12-0500, Larry Kollar wrote:
> I’m a little late to the party, but I’ve read Alex’s original post
> over several times, and I have to wonder if everyone is over-thinking
> this.

Yes and no.

> > On Feb 16, 2024, at 10:21 AM, Alejandro Colomar <a...@kernel.org> wrote:
> > I've been thinking about a suggestion I've done in the past.  I
> > wanted a program that reads man(7) source and produces roff(7)
> > source, so that it can later be passed to troff(1), thus splitting
> > the groff(1) pipeline a bit more.  The idea is similar to how eqn(1)
> > and other pre-troff filters do their job.
> 
> There has to be a phase during which (g)troff interprets the macros
> and produces roff(7) to feed to the main processor.

Not really.  *roff macro interpolation is not like running the C
preprocessor; if it were, what you suggest would work fine.

But the C preprocessor's macro language is a pretty feeble demonstration
of a macro language.

For one thing, there are three sorts of things that undergo
interpolation: macros, registers, and strings.

Furthermore, the identifiers of these can be constructed using
interpolations of others--and this is actually done in groff macro
packages.
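
To make that concrete, here is a minimal, untested sketch of my own (it
is not taken from any groff macro package, and the names "suffix",
"count", "item2", and "greetB" are made up for illustration).  It
assumes GNU troff, since it uses long names and bracketed
interpolations.

```roff
.\" Hypothetical names throughout; this only illustrates the idea.
.ds suffix B
.nr count 2
.nr item2 42
.de greetB
This text comes from the macro greetB.
.br
..
.\" The macro name "greetB" is assembled from a string interpolation.
.greet\*[suffix]
.\" The register name "item2" is assembled from a register interpolation.
The register holds \n[item\n[count]].
```

A filter that expanded only macro calls would have to track the values
of "suffix" and "count" before it could even tell which macro or
register is being named.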

Beyond that, macros can define other macros (groff ms(7) does this, for
instance).

Here's an example from s.tmac.

.\" par*define-font-macro macro font apply-italic-corrections
.de par*define-font-macro
.de \\$1
.ds par*lic \" empty
.ds par*ic \" empty
.if \\n[.$]>2 \{\
.       as par*lic \,\"
.       as par*ic \/\"
.\}
.if \En[.$]>3 .@warning excess arguments to .\\$1 ignored
.ie \En[.$] \{\
.       nr par*prev-font \En[.f]
\&\E$3\E*[par*lic]\f[\\$2]\E$1\f[\En[par*prev-font]]\E*[par*ic]\E$2
.\}
.el .ft \\$2
\\..
..
.par*define-font-macro R R
.par*define-font-macro B B
.par*define-font-macro I I yes
.par*define-font-macro BI BI yes
.ie n .par*define-font-macro CW R
.el   .par*define-font-macro CW CR

Also, macros are allowed to call themselves recursively, and in fact
this is the traditional means of implementing a loop in AT&T troff.
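
A minimal sketch of that idiom follows, assuming nothing beyond
two-character names as AT&T troff requires.  The macro name "Lo" and
the register "n" are hypothetical, chosen only for this illustration.

```roff
.\" Hypothetical loop: the macro Lo calls itself until register n is 0.
.nr n 3
.de Lo
.if \\nn \{\
Iteration \\nn
.br
.nr n -1
.Lo
.\}
..
.Lo
```

How many times "Lo" expands depends on a register value that is known
only while the formatter is running, so there is no fixed expansion a
preprocessing pass could emit without evaluating the input itself.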

(Do many man pages need loops?  No.  But see below.)

> Would it be possible to add a new command line option (like —roff)
> that simply dumps the input with macros applied, then stops?

No.  Not "simply".

It is true that man(7) documents tend to use *roff only in a pretty
basic way.  This is due to a combination of factors.

1.  Unfamiliarity with the formatter on the part of man page authors
    (the sorts of people who hate writing documentation _really_ hate
    _reading_ it);
2.  Authors of non-roff man page interpreters failing to support all
    *roff features a page might use (one can hardly blame them);
3.  groff's own documentation recommending only a limited, portable
    subset of man(7) and formatter features to avoid frustration on the
    part of page authors and readers.

The foregoing is something of a self-reinforcing cycle; the smaller the
language we prescribe as suitable for man(7) composition, the easier it
seems to be to do what you and Alex are asking for.

This is the road tools like doclifter(1) and mandoc(1) started down
years ago.  The problem, as ever, is an 80/20 rule (or 90/10, or 95/5).
With only a little bit of effort you can knock together something that
seems startlingly capable.  With enough effort, you can handle a large
majority of a given corpus...but the remaining outliers prove more and
more difficult and demanding.[1]  You won't get to 100% without
implementing a fully armed and operational *roff, and a GNU
troff-compatible one at that.

GNU troff itself just isn't written this way, to do only one "level" of
interpolation and stop.  I'm reluctant even to look into whether such a
mode can be stuck in, because my instinct is that too much would break.

Even in our man(7) package, we have macros calling other macros.
There's no flag bit on any of them indicating that they're "top-level"
macros.

Finally, unless your not-roff interpreter simulates the vertical drawing
position and updates it, you won't be able to reliably troubleshoot page
headers and footers with this technique.  Doing that would probably
require something pretty close to operating the way a formatter does,
with simulated line-filling and breaking.

Regards,
Branden

[1] A notable example is going to be any page that uses tbl(1).  If
    you've ever looked at what the tbl preprocessor emits, you know that
    you're going to need a lot of machinery to handle it.  This might
    even be the first hurdle at which initially promising ad hoc man(7)
    interpreters fall.

    You could of course interpret tbl(1) input for yourself...again, you
    will find yourself measuring text and formatting it.

    Or you could just skip anything that's in a table, which is fine
    until that's the part of the page that needs illumination...
