Dear Werner et al.,

>> This mail is just to raise awareness of an alternative to "groff
>> -mandoc", mandoc (née mdocml) at <http://mdocml.bsd.lv>.
>> Disclaimer: I'm the project's lead. mandoc is a BSD-licensed, C
>> implementation satisfying ONLY the BSD "-mdoc" manual format, and to
>> a limited extent, the traditional -man. The system, as-is, handles
>> the majority of BSD manuals, and does so considerably faster than
>> groff.
>
> If you restrict yourself to mdoc and man syntax this is a logical
> consequence...
I do, and this constitutes the basis of my arguments. Note, however, that some of them, especially the one about groff's uncertain output, are not specific to -mdoc/-man formatting.

> However, you write
>
>   [groff] runs slowly, produces uncertain output, and varies in
>   operation from system to system.
>
> Hmmm. Please give more details how you come to this conclusion.

In terms of speed, groff loads tmac files (macros, character sets, hyphenation patterns, etc.), reads and parses the input into its intermediate format (IF) by interpreting the macro definitions (assuming -mdoc/-man), and passes the IF to an output driver, which renders it. Each of these stages incurs significant overhead.

mandoc, by contrast, is a single standalone executable: the parser libraries (-mdoc, -man) are linked statically to the output libraries (-Tascii, etc.). The parsers are ad hoc and table-driven, governed by an ontology based on macro syntax, and their intermediate representation is a well-formed, regular abstract syntax tree (AST). Character sets and the like are hard-coded. This is equivalent to hard-coding the tmac structures and linking all groff components into a single binary; obviously, mandoc can do this only because of its narrow focus.

By way of informal illustration, on a 2.5 GHz machine, `nroff -mandoc' takes 1m46s (mean of three runs) to render all OpenBSD manuals to /dev/null; mandoc takes 4s, roughly 26 times faster.

In terms of uncertain output, mdoc(7) and mdoc.samples(7), to say nothing of the melange of troff(1), groff(1), groff_char(7), etc., make for an irregular, fragmented reference. Consider:

   .Qq Hello, world.
   Hello again. Hello yet again.

Notice the inconsistent spacing between sentences. The same goes for discarding whitespace: it is discarded on some macro lines, but not in free-form text. Consider also: if, say, `Qq' is used at a line boundary instead of a literal `"', will it hyphenate? Will any macro? How are line overruns handled in this case? Do the \*(xx, \(xx, \[xxx] and \*[xxx] escapes produce equivalent output? What if one passes a title, volume, and architecture to `Dt'? What happens to text on a `Pp' line? These ambiguities breed uncertainty.
All of these have answers, but the lack of a coherent reference causes uncertainty. The manuals bundled with mandoc are rewrites of the above (some still in progress), written with a specific eye toward compositional regularity.

As for variation between systems: OpenBSD and Linux, for example, render the `Nd' macro with an en dash (until recently, OpenBSD used a plain escaped minus sign), while NetBSD uses an em dash. The set of available macros is non-uniform (Lk? Mt?). The available special characters differ. The set of installed manuals differs. Some systems render `Pa' underlined; some don't. Default macro widths vary widely (see `Er').

groff benefits from its generality, in which -mdoc and -man are merely macro packages, and from its freedom of customisation. mandoc offers neither (or, at most, limited customisation), and is thus able to operate much more efficiently and concisely within its specific domain.

All this leaves aside my biggest problem with groff as an -mdoc formatter: given -Thtml, how can I embolden only variable types? Since the groff IF abstracts away presentation, and necessarily so, this is impossible in general. mandoc, on the other hand, interprets -mdoc as a specific, semantic language, and this is reflected in the parsers' ASTs.

Hope this clears things up,
Kristaps
