Werner LEMBERG <[EMAIL PROTECTED]>:
> > Who is the person currently responsible for groff? Is it still you?
>
> Yep. Awaiting your commands.
So I see from the groff project page, which I should have checked first. Copying to Ted Harding and the groff list, which I just subscribed to.

I want to drastically simplify the markup used in several pieces of groff documentation, eliminating a lot of the hairy custom macros they presently use.

    groffer.1
    groff_out.5
    groff_tmac.5
    groff.7.gz
    groff_char.7
    groff_mdoc.7
    groff_trace.7

Technically this won't be hard; I could make the required changes in a few hours. But I hear you asking "Why fix what ain't broken?".

The immediate technical answer is "the macro hackery is getting in the way of lossless translation to a Web-ready format". The more extended answer raises some philosophical issues about groff's place in the world. Extended rant follows...

You may (or may not) be aware that one of the background tasks I've been pursuing for years is an effort to clean up the mess that is Unix documentation formats. It's the 21st century; all the documentation on my system ought to present as a hypertexted local Web through my browser.

But a "big bang" solution -- everybody rewriting their stuff in HTML or whatever -- can't be imposed, if for no other reason than that the coordination problem is too hard. That means that in order for us to get to hypertext Nirvana, there have to be lossless (or near-to-lossless) translation paths from every legacy format to HTML. If we have that, then we can subsume everything and allow the legacy formats to die quietly (or survive as composition markups that nobody actually delivers in).

The hardest format to webify in the Unix world is also the most important one -- man pages. (By way of GNUish contrast, TeXinfo is much easier.) There are a large number of tools out there that attempt this. In general, they do a crappy job.

Five years ago I decided to solve this problem. And I did. I wrote a program called 'doclifter' that takes man-page sources in one end and emits XML-DocBook out the other. XML-DocBook to HTML is, of course, easy.
(But man-to-DocBook *wasn't* easy. Doclifter is nearly 8000 lines of Python embodying both more parsing technology than many compilers for general-purpose programming languages and an entire rule-based production system. I have seen master's-thesis projects in AI with less AI in them than doclifter. No joke.)

Why go through DocBook? Because, it turns out, the way to *not* do a crappy job of translation is to do structural analysis on the markup. DocBook carries the structural information needed to do stylesheet-based HTML generation at *much* higher quality than (say) latex2html ever manages with its purely presentation-level approach.

In the five years since I wrote doclifter, I've been using it to do periodic audits of the man-page corpus, or at least as much of it as is represented by a full-boat Red-Hat/Fedora-Core installation. In FC6 this is over 13,000 man pages. The purpose of these audits is twofold:

(1) Improve doclifter's performance (its clean-translation rate is now 96%).

(2) Feed fix patches back to man-page maintainers to clean up broken markup (I've had nearly 300 patches accepted).

The end goal is to be able to announce that transitioning away from man pages to HTML is a *solved problem*. When I get the look-ma-no-hands rate below 1%, I figure we can declare victory and go to the next phase.

Clue about the next phase: last year I got a change into the man(1) sources that tells it what to do when it finds an HTML source where it's expecting a man page, e.g., hand off to a browser. The technical preconditions are nearly in place to kill off man pages as a presentation format. Think about that :-)

After five years of effort, I am down to fewer than 4% translation failures. I'm at the point where pushing individual man-page cleanups to individual projects is actually more efficient than crocking doclifter to handle yet another weird edge case.

To give you an idea of the numbers, my last full test was on 13,466 man pages.
Of these, 391 (about 3%) require fix patches. I expect about half of these fix patches will be applied upstream within the next 90 days; others will take longer, depending on project release intervals.

There remains a tiny hard core of 47 pages (0.3%) that can't be fix-patched. They remain unliftable. Of these, 25 are from netpbm and 7 (0.05%) are from groff. Thus, groff is my second-largest source of man pages that can't be lifted to DocBook. The largest is netpbm, and I'm working with its maintainer to fix that now.

So this is the answer to "why fix it?". Because the groff pages presently do elaborate, bizarre things that doclifter can't cope with. In this they are *unique*. I mean *unique*. Everywhere else the problem is almost entirely broken markup, not things people did deliberately.

I want to fix the groff documentation so that it's no longer in the way of automatic lifting of *everything* to HTML. (As a side benefit, the markup in the groff documentation will become easier to maintain.) The only downside might be a slight decrease in the visual quality of the printed versions -- in particular, command synopses might no longer look quite as pretty.

The philosophical issue this raises about groff's place in the world is simple: are we willing to accept that it's a legacy rather than a primary format?

I don't ask this question dismissively. I probably grok *roff hackery as well as anybody who isn't Brian Kernighan -- groff carries two tools I wrote (pic2graph and eqn2graph) and I wrote your guide to pic. I think man macros will still have a place as a composition format, even if nobody presents from them any more.

But I think it's time to move on. This little change will help us get to a fully-hypertexted, Web-centric documentation corpus. Let's do it.

(And brace yourselves for the *real* political bunfight, which is when I try to kill off GNU info...)
--
<a href="http://www.catb.org/~esr/">Eric S.
Raymond</a>