Larry Kollar <[EMAIL PROTECTED]>:

> And, more to the point, why bother converting
> this entire body of documentation to DocBook if there's already
> a good way to convert it to cross-linked HTML? Wasn't that the
> whole point of this exercise anyway?
I tried to cover this earlier. The cliche analysis you do for the
intermediate step to DocBook turns out to be valuable -- you end up
producing better HTML out the back end.

Here's an example. One of the things doclifter can do is recognize C
code in a .nf/.fi block. Because it does that, it can generate
<programlisting> rather than <literallayout> DocBook tags. The
HTML-generating stylesheet can then either (a) generate <listing>
tags, or (b) tag the <pre> with a class you can decorate with a
stylesheet. All other man-to-HTML converters I know of take the easy
way out and do a strictly presentation-level mapping, so you get a
<pre> with no semantic tagging and can't style code listings.

Here's another one: doclifter recognizes that .B with an argument
beginning with / in a section named FILES means the bold should be
replaced with <filename> rather than simple <emphasis>. Thus, you can
style filenames differently from text emphasis in the target HTML.
You can't do that with a presentation-level converter.

Yet another one: all known Unix error-macro names get wrapped with
DocBook <errorcode>. Result? You guessed it -- you can style
error-macro names in the target HTML differently than (say)
environment variables.

The HTML doclifter generates "knows" things about its source document
that a presentation-level translation doesn't. Down the road this has
implications for doing semantic-web stuff and intelligent document
searching.

In theory, man-to-HTML converters could do pattern-based cliche
analysis themselves. But the smartest ones other than doclifter have
maybe half a dozen rules, whereas doclifter has around two hundred
(maybe more now; I lost count a few years back).

Now we come to the interesting part. The reason doclifter has 200
rules is that I wrote it to translate man pages *into the DocBook
ontology*. In effect, I piggybacked on the work the XML crowd has
done in discovering the "natural" parts of technical documents.
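To make the flavor of such rules concrete, here is a minimal sketch of
pattern-based cliche analysis of the kind described above. This is not
doclifter's actual code; the function name, the tiny errno sample, and
the fallback behavior are all invented for illustration:

```python
# Hypothetical sketch of cliche rules like those described above.
# Real doclifter has on the order of two hundred such rules.

ERRNO_NAMES = {"EINVAL", "ENOENT", "EPERM", "EACCES", "EIO"}  # small sample

def lift_bold(argument, section):
    """Map a troff .B argument to a semantic DocBook element.

    Falls back to plain <emphasis>, which is the only thing a
    presentation-level converter can ever produce.
    """
    if section == "FILES" and argument.startswith("/"):
        return "<filename>%s</filename>" % argument
    if argument in ERRNO_NAMES:
        return "<errorcode>%s</errorcode>" % argument
    return '<emphasis role="bold">%s</emphasis>' % argument

print(lift_bold("/etc/passwd", "FILES"))  # -> <filename>/etc/passwd</filename>
print(lift_bold("EINVAL", "ERRORS"))      # -> <errorcode>EINVAL</errorcode>
```

The payoff is exactly the one described above: once the output carries
<filename> and <errorcode> rather than undifferentiated bold, a
downstream stylesheet can style each category differently.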
This is a much richer ontology than HTML, so man -> DocBook -> HTML
produces the best possible HTML. man -> HTML, on the other hand, ends
up translating man pages into a sort of least-common-denominator
ontology between man and HTML. You end up with really stupid, thin
translations that throw away a lot of even the modest semantic
information that man-page markup carries.

> Another possibility is that groff itself can do the conversion.
> The HTML "driver" has a ways to go yet before being able to
> produce beautiful HTML, but what comes out now is close enough
> to clean up using some awk scripts and HTML Tidy. So it might
> be easier, short- and long-term, to encourage people to add the
> -mwww macros to their man pages.

The HTML driver has the same problem man2html and its ilk do: no
semantic analysis, ergo really stupid, thin translations. I'm not
just saying we can do better than that; I'm saying we already have.

-- 
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

_______________________________________________
Groff mailing list
Groff@gnu.org
http://lists.gnu.org/mailman/listinfo/groff