Larry Kollar <[EMAIL PROTECTED]>:

> And, more to the point, why bother converting
> this entire body of documentation to DocBook if there's already
> a good way to convert it to cross-linked HTML? Wasn't that the
> whole point of this exercise anyway?
I tried to cover this earlier. The cliche analysis you do for the
intermediate step to DocBook turns out to be valuable -- you end up
producing better HTML out the back end.

Here's an example. One of the things doclifter can do is recognize C
code in a .nf/.fi block. Because it does that, it can generate
<programlisting> rather than <literallayout> DocBook tags. The
HTML-generating stylesheet can then either (a) generate <listing>
tags, or (b) tag the <pre> with a class you can decorate with a
stylesheet. All other man-to-HTML converters I know of take the easy
way out and do a strictly presentation-level mapping, so you get a
<pre> with no semantic tagging and can't style code listings.

Here's another one: doclifter recognizes that .B with an argument
beginning with / in a section named FILES means the bold should be
replaced with <filename> rather than simple <emphasis>. Thus, you can
style filenames differently from text emphasis in the target HTML.
You can't do that with a presentation-level converter.

Yet another one: all known Unix error-macro names get wrapped with
DocBook <errorcode>. Result? You guessed it -- you can style
error-macro names in the target HTML differently than (say)
environment variables.

The HTML doclifter generates "knows" things about its source document
that a presentation-level translation doesn't. Down the road this has
implications for doing semantic-web stuff and intelligent document
searching.

In theory, man-to-HTML converters could do pattern-based cliche
analysis themselves. But the smartest ones other than doclifter have
maybe half a dozen rules, whereas doclifter has around two hundred
(maybe more now; I lost count a few years back).

Now we come to the interesting part. The reason doclifter has 200
rules is that I wrote it to translate man pages *into the DocBook
ontology*. In effect, I piggybacked on the work the XML crowd has
done in discovering the "natural" parts of technical documents.
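To make the flavor of such rules concrete, here is a minimal sketch of
pattern-based cliche analysis of the kind described above. This is not
doclifter's actual code; the function name, the tiny errno sample, and
the fallback behavior are all invented for illustration:

```python
# Hypothetical sketch of cliche rules like those described above.
# Real doclifter has on the order of two hundred such rules.

ERRNO_NAMES = {"EINVAL", "ENOENT", "EPERM", "EACCES", "EIO"}  # small sample

def lift_bold(argument, section):
    """Map a troff .B argument to a semantic DocBook element.

    Falls back to plain <emphasis>, which is the only thing a
    presentation-level converter can ever produce.
    """
    if section == "FILES" and argument.startswith("/"):
        return "<filename>%s</filename>" % argument
    if argument in ERRNO_NAMES:
        return "<errorcode>%s</errorcode>" % argument
    return '<emphasis role="bold">%s</emphasis>' % argument

print(lift_bold("/etc/passwd", "FILES"))  # -> <filename>/etc/passwd</filename>
print(lift_bold("EINVAL", "ERRORS"))      # -> <errorcode>EINVAL</errorcode>
```

The payoff is exactly the one described above: once the output carries
<filename> and <errorcode> rather than undifferentiated bold, a
downstream stylesheet can style each category differently.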
This is a much richer ontology than HTML, so man -> DocBook -> HTML
produces the best possible HTML. man -> HTML, on the other hand, ends
up translating man pages into a sort of least-common-denominator
ontology between man and HTML. You end up with really stupid, thin
translations that throw away a lot of even the modest semantic
information that man-page markup carries.

> Another possibility is that groff itself can do the conversion.
> The HTML "driver" has a ways to go yet before being able to
> produce beautiful HTML, but what comes out now is close enough
> to clean up using some awk scripts and HTML Tidy. So it might
> be easier, short- and long-term, to encourage people to add the
> -mwww macros to their man pages.

The HTML driver has the same problem man2html and its ilk do: no
semantic analysis, ergo really stupid, thin translations. I'm not
just saying we can do better than that; I'm saying we already have.

-- 
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

_______________________________________________
Groff mailing list
Groff@gnu.org
http://lists.gnu.org/mailman/listinfo/groff