On Monday 01 January 2007 19:52, Eric S. Raymond wrote:
> Here is a slightly expanded version of a diagram I posted back towards
> the beginning of the discussion:
> [...]
>
> The box in the middle is intended to indicate the use of DocBook as a
> common interchange format.
I may have, on occasion, over-imbibed on seasonal refreshments in recent
days, so I thought I'd try to set down the nub of this discussion.

Technical documentation has three elements:

- content (the actual words written)
- structure (gives context to the content)
- style (controls the presentation of the structure and content).

1. It would be desirable to be able to browse, navigate and search *nix
   technical documentation in a consistent manner. HTML in a browser has
   been posited as the solution.

(Dealing just with 'man' pages now.)

2. Currently man pages generally use the -man macros, although there are
   no restrictions on using any *roff command or escape.

3. The 'man' page author intends to present technical information in the
   way he thinks will be easiest for the audience to absorb, i.e. he will
   be more interested in presentation and content than in structure. This
   tends to run counter to the aims of (1), since a common structure is
   required to add navigational tags and support intelligent searching.

4. Fortunately, in the real world, most man page authors have used the
   standard -man macros, so some structural information can be derived
   from them. Further structure can be deduced using AI techniques.

5. The structure and content of a man page could be captured in DocBook
   XML by a program called 'doclifter'.

6. Using just the content and structure, 'clean' HTML could be produced,
   relying on a standard CSS stylesheet to control presentation. The
   presentation of the original man page would not be preserved, but it
   would instead be placed under the control of the user (should
   high-contrast colours, larger fonts, different text-to-speech voices to
   distinguish structural elements, etc. be required).

7. Since HTML output would not preserve the original presentation, it
   would be desirable to offer the user a way of viewing the man page as
   originally intended (groffer??!). The user can then choose whether to
   print the browser-formatted output or the groff-formatted output.
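As a rough illustration of points 4 to 6, here is a minimal Python sketch, assuming a page that uses only a handful of standard -man macros (.TH, .SH, .PP, .B, .I). It is not how doclifter works, and it skips the DocBook stage entirely: it lifts headings and paragraphs out of the macros, discards font changes as presentation, and emits HTML whose appearance is left wholly to a stylesheet. The sample page, the function names `lift_structure` and `to_html`, the `refsect1` class name (borrowed from the DocBook element of that name) and the `man.css` file are all hypothetical.

```python
import html

# Hypothetical sample page using only a few standard -man macros.
SAMPLE = """\
.TH FOO 1
.SH NAME
foo \\- frobnicate files
.SH SYNOPSIS
.B foo
.I file
.SH DESCRIPTION
.PP
foo reads each
.I file
and frobnicates it.
"""


def lift_structure(source):
    """Return (title, sections), where sections is a list of
    (heading, [paragraph, ...]) pairs derived from -man macros."""
    title = ""
    sections = []
    words = []  # text lines of the paragraph currently being collected

    def flush():
        # Close the current paragraph, if any, inside the current section.
        if words and sections:
            sections[-1][1].append(" ".join(words))
        words.clear()

    for line in source.splitlines():
        fields = line.split(None, 1)
        request = fields[0] if fields and line.startswith(".") else None
        rest = fields[1] if len(fields) > 1 else ""
        if request == ".TH":
            title = rest.split()[0] if rest else ""
        elif request == ".SH":
            flush()
            sections.append((rest.strip('"'), []))
        elif request in (".PP", ".P", ".LP"):
            flush()
        elif request in (".B", ".I"):
            # Font changes are presentation: keep the words, drop the font.
            words.append(rest.replace("\\-", "-"))
        elif request is None and line.strip():
            words.append(line.replace("\\-", "-"))
        # Any other "." request is ignored in this sketch.
    flush()
    return title, sections


def to_html(title, sections, stylesheet="man.css"):
    """Emit presentation-free HTML; all styling is left to the stylesheet."""
    out = [
        "<!DOCTYPE html>",
        "<html><head>",
        f"<title>{html.escape(title)}</title>",
        f'<link rel="stylesheet" href="{html.escape(stylesheet)}">',
        "</head><body>",
        f"<h1>{html.escape(title)}</h1>",
    ]
    for heading, paragraphs in sections:
        out.append('<section class="refsect1">')
        out.append(f"<h2>{html.escape(heading)}</h2>")
        out.extend(f"<p>{html.escape(p)}</p>" for p in paragraphs)
        out.append("</section>")
    out.append("</body></html>")
    return "\n".join(out)


if __name__ == "__main__":
    print(to_html(*lift_structure(SAMPLE)))
```

A real man page would need much more: other macros (.TP, .IP, .RS/.RE), inline escapes such as \fB, which this sketch passes through untouched, and the heuristics mentioned in point 4. The point of the design is only that nothing about fonts survives into the HTML, so high-contrast colours, larger fonts and so on become a matter of swapping the stylesheet.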
Point 6 above may be "new", in that it appears 'doclifter' is attempting
to derive presentation information from the troff source (as well as
content and structure). I would argue this is unnecessary; it would be
more desirable to divorce content from presentation completely, storing
content and structure in the XML and relying on CSS to control
presentation. If 'doclifter' concentrates solely on extracting content and
structure, it may be considerably simplified (to extract content, 'nroff'
is your friend ;-)). Mapping the troff source (to extract structural
information) onto an nroff image of the page may be easier than trying to
track every groff command and escape.

Just my tuppence.

Cheers

Deri

_______________________________________________
Groff mailing list
Groff@gnu.org
http://lists.gnu.org/mailman/listinfo/groff