Hello all,

Resuming my little series of articles, today I explain how
Utmac is linked to the XML world.

# Troff and XML
---------------

We all have in mind the various attempts to produce XML files
from a troff document: some aim to be universal and, dealing
with the raw troff requests, can only output non-semantic HTML
with hardcoded styles, while others, dedicated to a particular
macro package, fail to consider the raw troff requests the user
may need in their document (cf. the source of ms2html, in which
the author comments that he is implementing more and more raw
troff requests).

XML files are nothing but plain text files carrying semantic
information. A troff document, for its part, contains
structured information which gets its meaning within the context
of a macro package. Come to think of it, we already have a tool
which interprets a troff source within the context of a macro
package to produce plain text files: nroff. Could we use nroff
to produce XML files? I tried, and that solution appears to work
well.

The idea is simple: one only has to write a macro file which
implements all the interface macros (paragraphs, headings...) so
that they add XML tags to the output file. For example, here are
two simple macros that produce XML paragraphs and headings:

    .de PP
    .       \" first, we close the previous block
    .       \" by printing its recorded tag
    .       if d xml-block \\*[xml-block]
    .       \" Secondly, we define the closing tag for the block
    .       ds xml-block </p>
    .       \" and last, we print the opening tag.
    <p>
    ..
    .de H1
    .       if d xml-block \\*[xml-block]
    .       rm xml-block
    <h1>\\$*</h1>
    ..
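
To make the bookkeeping of those two macros easier to follow, here is a small Python model of it. It is only a sketch, not part of Utmac: the class `XmlBlockWriter` and its method names are invented for the illustration, and `pending_close` plays the role of the `xml-block` string.

```python
class XmlBlockWriter:
    """Toy model of the PP and H1 macros: remember the closing tag
    of the block that is currently open (hypothetical helper)."""

    def __init__(self):
        self.lines = []
        self.pending_close = None  # stands in for the xml-block string

    def _close_previous(self):
        # ".if d xml-block \*[xml-block]": print the recorded tag, if any
        if self.pending_close is not None:
            self.lines.append(self.pending_close)

    def paragraph(self):
        self._close_previous()
        self.pending_close = "</p>"  # .ds xml-block </p>
        self.lines.append("<p>")

    def heading(self, title):
        self._close_previous()
        self.pending_close = None    # .rm xml-block
        self.lines.append(f"<h1>{title}</h1>")

    def text(self, line):
        self.lines.append(line)

    def finish(self):
        # at end of input, the last open block still has to be closed
        self._close_previous()
        self.pending_close = None
        return "\n".join(self.lines)


writer = XmlBlockWriter()
writer.heading("Title")
writer.paragraph()
writer.text("First.")
writer.paragraph()
writer.text("Second.")
print(writer.finish())
```

The point of the design is that a block never needs an explicit end macro: each new block first closes whatever was left open.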


Nroff has to be configured to produce correct XML files: we do
not want hyphenation, lines don’t need to be adjusted, and the
page length has to be defined correctly.

    .\" page length is one line
    .pl 1v
    .ll 75
    .\" neither adjust nor hyphenate
    .na
    .nh
    .\" Ending macro is doc:end
    .em doc:end
    .\" Print header
    <?xml version="1.0" encoding="UTF-8"?>
    .\" Open the root tag
    <utmac>
    .de doc:end
    .       \" doc:end needs some more space to output text
    .       pl \\n(nlu+3v
    .       \" close the previous block
    .       if d xml-block \\*[xml-block]
    .       \" Close the root tag.
    </utmac>
    .       \" set correct page length
    .       pl \\n(nlu
    ..


Since the fonts are hierarchical and defined as strings in Utmac,
they are easy to implement as well.

    .ds font-bold0 </B>
    .ds font-bold1 <B>
    .nr f-b 0
    .ds B \ER'f-b 1-\En[f-b]'\E*[font-bold\En[f-b]]
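
The toggle performed by the B string can be modelled in a few lines of Python. This is a sketch of the bold case only, with invented names (`BoldToggle`, `render`); it assumes that each interpolation of the string first flips the register `f-b` between 0 and 1 and then prints the tag stored under `font-bold0` or `font-bold1`.

```python
# .ds font-bold0 </B>  and  .ds font-bold1 <B>
FONT_BOLD = {0: "</B>", 1: "<B>"}


class BoldToggle:
    """Toy model of the B string (hypothetical helper)."""

    def __init__(self):
        self.f_b = 0  # .nr f-b 0

    def expand(self):
        self.f_b = 1 - self.f_b      # \R'f-b 1-\n[f-b]'
        return FONT_BOLD[self.f_b]   # \*[font-bold\n[f-b]]


def render(text, toggle, marker="\\*B"):
    """Replace every occurrence of the marker by the next tag."""
    parts = text.split(marker)
    out = [parts[0]]
    for part in parts[1:]:
        out.append(toggle.expand())
        out.append(part)
    return "".join(out)


print(render("a \\*Bbold\\*B word", BoldToggle()))
```

The first use of the string opens the tag and the second one closes it, so the same string serves for both ends.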


The only real problem with using nroff to produce XML documents
is that, as with troff, it is not easy to deal with automatically
inserted spaces. I tried to use .chop and \c, but without
reliable results. To solve that problem and to escape the
reserved characters a user may insert in their document
(’<’, ’>’, and ’&’), I wrote a small post-processor, postxml,
which translates a custom set of tags into XML special
characters. Among those tags, a special one removes newlines:

    #[ becomes <
    #] becomes >
    #( becomes &
    #) becomes ;
    \n#-\n is deleted from the stream, and is used to delete newlines.
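
The translation table above is small enough to be sketched in Python. This is not the real postxml, only an illustration of the five rules as listed; the function name `postxml` and the use of a single regular expression are my assumptions.

```python
import re

# The four two-character tags and their XML counterparts
TAGS = {"#[": "<", "#]": ">", "#(": "&", "#)": ";"}

# Either the newline-eating tag (newline, "#-", newline) or one of the tags above
_TAG_RE = re.compile(r"\n#-\n|#[\[\]()]")


def postxml(stream: str) -> str:
    """Translate the custom tags; the newline-eating tag maps to nothing."""
    return _TAG_RE.sub(lambda m: TAGS.get(m.group(0), ""), stream)


sample = "some text\n#-\n#[/pp#]\n#[pp#]\n#-\nmore text, R#(amp#)D"
print(postxml(sample))
```

Scanning the stream in one pass, rather than applying four successive replacements, keeps a character produced by one rule from being picked up by another.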


So, instead of directly writing XML tags, the nroff macros
write those custom tags, which are later translated by
postxml. Our paragraph macro becomes:

    .de PP
    .       if d xml-block \{\
    .       \" tag to remove unwanted newlines
    #-
    .       \" closing xml tag
    \\*[xml-block]
    .       \}
    .       ds xml-block #[/pp#]
    .       \" opening xml tag
    #[pp#]
    .       \" tag to remove unwanted newlines
    #-
    ..


A preprocessor, prexml, escapes any occurrence of those
tags in the user document. The troffxml archive, available at
http://utroff.org, provides prexml, postxml, and two XSL
stylesheets to produce HTML and FODT (flat OpenDocument) files,
and Utmac provides the macro package ux for that purpose. So, the
commands to produce XML documents from a troff source are:

    prexml < f.tr | nroff -Tlocale -mux | postxml > f.xml
    xsltproc utohtml.xsl f.xml > f.html
    xsltproc utofodt.xsl f.xml > f.fodt
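
To give an idea of what the first stage of that pipeline has to achieve, here is a purely hypothetical Python sketch of such an escaping step. It does not describe how prexml really works: the function name `prexml_escape` and the chosen escape sequences are assumptions, picked so that the custom tags listed earlier (#( for & and #) for ;) turn them into ordinary XML entities. The sketch also ignores troff escape sequences such as \&, which a real preprocessor would have to leave alone.

```python
# Hypothetical escape table: each value is written with the custom
# tags, so that "#(" ... "#)" later becomes "&" ... ";"
_ESCAPES = {
    "#": "#(#35#)",  # would end up as &#35; so a literal "#[" survives
    "&": "#(amp#)",  # would end up as &amp;
    "<": "#(lt#)",   # would end up as &lt;
    ">": "#(gt#)",   # would end up as &gt;
}


def prexml_escape(text: str) -> str:
    """Escape the characters that would clash with XML or the custom tags."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


print(prexml_escape("a #[ b < c & d"))
```

Escaping the # character itself is what keeps a user-typed #[ from being mistaken for a tag further down the pipeline.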


Since I believe you want to have a look at the result, you will
find, attached to this mail, its XML, HTML, and FODT versions as
produced by this system (which reveals that the FODT code block
needs some more work...).

In my next mail about Utmac, I will present some goodies.

Kind Regards,
Pierre-Jean

Attachment: xml.xml
Description: XML document

<?xml version="1.0" encoding="UTF-8"?>
<office:document 
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" 
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" 
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" 
xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" 
xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" 
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" 
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
xmlns:math="http://www.w3.org/1998/Math/MathML"
xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0"
xmlns:ooo="http://openoffice.org/2004/office"
xmlns:ooow="http://openoffice.org/2004/writer"
xmlns:oooc="http://openoffice.org/2004/calc"
xmlns:dom="http://www.w3.org/2001/xml-events"
xmlns:xforms="http://www.w3.org/2002/xforms"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:rpt="http://openoffice.org/2005/report"
xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:grddl="http://www.w3.org/2003/g/data-view#"
xmlns:tableooo="http://openoffice.org/2009/table"
xmlns:field="urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0"
xmlns:formx="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0"
 xmlns:css3t="http://www.w3.org/TR/css3-text/" office:version="1.2"
office:mimetype="application/vnd.oasis.opendocument.text">
  <office:meta>
    <dc:title/>
    <meta:creation-date>2017-12-02T15:39:54</meta:creation-date>
    <meta:generator>heirloom nroff -mux</meta:generator>
    <dc:description>Author: </dc:description>
    <meta:keyword/>
    <dc:subject/>
    <dc:title/>
  </office:meta>
  <office:font-face-decls>
    <style:font-face style:name="Linux Libertine" svg:font-family="'Linux 
Libertine'" style:font-family-generic="roman" style:font-pitch="variable"/>
    <style:font-face style:name="Linux Libertine G:smcp=1&amp;lnum=1" 
svg:font-family="'Linux Libertine G:smcp=1'" style:font-family-generic="roman" 
style:font-adornments="Normal" style:font-pitch="variable"/>
    <style:font-face style:name="Linux Libertine 
G:smcp=1&amp;lnum=1&amp;c2sc=1" svg:font-family="'Linux Libertine 
G:smcp=1&amp;lnum=1&amp;c2sc=1'" style:font-family-generic="roman" 
style:font-adornments="Normal" style:font-pitch="variable"/>
    <style:font-face style:name="Linux Libertine G:sups=1" 
svg:font-family="'Linux Libertine G:sups=1'" style:font-family-generic="roman" 
style:font-adornments="Normal" style:font-pitch="variable"/>
    <style:font-face style:name="Linux Libertine G:sinf=1" 
svg:font-family="'Linux Libertine G:sinf=1'" style:font-family-generic="roman" 
style:font-adornments="Normal" style:font-pitch="variable"/>
    <style:font-face style:name="Linux Libertine" svg:font-family="'Linux 
Libertine'" style:font-family-generic="roman" style:font-adornments="Normal" 
style:font-pitch="variable"/>
    <style:font-face style:name="Linux Libertine Mono" svg:font-family="'Linux 
Libertine Mono'" style:font-family-generic="roman" 
style:font-adornments="Normal" style:font-pitch="variable"/>
  </office:font-face-decls>
  <office:styles>
    <style:default-style style:family="paragraph">
      <style:paragraph-properties fo:hyphenation-ladder-count="no-limit" 
style:text-autospace="ideograph-alpha" style:punctuation-wrap="hanging" 
style:line-break="strict" style:tab-stop-distance="1.251cm"
style:writing-mode="page"/>
      <style:text-properties style:use-window-font-color="true" 
style:font-name="Linux Libertine" fo:font-size="12pt" fo:language="fr" 
fo:country="FR" style:letter-kerning="true" fo:hyphenate="false" 
fo:hyphenation-remain-char-count="2" fo:hyphenation-push-char-count="2"/>
    </style:default-style>
    <style:style style:name="Standard" style:family="paragraph" 
style:class="text"/>
    <style:style style:name="Heading" style:family="paragraph" 
style:parent-style-name="Standard" style:next-style-name="Text_body" 
style:class="text">
      <style:paragraph-properties fo:margin-top="0.423cm" 
fo:margin-bottom="0.212cm" fo:keep-with-next="always"/>
      <style:text-properties fo:font-size="14pt"/>
    </style:style>
    <style:style style:name="Heading_1" style:display-name="Heading 1" 
style:family="paragraph" style:parent-style-name="Heading" 
style:next-style-name="Text_body" style:default-outline-level="1" 
style:class="text">
      <style:paragraph-properties fo:text-align="center" 
style:justify-single-word="false"/>
      <style:text-properties fo:font-size="20pt" fo:font-style="italic" 
fo:font-weight="bold"/>
    </style:style>
    <style:style style:name="Heading_2" style:display-name="Heading 2" 
style:family="paragraph" style:parent-style-name="Heading" 
style:next-style-name="Text_body" style:default-outline-level="2" 
style:class="text">
      <style:paragraph-properties fo:text-align="center" 
style:justify-single-word="false"/>
      <style:text-properties fo:font-size="16pt" fo:font-style="italic" 
fo:font-weight="bold"/>
    </style:style>
    <style:style style:name="Heading_3" style:display-name="Heading 3" 
style:family="paragraph" style:parent-style-name="Heading" 
style:next-style-name="Text_body" style:default-outline-level="3" 
style:class="text">
      <style:text-properties fo:font-size="14pt" fo:font-style="italic" 
fo:font-weight="bold"/>
    </style:style>
    <style:style style:name="Heading_4" style:display-name="Heading 4" 
style:family="paragraph" style:parent-style-name="Heading" 
style:next-style-name="Text_body" style:default-outline-level="4" 
style:class="text">
      <style:paragraph-properties fo:margin-top="0.10cm" 
fo:margin-bottom="0.10cm"/>
      <style:text-properties fo:font-size="12pt" fo:font-style="italic" 
fo:font-weight="bold"/>
    </style:style>
    <style:style style:name="Text_body" style:display-name="Text body" 
style:family="paragraph" style:parent-style-name="Standard" style:class="text">
      <style:paragraph-properties fo:margin-top="0.10cm" 
fo:text-align="justify" fo:text-indent="0.5cm" fo:margin-bottom="0.10cm"/>
    </style:style>
    <style:style style:name="Text_quotation" style:display-name="Text 
quotation" style:family="paragraph" style:parent-style-name="Text_body" 
style:class="text">
      <style:paragraph-properties fo:margin="100%" fo:margin-left="2cm" 
fo:margin-right="0cm" fo:text-indent="0cm" style:auto-text-indent="false"/>
    </style:style>
    <style:style style:name="Text_left_align" style:display-name="Text left 
align" style:family="paragraph" style:parent-style-name="Text_body" 
style:class="text">
      <style:paragraph-properties fo:text-align="left" fo:margin="100%" 
fo:margin-left="2cm" fo:margin-right="0cm" fo:text-indent="0cm" 
style:auto-text-indent="false"/>
    </style:style>
    <style:style style:name="Text_right_align" style:display-name="Text right 
align" style:family="paragraph" style:parent-style-name="Text_body" 
style:class="text">
      <style:paragraph-properties fo:text-align="right" fo:margin="100%" 
fo:margin-left="2cm" fo:margin-right="0cm" fo:text-indent="0cm" 
style:auto-text-indent="false"/>
    </style:style>
    <style:style style:name="Text_left_indent" style:display-name="Text left 
indent" style:family="paragraph" style:parent-style-name="Text_body" 
style:class="text">
      <style:paragraph-properties fo:margin="100%" fo:margin-left="2cm" 
fo:margin-right="0cm" fo:text-indent="-2cm" style:auto-text-indent="false"/>
    </style:style>
    <style:style style:name="Text_code" style:display-name="Text code" 
style:family="paragraph" style:parent-style-name="Text_body" style:class="text">
      <style:paragraph-properties fo:margin="100%" fo:margin-left="2cm" 
fo:margin-right="2cm" fo:text-indent="0cm" style:auto-text-indent="false"/>
      <style:text-properties fo:font-size="10pt"/>
    </style:style>
    <style:style style:name="Text_example" style:display-name="Text example" 
style:family="paragraph" style:parent-style-name="Text_body" style:class="text">
      <style:paragraph-properties fo:margin="100%" fo:margin-left="2cm" 
fo:margin-right="2cm" fo:text-indent="0cm" style:auto-text-indent="false"/>
      <style:text-properties fo:font-style="italic"/>
    </style:style>
    <style:style style:name="Text_list" style:family="paragraph" 
style:parent-style-name="Text_body" style:list-style-name="List"/>
    <text:list-style style:name="List">
      <text:list-level-style-bullet text:level="1" 
text:style-name="Bullet_symbol" text:bullet-char="–">
        <style:list-level-properties 
text:list-level-position-and-space-mode="label-alignment">
          <style:list-level-label-alignment text:label-followed-by="listtab" 
text:list-tab-stop-position="2cm" fo:text-indent="-0.635cm" 
fo:margin-left="2cm"/>
        </style:list-level-properties>
      </text:list-level-style-bullet>
    </text:list-style>
    <style:style style:name="Bullet_symbol" style:display-name="Bullet symbol" 
style:family="text"/>
    <style:style style:name="Footnote_symbol" style:display-name="Footnote 
Symbol" style:family="text"/>
    <style:style style:name="Footnote_anchor" style:display-name="Footnote 
anchor" style:family="text">
      <style:text-properties style:text-position="super 58%"/>
    </style:style>
    <style:style style:name="Footnote" style:family="paragraph" 
style:parent-style-name="Standard" style:class="extra">
      <style:paragraph-properties fo:margin="100%" fo:margin-left="0.598cm" 
fo:margin-right="0cm" fo:text-indent="-0.598cm" style:auto-text-indent="false" 
text:number-lines="false" text:line-number="0"/>
      <style:text-properties fo:font-size="10pt"/>
    </style:style>
    <text:notes-configuration text:note-class="footnote" 
text:citation-style-name="Footnote_symbol" 
text:citation-body-style-name="Footnote_anchor" style:num-format="1" 
text:start-value="0" text:footnotes-position="page" 
text:start-numbering-at="document"/>
    <text:notes-configuration text:note-class="endnote" style:num-format="i" 
text:start-value="0"/>
    <style:style style:name="R" style:family="text">
      <style:text-properties style:font-name="Linux Libertine" 
fo:font-style="normal"/>
    </style:style>
    <style:style style:name="B" style:family="text">
      <style:text-properties style:font-name="Linux Libertine" 
fo:font-weight="bold"/>
    </style:style>
    <style:style style:name="I" style:family="text">
      <style:text-properties style:font-name="Linux Libertine" 
fo:font-style="italic"/>
    </style:style>
    <style:style style:name="BI" style:family="text">
      <style:text-properties style:font-name="Linux Libertine" 
fo:font-weight="bold" fo:font-style="italic"/>
    </style:style>
    <style:style style:name="A" style:family="text">
      <style:text-properties style:font-name="Linux Libertine 
G:smcp=1&amp;lnum=1&amp;c2sc=1" fo:font-style="normal"/>
    </style:style>
    <style:style style:name="BA" style:family="text">
      <style:text-properties style:font-name="Linux Libertine 
G:smcp=1&amp;lnum=1&amp;c2sc=1" fo:font-weight="bold"/>
    </style:style>
    <style:style style:name="IA" style:family="text">
      <style:text-properties style:font-name="Linux Libertine 
G:smcp=1&amp;lnum=1&amp;c2sc=1" fo:font-style="italic"/>
    </style:style>
    <style:style style:name="BIA" style:family="text">
      <style:text-properties style:font-name="Linux Libertine 
G:smcp=1&amp;lnum=1&amp;c2sc=1" fo:font-weight="bold" fo:font-style="italic"/>
    </style:style>
    <style:style style:name="C" style:family="text">
      <style:text-properties style:font-name="Linux Libertine 
G:smcp=1&amp;lnum=1" fo:font-style="normal"/>
    </style:style>
    <style:style style:name="BC" style:family="text">
      <style:text-properties style:font-name="Linux Libertine 
G:smcp=1&amp;lnum=1" fo:font-weight="bold"/>
    </style:style>
    <style:style style:name="IC" style:family="text">
      <style:text-properties style:font-name="Linux Libertine 
G:smcp=1&amp;lnum=1" fo:font-style="italic"/>
    </style:style>
    <style:style style:name="BIC" style:family="text">
      <style:text-properties style:font-name="Linux Libertine 
G:smcp=1&amp;lnum=1" fo:font-weight="bold" fo:font-style="italic"/>
    </style:style>
    <style:style style:name="F" style:family="text">
      <style:text-properties fo:font-style="normal"/>
    </style:style>
    <style:style style:name="BF" style:family="text">
      <style:text-properties fo:font-weight="bold"/>
    </style:style>
    <style:style style:name="IF" style:family="text">
      <style:text-properties fo:font-style="italic"/>
    </style:style>
    <style:style style:name="BIF" style:family="text">
      <style:text-properties fo:font-weight="bold" fo:font-style="italic"/>
    </style:style>
    <style:style style:name="U" style:family="text">
      <style:text-properties style:font-name="Linux Libertine G:sups=1" 
fo:font-style="normal"/>
    </style:style>
    <style:style style:name="BU" style:family="text">
      <style:text-properties style:font-name="Linux Libertine G:sups=1" 
fo:font-weight="bold"/>
    </style:style>
    <style:style style:name="IU" style:family="text">
      <style:text-properties style:font-name="Linux Libertine G:sups=1" 
fo:font-style="italic"/>
    </style:style>
    <style:style style:name="BIU" style:family="text">
      <style:text-properties style:font-name="Linux Libertine G:sups=1" 
fo:font-weight="bold" fo:font-style="italic"/>
    </style:style>
    <style:style style:name="L" style:family="text">
      <style:text-properties style:font-name="Linux Libertine G:sinf=1" 
fo:font-style="normal"/>
    </style:style>
    <style:style style:name="BL" style:family="text">
      <style:text-properties style:font-name="Linux Libertine G:sinf=1" 
fo:font-weight="bold"/>
    </style:style>
    <style:style style:name="IL" style:family="text">
      <style:text-properties style:font-name="Linux Libertine G:sinf=1" 
fo:font-style="italic"/>
    </style:style>
    <style:style style:name="BIL" style:family="text">
      <style:text-properties style:font-name="Linux Libertine G:sinf=1" 
fo:font-weight="bold" fo:font-style="italic"/>
    </style:style>
    <style:style style:name="M" style:family="text">
      <style:text-properties style:font-name="Linux Libertine Mono" 
fo:font-style="normal"/>
    </style:style>
    <style:style style:name="BM" style:family="text">
      <style:text-properties style:font-name="Linux Libertine Mono" 
fo:font-weight="bold"/>
    </style:style>
    <style:style style:name="IM" style:family="text">
      <style:text-properties style:font-name="Linux Libertine Mono" 
fo:font-style="italic"/>
    </style:style>
    <style:style style:name="BIM" style:family="text">
      <style:text-properties style:font-name="Linux Libertine Mono" 
fo:font-weight="bold" fo:font-style="italic"/>
    </style:style>
    <text:linenumbering-configuration text:number-lines="false" 
text:offset="0.499cm" style:num-format="1" text:number-position="left" 
text:increment="5"/>
  </office:styles>
  <office:automatic-styles>
    <style:page-layout style:name="pm1">
      <style:page-layout-properties fo:page-width="21.00cm" 
fo:page-height="29.7cm" style:num-format="1" style:print-orientation="portrait" 
fo:margin="3cm" fo:margin-top="2cm" fo:margin-bottom="3cm" fo:margin-left="3cm" 
fo:margin-right="3cm" style:writing-mode="lr-tb" 
style:footnote-max-height="0cm">
        <style:footnote-sep style:width="0.018cm" 
style:distance-before-sep="0.101cm" style:distance-after-sep="0.101cm" 
style:line-style="solid" style:adjustment="left" style:rel-width="25%" 
style:color="#000000"/>
      </style:page-layout-properties>
      <style:header-style/>
      <style:footer-style/>
    </style:page-layout>
  </office:automatic-styles>
  <office:master-styles>
    <style:master-page style:name="Text_body" style:page-layout-name="pm1"/>
  </office:master-styles>
  <office:body>
    <office:text><text:sequence-decls><text:sequence-decl 
text:display-outline-level="0" text:name="Illustration"/><text:sequence-decl 
text:display-outline-level="0" text:name="Table"/><text:sequence-decl 
text:display-outline-level="0" text:name="Text"/><text:sequence-decl 
text:display-outline-level="0" text:name="Drawing"/></text:sequence-decls>



<text:p text:style-name="Text_body">Hello alls,</text:p>

<text:p text:style-name="Text_body">Resuming my little sery of articles, I am 
explaining today
how Utmac is linked to the XML world.</text:p>

<text:h text:style-name="Heading_4" text:outline-level="4">Troff and 
Xml</text:h>

<text:p text:style-name="Text_body">We all have in mind the various attempts to 
produce XML
files from a troff document: some aim to be universal, and,
dealing with the raw troff requests, can only ouptut non
semantic html with hardcoded styles, while others, dedicated
to a particular macro, fail to consider the raw troff
requests the user may need in his document (cf. the source
of ms2html, in which the author comments he is implementing
more and more raw troff requests)</text:p>

<text:p text:style-name="Text_body">XML files are nothing else but plain text 
files with
semantic informations. On the other side, a troff document
contains structured information which gets its meaning
within the context of a macro. When we think at it, we have
yet a tool which interprets a troff source within the
context of a macro to produce plain text files: nroff.
Could we use nroff to produce xml files ? I tried, and it
appears that solution works well.</text:p>

<text:p text:style-name="Text_body">The idea is simple: one only has to write a 
macro file,
which interprets all the interface macros (paragraph,
headers...), to add XML tags to the output file. For
example, here is a simple macro to produce XML paragraphs
and headings:</text:p>

<text:p text:style-name="Text_example"><text:span 
text:style-name="F">.</text:span><text:span text:style-name="F">de</text:span> 
<text:span text:style-name="F">PP</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">\" first, we close the previous block</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">\" by printing its recorded tag</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">if</text:span> <text:span text:style-name="F">d</text:span> 
xml-block <text:span text:style-name="F">\</text:span><text:span 
text:style-name="F">\*[</text:span><text:span 
text:style-name="F">xml-block</text:span><text:span 
text:style-name="F">]</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">\" Secondly, we define the closing tag for the 
block</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">ds</text:span> <text:span 
text:style-name="F">xml-block</text:span> <text:span 
text:style-name="F">&lt;/p&gt;</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">\" and last, we print the openning tag.</text:span>
&lt;p&gt;
<text:span text:style-name="F">.</text:span><text:span 
text:style-name="F">.</text:span>
<text:span text:style-name="F">.</text:span><text:span 
text:style-name="F">de</text:span> <text:span text:style-name="F">H1</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">if</text:span> <text:span text:style-name="F">d</text:span> 
xml-block <text:span text:style-name="F">\</text:span><text:span 
text:style-name="F">\*[</text:span><text:span 
text:style-name="F">xml-block</text:span><text:span 
text:style-name="F">]</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">rm</text:span> xml-block
&lt;h1&gt;<text:span text:style-name="F">\</text:span><text:span 
text:style-name="F">\$*</text:span>&lt;/h1&gt;
<text:span text:style-name="F">.</text:span><text:span 
text:style-name="F">.</text:span></text:p>

<text:p text:style-name="Text_body">Nroff has to be configured to produce a 
correct xml files:
we do not want hyphen, lines don’t need to be adjusted, and,
the page length has to be defined correctly.</text:p>

<text:p text:style-name="Text_example"><text:span 
text:style-name="F">.</text:span><text:span text:style-name="F">\" page length 
is one line</text:span>
<text:span text:style-name="F">.</text:span><text:span 
text:style-name="F">pl</text:span> 1v
<text:span text:style-name="F">.</text:span><text:span 
text:style-name="F">ll</text:span> 75
<text:span text:style-name="F">.</text:span><text:span text:style-name="F">\" 
don’t adjust nor hyphenates</text:span>
<text:span text:style-name="F">.</text:span><text:span 
text:style-name="F">na</text:span>
<text:span text:style-name="F">.</text:span><text:span 
text:style-name="F">nh</text:span>
<text:span text:style-name="F">.</text:span><text:span text:style-name="F">\" 
Ending macro is doc:end</text:span>
<text:span text:style-name="F">.</text:span><text:span 
text:style-name="F">em</text:span> doc:end
<text:span text:style-name="F">.</text:span><text:span text:style-name="F">\" 
Print header</text:span>
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
<text:span text:style-name="F">.</text:span><text:span text:style-name="F">\" 
Open the root tag</text:span>
&lt;utmac&gt;
<text:span text:style-name="F">.</text:span><text:span 
text:style-name="F">de</text:span> <text:span 
text:style-name="F">doc:end</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">\" doc:end needs some more space to output text</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">pl</text:span> <text:span 
text:style-name="F">\</text:span><text:span 
text:style-name="F">\n(</text:span><text:span 
text:style-name="F">nl</text:span>u+3v
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">\" close the previous block</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">if</text:span> <text:span text:style-name="F">d</text:span> 
xml-block <text:span text:style-name="F">\</text:span><text:span 
text:style-name="F">\*[</text:span><text:span 
text:style-name="F">xml-block</text:span><text:span 
text:style-name="F">]</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">\" Close the root tag.</text:span>
&lt;/utmac&gt;
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">\" set correct page length</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">pl</text:span> <text:span 
text:style-name="F">\</text:span><text:span 
text:style-name="F">\n(</text:span><text:span 
text:style-name="F">nl</text:span>u
<text:span text:style-name="F">.</text:span><text:span 
text:style-name="F">.</text:span></text:p>

<text:p text:style-name="Text_body">Since the fonts are hierarchical and 
defined as strings in
Utmac, they are easy to implement as well.</text:p>

<text:p text:style-name="Text_example"><text:span 
text:style-name="F">.</text:span><text:span text:style-name="F">ds</text:span> 
<text:span text:style-name="F">font-bold0</text:span> <text:span 
text:style-name="F">&lt;/B&gt;</text:span>
<text:span text:style-name="F">.</text:span><text:span 
text:style-name="F">ds</text:span> <text:span 
text:style-name="F">font-bold1</text:span> <text:span 
text:style-name="F">&lt;B&gt;</text:span>
<text:span text:style-name="F">.</text:span><text:span 
text:style-name="F">nr</text:span> <text:span 
text:style-name="F">f-b</text:span> <text:span text:style-name="F">0</text:span>
<text:span text:style-name="F">.</text:span><text:span 
text:style-name="F">ds</text:span> <text:span text:style-name="F">B</text:span> 
<text:span text:style-name="F">\ER’f-b 
1-\En[f-b]’\E*[font-bold\En[f-b]]</text:span></text:p>

<text:p text:style-name="Text_body">The only real problem of using nroff to 
produce xml
documents is that — along with troff — it is not easy to
deal with automatically inserted spaces. I tried to use
.chop and \c, but without reliable results. To solve that
problem and escape the possible restricted characters a user
may insert in his document (’&lt;’, ’&gt;’, and
’&amp;’), I wrote a small post-processor –
<text:span text:style-name="I">postxml</text:span> –, which translates a custom 
set of tags
to xml special characters. Amongst those tags, a special tag
removes newlines:</text:p>

<text:p text:style-name="Text_example">#[ becomes &lt;
#] becomes &gt;
#( becomes &amp;
#) becomes ;
<text:span text:style-name="F">\n#</text:span>-<text:span 
text:style-name="F">\n </text:span>is deleted from the stream, and is used to 
delete newlines.</text:p>

<text:p text:style-name="Text_body">So, instead of directly writing xml tags, 
the nroff macro
produces writes those custom tags, which are later
translated by postxml. Our paragraph macro becomes:</text:p>

<text:p text:style-name="Text_example"><text:span 
text:style-name="F">.</text:span><text:span text:style-name="F">de</text:span> 
<text:span text:style-name="F">PP</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">if</text:span> <text:span text:style-name="F">d</text:span> 
xml-block <text:span text:style-name="F">\{</text:span><text:span 
text:style-name="F">\</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">\" tag to remove unwanted newlines</text:span>
#-
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">\" closing xml tag</text:span>
<text:span text:style-name="F">\</text:span><text:span 
text:style-name="F">\*[</text:span><text:span 
text:style-name="F">xml-block</text:span><text:span 
text:style-name="F">]</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">\}</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">ds</text:span> <text:span 
text:style-name="F">xml-block</text:span> <text:span 
text:style-name="F">#[/pp#]</text:span>
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">\" opening xml tag</text:span>
#[pp]
<text:span text:style-name="F">.</text:span>       <text:span 
text:style-name="F">\" tag to remove unwanted newlines</text:span>
#-
<text:span text:style-name="F">.</text:span><text:span 
text:style-name="F">.</text:span></text:p>

<text:p text:style-name="Text_body">A preprocessor, prexml, escapes the 
possible presence of
those tags in the user document. The troffxml archive,
avaible on

provides prexml, postxml, and a two xsl stylesheet to
produce html and fodt (flat open document) files, and Utmac
provides the macro ux for that purpose. So, the command to
produce xml documents from a troff source is:</text:p>

<text:p text:style-name="Text_example">prexml <text:span 
text:style-name="F">&lt;</text:span> f.tr <text:span 
text:style-name="F">|</text:span> nroff -Tlocale -mux <text:span 
text:style-name="F">|</text:span> postxml <text:span 
text:style-name="F">&gt;</text:span> f.xml
xsltproc utohtml.xsl f.xml <text:span text:style-name="F">&gt;</text:span> 
f.html
xsltproc utofodt.xsl f.xml <text:span text:style-name="F">&gt;</text:span> 
f.fodt</text:p>

<text:p text:style-name="Text_body">Since I believe you want to have a look at 
the result, you
will find, joined to this mail, its xml, html, and fodt
versions as produced by this system (which reveals the fodt
code block needs some more work...).</text:p>

<text:p text:style-name="Text_body">On my next mail about Utmac, I will present 
you some
goodies.</text:p>

<text:p text:style-name="Text_body">Kind Regards,
Pierre-Jean</text:p>
</office:text>
  </office:body>
</office:document>
