Hi Erich,

> When I enter unicode, like:
>
>     ÄÖÜ SS ÒÓÔÕŎŌ Ç äöü ß òóôõŏō ç
>
> ...and process them with pdfmom, they show up perfectly.  But if I
> include the same characters in a file with the .INCLUDE macro, they
> disappear.
Those are Unicode codepoints, but what encoding are you using to
represent them as bytes in a file?  Is it UTF-8?  Only `Ŏ', U+014E,
`Ō', U+014C, and their lower-case forms aren't in ISO 8859-1, AKA
Latin-1.

> Processed with -P-bcu -Tutf8, they show up like wrong encoded
> strings.

troff(1) reads its input as ISO 8859-1.  It sounds like, in this
particular test, you're giving it bytes of UTF-8 that it's trying to
interpret as ISO 8859-1.  U+00A3 is a `£'.  In UTF-8, it's two bytes;
the 0a is the linefeed.

    $ hd <<<£
    00000000  c2 a3 0a                                          |...|

iso-8859-1(7) shows c2 is `Â' and a3 is `£', and that's how groff
interprets these bytes.

    $ groff -Tutf8 <<<£ | grep .
    Â£

> I tried, in vain, the following pipe:
>
>     soelim example.mom | preconv -eutf8 |
>         groff -mom -Tutf8 -P-bcu > example.txt

As Denis said, soelim(1) looks for `.so' lines; `.INCLUDE' means
nothing to it.

    http://git.savannah.gnu.org/cgit/groff.git/tree/src/preproc/soelim/soelim.cpp#n169

You could try replacing `.INCLUDE' with `.so'.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
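[Editor's sketch, building on the suggestion above.]  Assuming the
`.INCLUDE' lines take a bare, unquoted filename and that the included
files are UTF-8 like the main one, a throwaway sed(1) substitution
(illustrative only) lets you try the `.so' idea without editing the
files, reusing the pipe quoted above:

    $ sed 's/^\.INCLUDE /.so /' example.mom | soelim | preconv -e utf8 |
          groff -mom -Tutf8 -P-bcu > example.txt

The inclusion has to happen before preconv(1) because preconv only
converts the bytes that flow through it; with `.INCLUDE', the file is
only pulled in later, at troff time, so its raw UTF-8 bytes get read
as Latin-1, which matches the symptom you're seeing.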