On Sunday, 9 August 2020 05:58:15 BST T. Kurt Bond wrote:
> Anyway, in the output file (attached to this e-mail) the unicode
> characters show up fine in the body text fine, but in the PDF Outline the
> characters show up as [uXXXX] text instead of the actual character. Does
> anybody know why this is? I know that if I do something similar for
> Heirloom troff the PDF Outline *does* contain the Unicode characters.

In the PDF Reference, text strings are defined as follows:

=============================================================================
3.8.1 Text Strings

Certain strings contain information that is intended to be human-readable,
such as text annotations, bookmark names, article names, document
information, and so forth. Such strings are referred to as text strings.

Text strings are encoded in either PDFDocEncoding or Unicode character
encoding. PDFDocEncoding is a superset of the ISO Latin 1 encoding and is
documented in Appendix D. Unicode is described in the Unicode Standard by
the Unicode Consortium (see the Bibliography).

For text strings encoded in Unicode, the first two bytes must be 254
followed by 255, representing the Unicode byte order marker, U+FEFF. (This
sequence conflicts with the PDFDocEncoding character sequence thorn
ydieresis, which is unlikely to be a meaningful beginning of a word or
phrase.) The remainder of the string consists of Unicode character codes,
according to the UTF-16 encoding specified in the Unicode standard, version
2.0. Commonly used Unicode values are represented as 2 bytes per character,
with the high-order byte appearing first in the string.
==============================================================================

Since groff works internally with ASCII, the \[uXXXX] form of input is
converted to a separate node, which is a named glyph in the appropriate
font. In the groff_out format this can be seen as "Cu2640", for example,
which tells the output driver to look for the named glyph in a particular
font.

This is only true for text which is destined for the output stream.
Parameters to .pdfhref are just treated as ASCII, i.e. PDFDocEncoding, so a
\[uXXXX] escape there is not converted to a character.

Cheers

Deri
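
For illustration only, here is a minimal Python sketch of what the quoted
section 3.8.1 asks for: a Unicode text string that starts with the bytes
254, 255 and continues in UTF-16 with the high-order byte first. It is not
how groff or gropdf does anything; the function names are hypothetical, and
the escape expander assumes only the single code point form \[uXXXX].

```python
import re

# Hypothetical helper: matches groff's \[uXXXX] escape for one code point.
# Composite forms such as \[u0041_0301] are deliberately not handled.
_GROFF_UNICODE = re.compile(r"\\\[u([0-9A-Fa-f]{4,6})\]")


def expand_groff_escapes(text: str) -> str:
    """Replace each \\[uXXXX] escape with the character it names."""
    return _GROFF_UNICODE.sub(lambda m: chr(int(m.group(1), 16)), text)


def pdf_text_string(text: str) -> bytes:
    """Encode text as a Unicode PDF text string.

    The result is the byte order marker (254, 255) followed by the text
    in big-endian UTF-16, as section 3.8.1 describes.
    """
    return b"\xfe\xff" + text.encode("utf-16-be")


def pdf_hex_literal(text: str) -> str:
    """Render the encoded string as a PDF hexadecimal string, <...>."""
    return "<" + pdf_text_string(text).hex().upper() + ">"


if __name__ == "__main__":
    title = expand_groff_escapes(r"Venus \[u2640]")
    print(pdf_hex_literal(title))
```

The hexadecimal form is used here only because it keeps the bytes readable
in a plain-text PDF object; a literal string with the same bytes would be
equally valid.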