On 1/8/24, hoh...@posteo.de <hoh...@posteo.de> wrote:
> On Tue, 2 Jan 2024 11:04:25 -0600
> Dave Kemper <saint.s...@gmail.com> wrote:
>
>> > ECMA-48 says for 0x84:
>>
>> Also irrelevant to groff, as it doesn't use ECMA-48.  Groff tools
>> (including gpic) take input in Latin-1, period.
>
> I don't think so. ECMA-48 may be interpreted by terminals.

In the message to which I was replying, you were speaking of the
sequence of bytes that were part of the input to gpic; in this realm,
ECMA-48 is irrelevant.

And in any case, the 0x84 byte in question is part of the UTF-8
encoding of Unicode character U+00C4 LATIN CAPITAL LETTER A WITH
DIAERESIS; if it's being interpreted by a terminal somewhere as
ECMA-48, something is going wrong.

What seems to be going wrong in this instance is that you're passing
UTF-8 directly to gpic without first running it through preconv or
iconv, resulting in a byte sequence gpic doesn't recognize.  You
haven't said whether you've tried converting the input before sending
it to gpic, or why you're avoiding preconv.

> In the case of terminal output, those characters if interpreted as
> control sequences would throw the output into disarray. Therefore,
> if I'm right, it's rejected as invalid but not passed through.

Correct, gpic won't pass through bytes it considers invalid.

$ echo Ä | od -t x1
0000000 c3 84 0a
0000003
$ echo Ä | pic | grep -av '^\.' | od -t x1
pic:<standard input>:1: invalid input character code 132
0000000 c3 0a
0000002

gpic strips the 0x84 (decimal 132) byte, leaving you with invalid
UTF-8, or valid but erroneous Latin-1.
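
For illustration, here is a minimal sketch of the conversion step I mean, using iconv.  It assumes an iconv that knows the names UTF-8 and ISO-8859-1, and it writes the two UTF-8 bytes of Ä as octal escapes so the result doesn't depend on your terminal's locale.  The file name input.pic in the last comment is hypothetical.

```shell
# The UTF-8 encoding of U+00C4 is the two bytes c3 84 (octal 303 204).
utf8=$(printf '\303\204' | od -An -t x1 | tr -d ' \n')

# After conversion to Latin-1, the same character is the single byte c4,
# which is the form gpic expects on input.
latin1=$(printf '\303\204' | iconv -f UTF-8 -t ISO-8859-1 | od -An -t x1 | tr -d ' \n')

echo "utf8=$utf8 latin1=$latin1"   # prints: utf8=c384 latin1=c4

# Applied to a real document (hypothetical file name), that would be:
#   iconv -f UTF-8 -t ISO-8859-1 input.pic | pic
```

With the input converted this way, no 0x84 byte ever reaches gpic, so there is nothing for it to reject.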