Re: uppercase german umlaut

Dave Kemper Mon, 08 Jan 2024 23:14:36 -0800

On 1/8/24, hoh...@posteo.de <hoh...@posteo.de> wrote:
> On Tue, 2 Jan 2024 11:04:25 -0600
> Dave Kemper <saint.s...@gmail.com> wrote:
>
>> > ECMA-48 says for 0x84:
>>
>> Also irrelevant to groff, as it doesn't use ECMA-48.  Groff tools
>> (including gpic) take input in Latin-1, period.
>
> I don't think so. ECMA-48 may be interpreted by terminals.


In the message to which I was replying, you were speaking of the
sequence of bytes that were part of the input to gpic; in this realm,
ECMA-48 is irrelevant.  And in any case, the 0x84 byte in question is
part of the UTF-8 encoding of Unicode character U+00C4 LATIN CAPITAL
LETTER A WITH DIAERESIS; if it's being interpreted by a terminal
somewhere as ECMA-48, something is going wrong.

What seems to be going wrong in this instance is that you're passing
UTF-8 directly to gpic without first running it through preconv or
iconv, resulting in a byte sequence gpic doesn't recognize.  You
haven't said whether you've tried converting the input before sending
it to gpic, or why you're avoiding preconv.
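
As a rough sketch of that conversion step (assuming a POSIX shell with
iconv available; the helper name to_latin1 is made up for illustration),
either route below turns the two-byte UTF-8 sequence into something gpic
should accept:

```sh
# Hypothetical helper: transcode UTF-8 to Latin-1, the encoding groff tools read.
to_latin1() { iconv -f UTF-8 -t ISO-8859-1; }

# "Ä" as UTF-8 is the bytes c3 84; after conversion it is the single byte c4.
printf '\303\204\n' | to_latin1 | od -An -t x1

# groff's own route, if it is installed: preconv emits a \[uXXXX] escape
# rather than a raw Latin-1 byte.
if command -v preconv >/dev/null 2>&1; then
    printf '\303\204\n' | preconv -e UTF-8
fi
```

Either output can then be piped into pic in place of the raw UTF-8.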

> In the case of terminal output, those characters if interpreted as
> control sequences would throw the output into disarray. Therefore,
> if I'm right, it's rejected as invalid but not passed through.

Correct, gpic won't pass through bytes it considers invalid.

$ echo Ä | od -t x1
0000000 c3 84 0a
0000003
$ echo Ä | pic | grep -av '^\.' | od -t x1
pic:<standard input>:1: invalid input character code 132
0000000 c3 0a
0000002

gpic strips the 0x84 (decimal 132) byte, leaving you with a lone 0xc3:
invalid UTF-8, or valid but erroneous Latin-1 (Ã).
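
To see both readings of the leftover byte without involving gpic at all,
here is a small simulation (an assumption-laden sketch: strip_0x84 is a
made-up helper that mimics the stripping with tr, not anything gpic
provides, and iconv is assumed to be installed):

```sh
# Hypothetical helper: drop every 0x84 byte (octal 204), as gpic did above.
strip_0x84() { LC_ALL=C tr -d '\204'; }

# UTF-8 "Ä\n" is c3 84 0a; after stripping, only c3 0a remains.
printf '\303\204\n' | strip_0x84 | od -An -t x1

# Read as Latin-1, the surviving 0xc3 is a valid but wrong character.
printf '\303\204\n' | strip_0x84 | iconv -f ISO-8859-1 -t UTF-8

# Read as UTF-8, a lead byte with no continuation byte is rejected.
printf '\303\204\n' | strip_0x84 | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1 ||
    echo "invalid UTF-8"
```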
