On 12/28/23, holger.herrl...@posteo.de <holger.herrl...@posteo.de> wrote:
> echo ä | gpic | hexStream
> 0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x53 | .if !dPS
> 0x20 0x2e 0x64 0x73 0x20 0x50 0x53 0x0a | .ds PS.
> 0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x45 | .if !dPE
> 0x20 0x2e 0x64 0x73 0x20 0x50 0x45 0x0a | .ds PE.
> 0x2e 0x6c 0x66 0x20 0x31 0x20 0x2d 0x0a | .lf 1 -.
> 0xc3 0xa4 0x0a | ...
>
> echo Ä | gpic | hexStream
> gpic:<standard input>:1: invalid input character code 132
> 0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x53 | .if !dPS
> 0x20 0x2e 0x64 0x73 0x20 0x50 0x53 0x0a | .ds PS.
> 0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x45 | .if !dPE
> 0x20 0x2e 0x64 0x73 0x20 0x50 0x45 0x0a | .ds PE.
> 0x2e 0x6c 0x66 0x20 0x31 0x20 0x2d 0x0a | .lf 1 -.
> 0xc3 0x0a | ..
>
> The character emerges from a input file name. So it is missed by
> preconv somewhere,

As Lennart points out, the above pipelines don't invoke preconv at all. The above examples also don't come from a filename, so I suspect your example is too simplified from your actual use case to illustrate the problem. Do you have a command sequence that DOES invoke preconv where UTF-8 characters are not being correctly handled?

> however why is 'ä' working properly/ just passed through?

It's not "working properly" in a sense that groff can handle. The output above shows the ä coming out as 0xc3 0xa4, which is the UTF-8 encoding of the character. But were this to go into a groff pipeline, groff would interpret those two bytes as two Latin-1 characters, neither of which is ä. (In the example you posted at the start of this thread, where the 0xc3 0xa4 went to the terminal, your terminal interpreted that sequence as UTF-8 and displayed an ä. So it only looked "right" because your input and output encodings matched.)

Your second example shows that pic is discarding the byte of Ä's encoding (0x84, decimal 132) that it doesn't recognize as valid Latin-1. You can see this in two ways: this byte is missing from your hexStream output, and pic throws an error. The only byte left, 0xc3, is a Latin-1 Ã, which is how groff would interpret it. But your terminal, expecting UTF-8, would be unable to output anything meaningful for this.
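
To make the byte-level picture concrete, here is a small Python sketch that takes the UTF-8 encoding of each character and reads every byte back as if it were Latin-1. This is not groff or pic code, and it says nothing about how either is implemented; the helper name utf8_bytes_as_latin1 is made up for this illustration.

```python
def utf8_bytes_as_latin1(ch):
    """Return (byte value, Latin-1 reading) pairs for the UTF-8 encoding of ch.

    Hypothetical helper for illustration only; it is not part of groff.
    """
    raw = ch.encode("utf-8")
    # Latin-1 maps every byte value 0x00-0xff to the code point of the same number.
    return [(b, bytes([b]).decode("latin-1")) for b in raw]


for ch in ("\u00e4", "\u00c4"):  # ä and Ä, the two characters from the thread
    for byte, reading in utf8_bytes_as_latin1(ch):
        # Bytes 0x80-0x9f are C1 control codes in Latin-1, not printable glyphs.
        kind = "C1 control" if 0x80 <= byte <= 0x9F else "U+%04X" % ord(reading)
        print("U+%04X: byte 0x%02x (decimal %d) -> %s" % (ord(ch), byte, byte, kind))
```

For ä, the two bytes read as U+00C3 (Ã) and U+00A4 (¤), both printable Latin-1 characters, which is why nothing complained. For Ä, the second byte is decimal 132, the same code named in the gpic diagnostic, and it falls in the C1 control range.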