https://sourceware.org/bugzilla/show_bug.cgi?id=27551
--- Comment #12 from Nick Clifton <nickc at redhat dot com> ---
(In reply to Vincent Lefèvre from comment #10)

Hi Vincent,

> The bug is that:
>
>       if (encoding == 's')
>         buf[0] = c & 0x7f;
>
> So the byte 0xc0 gets changed to 0x40, which is printable.

No - this is the correct behaviour.  The 's' encoding says that the
characters in the file being examined are 7 bits long, not 8 bits.  Hence
when a byte is read, only the bottom 7 bits should be considered when
deciding if the character is printable.

Now it could also be argued that for 's' encoding, if the character is
going to be displayed, then it should be truncated before being printed.
But this whole PR is about the discrepancy between reading characters and
displaying characters, and I felt that displaying the byte intact was the
right thing to do.  But I could be persuaded otherwise.

>   % printf "\300\300\300\300" | ./strings | iconv
>   iconv: illegal input sequence at position 0

But if we use your original test case and the patched strings:

  % printf "abcdéfghi" | ./strings | iconv
  abcdiconv: illegal input sequence at position 4
  % echo $LC_CTYPE
  C.UTF-8

So now I am very confused.

> But even when removing this test, keeping the "else", c is always 1-byte long,
> so that non-ASCII characters are always regarded as non-printable.

Are you saying that the length parameter passed to mbtowc() should include
the first NUL byte?  That was not how I read the manual page description
for the function, but I can give it a go if you think that it will help.

Cheers
  Nick
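
For readers following the thread, the two points under discussion can be
illustrated with a small C sketch.  This is not code from strings.c: the
helper names (printable_as_7bit, printable_as_8bit, mb_len_with_limit) are
hypothetical, and "printable" is assumed here to mean the ASCII range 0x20
to 0x7e.  The first two helpers show why masking with 0x7f makes the byte
0xc0 look printable; the third wraps mbtowc() so that the effect of its
length argument can be tried out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption for this sketch: "printable" means plain ASCII 0x20..0x7e.  */
static bool
is_ascii_printable (unsigned char c)
{
  return c >= 0x20 && c <= 0x7e;
}

/* Hypothetical helper: judge a byte the way the 's' (7-bit) encoding is
   described in the comment, looking only at the bottom 7 bits.  */
static bool
printable_as_7bit (unsigned char byte)
{
  unsigned char c = byte & 0x7f;   /* 0xc0 & 0x7f == 0x40, ASCII '@' */
  return is_ascii_printable (c);
}

/* Hypothetical helper, for contrast: judge the full 8-bit value.  */
static bool
printable_as_8bit (unsigned char byte)
{
  return is_ascii_printable (byte);
}

/* Hypothetical helper: ask mbtowc() to decode one character from S while
   letting it inspect at most N bytes.  Returns whatever mbtowc() returns:
   the number of bytes consumed, 0 for a NUL character, or -1 when the
   first N bytes do not form a complete valid character in the current
   locale.  */
static int
mb_len_with_limit (const char *s, size_t n)
{
  wchar_t wc;

  assert (s != NULL);
  (void) mbtowc (NULL, NULL, 0);   /* reset any shift state */
  return mbtowc (&wc, s, n);
}
```

With these helpers, printable_as_7bit (0xc0) is true while
printable_as_8bit (0xc0) is false, which is the discrepancy Vincent
describes.  For the second point, the result of mb_len_with_limit ()
depends on the locale chosen with setlocale(); in a UTF-8 locale a
two-byte character such as "é" should only decode when n is at least 2,
so a caller that always passes a single byte would be expected to get -1
for every non-ASCII character.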