[Bug binutils/27551] The default encoding of the strings utility does not conform to POSIX: should honor the current locale.

nickc at redhat dot com Fri, 09 Apr 2021 08:31:23 -0700

https://sourceware.org/bugzilla/show_bug.cgi?id=27551


--- Comment #12 from Nick Clifton <nickc at redhat dot com> ---
(In reply to Vincent Lefèvre from comment #10)
Hi Vincent,

> The bug is that:
> 
>   if (encoding == 's')
>     buf[0] = c & 0x7f;
> 
> So the byte 0xc0 gets changed to 0x40, which is printable.

No - this is the correct behaviour.  The 's' encoding says that the characters
in the file being examined are 7 bits long, not 8 bits.  Hence when a byte is
read, only the bottom 7 bits should be considered when deciding if the
character is printable.
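
As an illustration of the 7-bit rule described above, here is a minimal
sketch.  The helper name printable_7bit is hypothetical and this is not the
actual binutils code; it only shows why a byte such as 0xc0 passes the
printability test once the top bit is masked off.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper, not taken from binutils: decide printability for
   the 's' (7-bit) encoding by examining only the low 7 bits of the byte.  */
static bool
printable_7bit (unsigned char byte)
{
  unsigned char c = byte & 0x7f;   /* 0xc0 becomes 0x40, which is '@'.  */
  return isprint (c) != 0;
}
```

With this check, 0xc0 counts as printable because 0x40 is '@', whereas 0x81
does not, because it masks to the control character 0x01.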

Now it could also be argued that for 's' encoding, if the character is going to
be displayed, then it should be truncated before being printed.  But this whole
PR is about the discrepancy between reading characters and displaying
characters, and I felt that displaying the byte intact was the right thing to
do.  But I could be persuaded otherwise.


> % printf "\300\300\300\300" | ./strings | iconv
> iconv: illegal input sequence at position 0

But if we use your original test case and the patched strings:

  % printf "abcdéfghi" | ./strings | iconv 
  abcdiconv: illegal input sequence at position 4

  % echo $LC_CTYPE
  C.UTF-8

So now I am very confused.


> But even when removing this test, keeping the "else", c is always 1-byte
> long, so that non-ASCII characters are always regarded as non-printable.

Are you saying that the length parameter passed to mbtowc() should include the
first NUL byte?  That was not how I read the manual page description for the
function, but I can give it a go if you think that it will help.
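
For reference, a small sketch of how the length argument of mbtowc() behaves,
as the C standard describes it: the function examines at most that many bytes,
so a limit of 1 can never decode a multibyte character.  The wrapper name
decode_one is hypothetical and is not part of the patch under discussion.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical wrapper: decode the character starting at BUF, looking at
   no more than AVAIL bytes.  Returns the number of bytes consumed, 0 for
   a NUL character, or -1 if the bytes are invalid or do not form a
   complete character within AVAIL bytes.  */
static int
decode_one (const char *buf, size_t avail, wchar_t *out)
{
  mbtowc (NULL, NULL, 0);          /* Reset any shift state.  */
  return mbtowc (out, buf, avail);
}
```

In a UTF-8 locale, the two-byte sequence for 'é' would need AVAIL to be at
least 2; with AVAIL equal to 1 the call reports an error because the sequence
is incomplete.  The result depends on the locale selected with setlocale().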

Cheers
  Nick

