[Bug binutils/27551] The default encoding of the strings utility does not conform to POSIX: should honor the current locale.

vincent-srcware at vinc17 dot net Wed, 14 Apr 2021 07:35:47 -0700

https://sourceware.org/bugzilla/show_bug.cgi?id=27551


--- Comment #15 from Vincent Lefèvre <vincent-srcware at vinc17 dot net> ---
(In reply to Nick Clifton from comment #14)
> But that is the point.  The encoding of characters in the file being scanned
> is not known.  Using LC_CTYPE is incorrect, because that specifies how to
> display characters, not to read them.

This is not what POSIX says. Read again:

LC_CTYPE
    Determine the locale for the interpretation of sequences of bytes of text
    data as characters (for example, single-byte as opposed to multi-byte
    characters in arguments and input files) and to identify printable strings.

It says "interpretation of sequences of bytes of text data as characters". Thus
LC_CTYPE applies precisely to reading input (in addition to displaying it).
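
A minimal Python sketch of what "interpretation of sequences of bytes as
characters" means in practice. The helper name interpret() and the sample
bytes are illustrative assumptions, and the encodings are passed explicitly
here rather than taken from the locale:

```python
def interpret(data: bytes, encoding: str) -> str:
    """Interpret a byte sequence as characters under the given encoding.

    Hypothetical helper: a locale-aware tool would take the encoding
    from LC_CTYPE instead of a parameter.
    """
    return data.decode(encoding)


raw = b"\xc3\xa9"  # the same two bytes in both cases

as_utf8 = interpret(raw, "utf-8")      # one character: U+00E9
as_latin1 = interpret(raw, "latin-1")  # two characters: U+00C3 U+00A9

print(len(as_utf8), len(as_latin1))
```

The bytes are identical; only the character encoding used to read them
changes how many characters they represent, and which ones.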

Note that UTF-8 is in common use nowadays, so honoring the locale is very
useful. And if a sequence of bytes with the high bit set forms a valid UTF-8
sequence, it is probably a real character. In practice, false positives for
UTF-8 are much rarer than false positives for ASCII (i.e. sequences of 7-bit
bytes that do not actually correspond to text).
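
To illustrate the difference, here is a rough Python sketch of a
strings-like scan under two encodings. It is not how binutils implements
strings; the function name printable_runs(), the minimum length of 4, and the
use of str.isprintable() as the printability test are all assumptions made
for the example:

```python
def printable_runs(data: bytes, encoding: str = "utf-8", min_len: int = 4) -> list[str]:
    """Return runs of at least min_len printable characters found in data.

    Sketch only: bytes that are invalid in the chosen encoding are decoded
    to U+FFFD and treated as run separators (so a genuine U+FFFD in the
    input also ends a run).
    """
    text = data.decode(encoding, errors="replace")
    runs: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch != "\ufffd" and ch.isprintable():
            current.append(ch)
        else:
            # A non-printable or undecodable character ends the current run.
            if len(current) >= min_len:
                runs.append("".join(current))
            current = []
    if len(current) >= min_len:
        runs.append("".join(current))
    return runs


# Binary noise around the UTF-8 text "café au lait".
sample = b"\x00\x01caf\xc3\xa9 au lait\x00\xff\xfeabc\x00"

utf8_runs = printable_runs(sample, "utf-8")
ascii_runs = printable_runs(sample, "ascii")
print(utf8_runs, ascii_runs)
```

Under UTF-8 the accented word is recovered whole, whereas under ASCII the
two bytes of the accented letter split the string and the leading "caf"
falls below the minimum length.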

A user who wishes to stick with ASCII could still set LC_CTYPE to C.

