https://sourceware.org/bugzilla/show_bug.cgi?id=27551
--- Comment #15 from Vincent Lefèvre <vincent-srcware at vinc17 dot net> ---
(In reply to Nick Clifton from comment #14)
> But that is the point. The encoding of characters in the file being scanned
> is not known. Using LC_CTYPE is incorrect, because that specifies how to
> display characters, not to read them.

This is not what POSIX says. Read again:

  LC_CTYPE
      Determine the locale for the interpretation of sequences of bytes of
      text data as characters (for example, single-byte as opposed to
      multi-byte characters in arguments and input files) and to identify
      printable strings.

It says "interpretation of sequences of bytes of text data as characters".
Thus that's precisely for reading (in addition to displaying).

Note that nowadays, UTF-8 is commonly used, so that's very useful. And if
an 8-bit byte sequence matches a valid UTF-8 sequence, it is probably a real
character. In practice, false positives for UTF-8 are much rarer than false
positives for ASCII (i.e. sequences of 7-bit bytes that actually do not
correspond to text).

A user who wishes to stick with ASCII could still set LC_CTYPE to C.
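
To make the argument concrete, here is a minimal Python sketch of a scanner that interprets bytes as characters in a given encoding and keeps runs of printable characters. It is only an illustration under stated assumptions, not how binutils `strings` is implemented. The names `ctype_encoding`, `decode_one` and `printable_strings` are hypothetical, and the sketch assumes a stateless encoding with at most 4 bytes per character (true for ASCII, ISO-8859-1 and UTF-8).

```python
import locale


def ctype_encoding() -> str:
    """Return the codeset selected by LC_ALL / LC_CTYPE / LANG (Unix only)."""
    locale.setlocale(locale.LC_CTYPE, "")  # may raise locale.Error if unsupported
    return locale.nl_langinfo(locale.CODESET)


def decode_one(data: bytes, pos: int, encoding: str):
    """Try to decode exactly one character starting at pos.

    Returns (character, byte_width), or (None, 1) if no valid character
    starts at pos. Assumes at most 4 bytes per character.
    """
    for width in range(1, 5):
        chunk = data[pos:pos + width]
        if len(chunk) < width:
            break  # ran off the end of the buffer
        try:
            text = chunk.decode(encoding)
        except UnicodeDecodeError:
            continue  # incomplete or invalid so far; try a longer chunk
        if len(text) == 1:
            return text, width
    return None, 1


def printable_strings(data: bytes, encoding: str, min_len: int = 4):
    """Return runs of at least min_len printable characters found in data."""
    found, run, pos = [], [], 0
    while pos < len(data):
        ch, width = decode_one(data, pos, encoding)
        if ch is not None and ch.isprintable():
            run.append(ch)
        else:
            # Invalid byte or non-printable character ends the current run.
            if len(run) >= min_len:
                found.append("".join(run))
            run = []
        pos += width
    if len(run) >= min_len:
        found.append("".join(run))
    return found


sample = b"\x00\x01caf\xc3\xa9 au lait\x00\xff\xfeabc\x00"
print(printable_strings(sample, "utf-8"))  # ['café au lait']
print(printable_strings(sample, "ascii"))  # [' au lait']
```

With the encoding set to UTF-8 the whole string is recovered, while stray bytes such as `\xff\xfe` are rejected because they do not form a valid sequence. Passing `"ascii"` approximates what a user would get by setting LC_CTYPE to C, and a real tool could pass the result of `ctype_encoding()` instead of a fixed name.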