On 05/14/2015 10:32 AM, Vince Rice wrote:
> locale run from a cmd.exe session says that everything is “C.UTF-8”, while
> locale run from mintty says that everything is en_US.UTF-8. A “which” in both
> cases shows that the locale being run is cygwin’s, so I assume mintty does
> something slightly differently than the normal console? I don’t even know if
> there’s a difference. (Have I mentioned I don’t know anything about all of
> this?)
>
> From cmd.exe:
> LANG=
> LC_CTYPE="C.UTF-8"
> LC_NUMERIC="C.UTF-8"
> LC_TIME="C.UTF-8"
> LC_COLLATE="C.UTF-8"
> LC_MONETARY="C.UTF-8"
> LC_MESSAGES="C.UTF-8"
> LC_ALL=

That's because all programs default to C unless told otherwise; from
cmd, there is nothing stating otherwise, as each cygwin command is the
first process in its own tree of processes.

>
> From mintty
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_ALL=

mintty is a cygwin process, AND it sets your locale variables to match
your Windows locale; all other processes are then children of mintty and
get the preferred locale settings by default. Of course, if you don't
like mintty's defaults, you can set up your shell initialization scripts
to change it to your preference.

>
> Now, pardon my continued ignorance, but which of those variables needs to be
> set to UTF16 in order for grep to work? And I assume it (they?) should be set
> to en_US.UTF-16?

None. UTF16 is not a valid locale. It is a valid encoding (wide
character), but locales must operate on multi-byte sequences, not wide
characters. So you HAVE to convert from wide character to multi-byte
before you can do anything that requires a locale to work correctly.

>
> Thanks to everyone for your help. I think you’ve all confirmed this isn’t
> cygwin-specific, but I couldn’t find anything even searching generically
> (“grep unicode” and now “grep utf16”). I did finally find an external
> reference to iconv, but if grep is supposed to be handle this natively, I
> haven’t been able to find much on how to do it.

grep cannot handle UTF16 natively. iconv exists to do encoding
transformations, so that the rest of the system can live in a multi-byte
world instead of worrying about wide-character encodings.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
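
As a rough illustration of the conversion step described above, here is a minimal sketch that pipes a UTF-16 file through iconv before grep sees it. The function name grep_utf16 and the file name in the usage line are hypothetical, not part of the original thread. It also assumes the file begins with a byte-order mark, so that iconv can work out the byte order from the generic UTF-16 name.

```shell
# Hypothetical helper (not from the original message): search a UTF-16
# file by converting it to UTF-8 first, since grep expects multi-byte
# input rather than wide characters.
# Assumption: the file starts with a byte-order mark (BOM).
grep_utf16() {
    pattern=$1
    file=$2
    # iconv writes the converted text to stdout; grep reads it from the pipe.
    iconv -f UTF-16 -t UTF-8 "$file" | grep -- "$pattern"
}
```

It would be called as `grep_utf16 'needle' notes.txt`. If the file has no byte-order mark, naming the byte order explicitly (UTF-16LE or UTF-16BE) in place of UTF-16 is probably the safer choice, since iconv implementations may differ in what they assume.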