On 04/04/2011 04:09 PM, Jim Garrison wrote:
> I'm getting weird behavior from grep. Searching for a bracketed range of
> characters (i.e. [A-Z]) is doing case-insensitive matching, while an
> identical but explicit character set match (i.e. [ABCDE...Z]) does not.

Your problem is not with grep, but with your LC_COLLATE setting (which
LC_ALL overrides when it is set). POSIX states that range expressions
(such as [A-Z]) are undefined in any locale except C; and some locales
(like en_US.UTF-8) happen to treat A-B as AaB, A-b as AaBb, and so forth
(that is, they collate case-insensitively).

> $ grep '[a-b]' test.dat
> abcde
> ABCDE

So, in a case-insensitive collation, this range expression includes at
least one of A or B (but probably not both); and since that matches the
ABCDE line, you get a correct result for the collation locale you
requested.

> Contrast with the correctly-working examples below
>
> $ grep '[ab]' test.dat
> abcde

Here, there's no range, so there's no ambiguity. Also, try
"LC_ALL=C grep '[a-b]' test.dat" to see a difference.

-- 
Eric Blake ebl...@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
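
For anyone who wants to reproduce the comparison suggested above, here is a
minimal sketch. It assumes a POSIX shell with grep, mktemp and printf
available, and it recreates the two-line test file in a temporary location
(the path and the variable names are only illustrative, not from the original
report). Only the C-locale result is predictable; what the range matches
under any other locale depends on the installed locale data and on the grep
and libc versions, so the script just prints that result without assuming it.

```shell
#!/bin/sh
# Recreate the two-line test file from the report in a temporary location.
tmp=$(mktemp)
printf 'abcde\nABCDE\n' > "$tmp"

# In the C locale a range is defined by byte values, so [a-b] is exactly a or b.
c_range=$(LC_ALL=C grep '[a-b]' "$tmp")
echo "LC_ALL=C, [a-b]:"
echo "$c_range"

# An explicit set contains no range, so it behaves the same in every locale.
c_set=$(LC_ALL=C grep '[ab]' "$tmp")
echo "LC_ALL=C, [ab]:"
echo "$c_set"

# Same range under whatever locale the environment currently selects.
# The result is locale-dependent: it may or may not include the ABCDE line.
env_range=$(grep '[a-b]' "$tmp")
echo "current locale (LC_ALL='${LC_ALL:-}' LC_COLLATE='${LC_COLLATE:-}' LANG='${LANG:-}'), [a-b]:"
echo "$env_range"

rm -f "$tmp"
```

If the last result differs from the first, the collation locale is what
changed the meaning of the range. Forcing LC_ALL=C for that one command (or
spelling the set out explicitly) makes the match independent of the
environment.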