On 12/4/10, Lee Rothstein <lee@ > wrote:
> On 12/4/2010 10:06 AM, Corinna Vinschen wrote:
>
> > On Dec 4 10:05, Lee wrote:
>
> >> On 12/3/10, Eric Blake <eblake@ > wrote:
> >>> Read the FAQ. http://www.faqs.org/faqs/unix-faq/shell/bash/, E9.
>
> >> Which says the en_US locale collates the upper and lower case
> >> letters like this:
> >>   AaBb...Zz
>
> >> I got that much :)  What I don't get is why someone would _want_ the
> >> collating sequence to be AaBb... or why that sequence was picked for
> >> en_US instead of using the natural order of A-Za-z.
>
> > It's not the "natural" order, it's an arbitrary order which has been
> > chosen back in 1963 when the ASCII code has been defined.  It's not used
> > as "natural" order outside of computer systems and it's not even the
> > natural order on some computer systems (See EBCDIC).
>
> > If you take a look into a hardcopy encyclopedia written in english,
> > you'll be very comfortable that the words are ordered lexicographically
> > instead of in ASCII coding, probably.  Needless to say that ordering
> > criteria for non-english languages may contain more characters in the
> > sequence, in german for instance
>
> >   "AaäBb...Ooö...Ssß...Uuü...Zz"
>
> > So, let's reiterate:
>
> > - If I need the order for the computer language, I say so:
>
> >     LC_COLLATE=C.UTF-8
>
> > - Otherwise, if I need the order for the natural language, I
> >   say so:
>
> >     LC_COLLATE=en_US.UTF-8
> >     LC_COLLATE=de_DE.UTF-8
> >     ...
>
> Here's my takeaway, given Corinna's interesting and complete
> context, and my intents. (My intentions, BTW, are for my scripts
> to have as much generality as possible [given my limited skills
> ;-|].)
>
> Therefore, instead of using '[A-Z]' to represent caps, I should
> have used (?) the Posixly Correct, '[:upper:]'.

Close, you should have used '[[:upper:]]'

$ cat t_regex
#!/bin/bash
# t_regex: Test test regex
# By Lee Rothstein, 2010-12-03, 16:27:38

regex_test () {
  echo -n "[A-Z] test: "
  if [[ "$1" =~ [A-Z] ]] ; then
    echo Contains Capital Letters: $1
  else
    echo Doesn\'t Contain Capital Letters: $1
  fi

  echo -n "[:upper:] test: "
  if [[ "$1" =~ [[:upper:]] ]] ; then
    echo Contains Capital Letters: $1
  else
    echo Doesn\'t Contain Capital Letters: $1
  fi
}

unset LC_COLLATE
export LANG="C.UTF-8"
echo "=== LANG=$LANG"
regex_test dfgh
regex_test Dfgh
echo
echo
export LANG="en_US.UTF-8"
echo "=== LANG=$LANG"
regex_test dfgh
regex_test Dfgh

~/src
$ ./t_regex
=== LANG=C.UTF-8
[A-Z] test: Doesn't Contain Capital Letters: dfgh
[:upper:] test: Doesn't Contain Capital Letters: dfgh
[A-Z] test: Contains Capital Letters: Dfgh
[:upper:] test: Contains Capital Letters: Dfgh


=== LANG=en_US.UTF-8
[A-Z] test: Contains Capital Letters: dfgh
[:upper:] test: Doesn't Contain Capital Letters: dfgh
[A-Z] test: Contains Capital Letters: Dfgh
[:upper:] test: Contains Capital Letters: Dfgh

Lee
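
P.S. If the goal is a script that behaves the same whatever locale the caller has set, here is a minimal sketch of two ways to get there. The function names has_upper and has_ascii_upper are made up for illustration; they are not part of t_regex. The first relies on the POSIX character class, the second pins the collation to the C locale inside a subshell so that a range like [A-Z] means the ASCII range. Both use a case pattern instead of [[ =~ ]] only so the sketch also runs under a plain POSIX sh; the same bracket expressions can be used inside [[ =~ ]] in bash.

```shell
# has_upper (hypothetical name): succeeds if $1 contains an uppercase
# letter according to the current locale's character classification.
# [[:upper:]] does not depend on the collating order, unlike [A-Z].
has_upper() {
  case "$1" in
    *[[:upper:]]*) return 0 ;;
    *)             return 1 ;;
  esac
}

# has_ascii_upper (hypothetical name): succeeds if $1 contains one of the
# ASCII letters A through Z. The body is a subshell, so the LC_ALL
# assignment does not leak into the calling script.
has_ascii_upper() (
  LC_ALL=C
  case "$1" in
    *[A-Z]*) exit 0 ;;
    *)       exit 1 ;;
  esac
)

# Small usage loop over the two strings used in t_regex.
for word in dfgh Dfgh; do
  if has_upper "$word"; then
    echo "has_upper: contains capital letters: $word"
  else
    echo "has_upper: no capital letters: $word"
  fi
done
```

The subshell in has_ascii_upper costs a fork per call, which is fine for occasional checks; a script that tests many strings might prefer to set LC_ALL=C once near the top instead, assuming nothing else in it needs the natural-language order.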