Hi All. This note is really encouraging - I was unaware of this change in the latest standard. I will be revising gawk and its documentation to reflect this, including the links Paul has supplied.
I have not been following the details of the rest of the discussion, esp.
as things have been arriving in my inbox out of order (not sure why that is!).
I will try to review and reply.

Thanks,

Arnold

> Date: Thu, 09 Jun 2011 10:14:01 -0700
> From: Paul Eggert <egg...@cs.ucla.edu>
> To: Paolo Bonzini <bonz...@gnu.org>
> CC: Aharon Robbins <arn...@skeeve.com>, bug-grep <bug-g...@gnu.org>,
>     bug-gnulib <bug-gnulib@gnu.org>, k...@freefriends.org
> Subject: Re: Dealing with character ranges in grep
>
> On 06/08/2011 10:14 PM, Aharon Robbins wrote:
> >
> > So, for the upcoming gawk 4.0, I decided (as Karl put it) to cut the
> > Gordian knot and make ranges behave like the C locale, the way it's long
> > been documented, and as most people expect. Those who want the POSIX
> > behavior can still get it using --posix.
>
> This comment and the ensuing thread seems to be assuming old POSIX.
> In new POSIX, that is, in POSIX 1003.1-2008, the standardization committee
> removed the old, bogus requirement of using collating element order.
> The new rule is that the regular expression [a-z] has an unspecified
> behavior outside the C (or POSIX) locale. So the new gawk behavior
> will conform to POSIX, even without the --posix option.
>
> I suggest that gawk's behavior for [a-z] be the same regardless of whether
> --posix is specified, and that this behavior be what users expect
> (namely, the ASCII character range). This will be simpler.
>
> Similarly for grep, glibc, etc.
>
> For the POSIX 1003.1-2008 rule, see rule 7 of:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05
>
> and for the reasoning behind the rule change, see:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html#tag_21_09_03_05
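To make the C-locale behavior concrete, here is a minimal sketch in C, assuming a POSIX <regex.h> (regcomp/regexec). The helper name count_ascii_range_matches is hypothetical, and this is not how gawk or grep implement ranges. It selects a locale, then counts which ASCII characters a bracket expression such as [a-z] matches.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not from gawk/grep): switch the whole program to
   LOCALE_NAME via setlocale() (a global side effect), then count how many
   ASCII characters (codes 1..127) form a complete match for PATTERN, an
   extended regular expression such as "^[a-z]$".  Returns -1 if the
   locale cannot be selected or the pattern fails to compile. */
int count_ascii_range_matches(const char *locale_name, const char *pattern)
{
    regex_t re;
    int count = 0;

    assert(locale_name != NULL && pattern != NULL);
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    for (int c = 1; c < 128; c++) {
        /* One-character, NUL-terminated subject string. */
        char subject[2] = { (char) c, '\0' };
        if (regexec(&re, subject, 0, NULL, 0) == 0)
            count++;
    }

    regfree(&re);
    return count;
}
```

In the C locale, count_ascii_range_matches("C", "^[a-z]$") returns 26, i.e. exactly the ASCII letters a through z. Under POSIX 1003.1-2008 the result for other locales is unspecified, so a different count there is not by itself a conformance problem.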