On 6/27/13 4:48 AM, Paolo Bonzini wrote:
> On 27/06/2013 09:33, Aharon Robbins wrote:
>> Hi Paolo.
>>
>>>> I still believe that there is no place other than the glibc locale
>>>> descriptions where this can be fixed.
>> This is necessary but not sufficient.  All of gawk, grep, sed and bash
>> run on lots of non-GLIBC systems.
>
> On non-glibc systems they use gnulib's regex implementation, so they're
> fine.

You presume much.  Bash, for instance, doesn't use a regex implementation,
especially not gnulib's.  gnulib code is, in practice, difficult to use on
an individual module basis, and doesn't provide enough of a benefit to go
through the effort of breaking it out of gnulib and putting it into bash.

>
>> The locale definitions, even for
>> the same locale, vary wildly out in the wild.  Therefore there's no
>> other practical choice but to fix each program to provide Rational
>> Range Interpretation.
>>
>> Fortunately, gawk and grep are already there, and I think the sed in
>> the git repo is as well.  Once Bash turns this on as default, the
>> world will definitely be a better place, independent of GLIBC.
>
> I already explained this multiple times how this is completely delusional.

A little bit strong, no?  If you use your own matching code, it's a small
matter to change strcoll to strcmp.

> 1) grep, sed, coreutils and so on will only use representation-based
> range interpretation (I prefer this more neutral term that also explains
> what's going on) if you use gnulib's regex implementation.  And by
> default, they use glibc (I just checked grep).
>
> 2) Even if you switched the default, you would be at the mercy of
> distros.  Distros prefer to avoid glibc replacements in single packages,
> because then all bugs have to be fixed in many different places.  In
> fact, I checked grep and Fedora builds it with --without-included-regex.

There are systems of interest besides Linux and its distros.

> Not to mention how this is entirely Latin-centric.  There are some
> encodings in which there is absolutely no relation between the encoding
> and the expected collation order.

And there's no portable way to obtain this information in any case, glibc
or not.  So if this is to be `fixed' only either by changing every locale
definition everywhere or changing the matching code, I vote for changing
the matching code.
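
To illustrate the strcoll-to-strcmp point, here is a minimal sketch of a
single-byte bracket range test done both ways.  The helper names
in_range_collate and in_range_bytes are hypothetical and are not taken from
bash, gnulib, or any other matcher; multibyte characters are ignored for
brevity.

```c
#include <string.h>

/* Hypothetical helper: range test by locale collation order.
   Whether 'B' falls inside [a-z] here depends on the LC_COLLATE
   definitions of whatever locale the program has selected. */
int in_range_collate(unsigned char lo, unsigned char hi, unsigned char c)
{
    char l[2] = { (char)lo, '\0' };
    char h[2] = { (char)hi, '\0' };
    char s[2] = { (char)c, '\0' };

    return strcoll(l, s) <= 0 && strcoll(s, h) <= 0;
}

/* Hypothetical helper: the same test by character value, which is what
   strcmp on one-character strings amounts to.  The answer no longer
   depends on the locale definitions. */
int in_range_bytes(unsigned char lo, unsigned char hi, unsigned char c)
{
    return lo <= c && c <= hi;
}
```

in_range_bytes('a', 'z', 'B') is 0 in every locale.  in_range_collate
gives the same answer in the C locale, but may not once setlocale() has
selected a locale whose collation order interleaves upper and lower case.
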
We just have to agree on an interpretation and make sure the various
matchers agree.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/