Paolo,

> My proposal wouldn't change defaults, which is why I believe that this
> is a separate topic.

But at the same time you are pushing for the use of --with-included-regex.
We found out that by doing this, the equivalence classes feature gets lost,
and the divergence between glibc and gnulib becomes greater.

> 1) Aharon would like to release gawk 4.0 in the very near future, and 2)
> adding an extension to glibc takes time. That's why I prefer to work in
> smaller steps.

If there is time pressure for gawk 4.0, gawk can itself make modifications
to the regex from gnulib, through gnulib-tool's option --local-dir. See
<http://www.gnu.org/software/gnulib/manual/html_node/Extending-Gnulib.html>.
It can also make --with-included-regex the default on its own.

> We'd need glibc to export two functions in both multi-byte and
> wide-character versions:
>
> 1) streqcoll(S1, S2) and wcseqcoll(S1, S2) would be the same as strcoll
> and wcscoll, but they would compare only according to primary weights.
> A slightly more formal definition is that streqcoll(S1, S2) == 0 iff S1
> matches the `[=C1=][=C2=][=C3=]...[=Cn=]' regular expression, where Ci
> are the characters of S2 (I'd need to double check this against POSIX
> though). When non-zero, the result of streqcoll(S1, S2) would be the
> same as strcoll(S1, S2). Likewise, glibc could provide streqxfrm and
> wcseqxfrm, with the definition that strcmp(streqxfrm(S1), streqxfrm(S2))
> == streqcoll(S1, S2).
>
> 2) On top of this, [.ss.] could be implemented using an additional
> function mbelemlen(S) giving the length of the first collation element
> in S. [.S1.] would be rejected unless mbelemlen(S1) == strlen(S1), and
> [.S1.] would match S2 if strcoll(S1, strndup(S2, mbelemlen(S2))) == 0.
> wcelemlen could be provided likewise.
>
> These are the minimal extensions that would be required to support full
> regular expression features portably and in a manner that is compatible
> with glibc, except for ranges

Great! These look like a good basis for discussing with the glibc people.

Ad 1): Is streqcoll symmetric? That is, is streqcoll(S1, S2) the same as
streqcoll(S2, S1)? It is not immediately clear to me from the definition.
If not, then a single streqxfrm function is not sufficient, you need two
functions streqxfrm1 and streqxfrm2, such that
streqcoll(S1, S2) == strcmp(streqxfrm1(S1), streqxfrm2(S2)).

Ad 2): Do you need 2 functions, one for char * strings, and one for
wide strings here as well?

Bruno

-- 
In memoriam Johanna Kirchner <http://en.wikipedia.org/wiki/Johanna_Kirchner>
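
P.S. For comparison with Ad 1): the existing strcoll/strxfrm pair already
has the relationship in question, with one and the same transform applied
to both arguments. Since strcmp merely flips sign when its arguments are
swapped, a single-transform scheme can only describe a comparison that
flips sign in the same way. Below is a minimal sketch in standard C that
checks this for strcoll. The helper names xfrm_dup, sign and
xfrm_agrees_with_coll are made up for illustration; streqcoll and
streqxfrm remain hypothetical and are not used.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'ed strxfrm() image of s,
   or NULL if allocation fails.  The caller frees the result. */
static char *xfrm_dup(const char *s)
{
    /* With a size of 0, strxfrm only reports the length it would need. */
    size_t n = strxfrm(NULL, s, 0) + 1;
    char *buf = malloc(n);
    if (buf != NULL)
        strxfrm(buf, s, n);
    return buf;
}

/* Reduce a comparison result to -1, 0 or 1. */
static int sign(int x)
{
    return (x > 0) - (x < 0);
}

/* Return 1 if strcoll(a, b) and strcmp(strxfrm(a), strxfrm(b)) agree
   in sign, 0 if they disagree, -1 on allocation failure. */
static int xfrm_agrees_with_coll(const char *a, const char *b)
{
    char *xa = xfrm_dup(a);
    char *xb = xfrm_dup(b);
    int ok = -1;

    if (xa != NULL && xb != NULL)
        ok = sign(strcoll(a, b)) == sign(strcmp(xa, xb));
    free(xa);
    free(xb);
    return ok;
}
```

Without a prior setlocale(LC_COLLATE, "") call this runs in the "C"
locale, where strcoll behaves like strcmp; the agreement is supposed to
hold in any locale.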