On 06/09/2011 01:12 PM, Bruno Haible wrote:
> What would it take to let distros/people use --with-included-regex and
> get understandable semantics for ranges + working equivalence classes?
> I would prefer that to your proposal, because it cannot be seen as a
> regression by people who care about equivalence classes.
My proposal wouldn't change defaults, which is why I believe that this
is a separate topic. You quoted:
> With my proposal, distros/people that use --with-included-regex would
> get understandable semantics + no equivalence classes
but snipped this part:
> Right now, some distros/people use --with-included-regex and get
> broken semantics + no equivalence classes
So, I agree that understandable semantics for ranges + working
equivalence classes would be the best, and if gnulib could provide that,
I would champion making --with-included-regex the default. However, 1)
Aharon would like to release gawk 4.0 in the very near future, and 2)
adding an extension to glibc takes time. That's why I prefer to work in
smaller steps.
> Can that be done through gnulib code? If not, what do we need from glibc
> to get it done in gnulib?
We'd need glibc to export two functions in both multi-byte and
wide-character versions:
1) streqcoll(S1, S2) and wcseqcoll(S1, S2) would be the same as strcoll
and wcscoll, but they would compare only according to primary weights.
A slightly more formal definition is that streqcoll(S1, S2) == 0 iff S1
matches the `[=C1=][=C2=][=C3=]...[=Cn=]' regular expression, where Ci
are the characters of S2 (I'd need to double check this against POSIX
though). When non-zero, the result of streqcoll(S1, S2) would be the
same as strcoll(S1, S2). Likewise, glibc could provide streqxfrm and
wcseqxfrm, with the definition that strcmp(streqxfrm(S1), streqxfrm(S2))
== streqcoll(S1, S2).
2) On top of this, [.ss.] could be implemented using an additional
function mbelemlen(S) giving the length of the first collation element
in S. [.S1.] would be rejected unless mbelemlen(S1) == strlen(S1), and
[.S1.] would match S2 if strcoll(S1, strndup(S2, mbelemlen(S2))) == 0.
wcelemlen could be provided likewise.
These are the minimal extensions that would be required to support full
regular expression features portably and in a manner that is compatible
with glibc, except for ranges (which we don't care about, do we?).
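
To make the two proposed interfaces a bit more concrete, here is a rough
sketch in C of how they could behave. None of these functions exist in
glibc; the toy_ names are made up for illustration, and the "locale" is a
toy one that I am assuming for the example: ASCII letters that differ only
in case share a primary weight, and "ch" is the only multi-character
collation element. A real implementation would read this from the locale's
collation tables.

```c
#include <ctype.h>
#include <string.h>

/* ASSUMPTION: toy primary weight, where case is not a primary difference. */
static int toy_primary_weight(unsigned char c)
{
    return tolower(c);
}

/* Sketch of the proposed streqcoll: 0 iff S1 and S2 have the same
   primary weights, otherwise whatever strcoll(S1, S2) returns. */
int toy_streqcoll(const char *s1, const char *s2)
{
    const char *p = s1, *q = s2;

    while (*p && *q
           && toy_primary_weight((unsigned char) *p)
              == toy_primary_weight((unsigned char) *q)) {
        p++;
        q++;
    }
    if (*p == '\0' && *q == '\0')
        return 0;
    return strcoll(s1, s2);
}

/* Sketch of the proposed mbelemlen: length in bytes of the first
   collation element of S.  ASSUMPTION: "ch" is the only element
   longer than one byte in this toy locale. */
size_t toy_mbelemlen(const char *s)
{
    if (s[0] == '\0')
        return 0;
    if (s[0] == 'c' && s[1] == 'h')
        return 2;
    return 1;
}

/* [.SYM.] matching on top of mbelemlen: returns -1 if SYM is not a
   single collation element (the bracket expression is rejected),
   1 if the first collation element of S matches SYM, 0 otherwise. */
int toy_match_collating_symbol(const char *sym, const char *s)
{
    size_t symlen = strlen(sym);
    char first[3];              /* toy elements are at most 2 bytes */
    size_t n;

    if (symlen == 0 || toy_mbelemlen(sym) != symlen)
        return -1;

    n = toy_mbelemlen(s);
    memcpy(first, s, n);
    first[n] = '\0';
    return strcoll(sym, first) == 0;
}
```

With these definitions, toy_streqcoll("Abc", "aBC") returns 0,
toy_match_collating_symbol("ch", "chair") returns 1, and
toy_match_collating_symbol("ab", "abc") returns -1 because "ab" is two
collation elements in the toy locale.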
Paolo