Re: Dealing with character ranges in grep

2011-06-28 Thread Aharon Robbins
> Date: Mon, 27 Jun 2011 15:10:43 +0200 > From: Paolo Bonzini > To: Aharon Robbins > CC: egg...@cs.ucla.edu, ebl...@redhat.com, bug-g...@gnu.org, > bug-gnulib@gnu.org, k...@freefriends.org > Subject: Re: Dealing with character ranges in grep > > On 06/16/2011

Re: Dealing with character ranges in grep

2011-06-27 Thread Jim Meyering
Paolo Bonzini wrote: > On 06/15/2011 09:12 PM, Jim Meyering wrote: >> However, backreferences force these tools to skip the DFA-based >> optimization and resort to running the regexp code. In that case, >> there is a dichotomy. Adding a backreference to a range-including >> regexp would have the

Re: Dealing with character ranges in grep

2011-06-27 Thread Paolo Bonzini
On 06/16/2011 09:06 PM, Aharon Robbins wrote: I have already corresponded with Chet. He plans to add a shell option to enable RRI, and we can hope that at some point, it might become the default. So that has already been started. I'd do the other way round---make it the default, and add a shell

Re: Dealing with character ranges in grep

2011-06-27 Thread Paolo Bonzini
On 06/15/2011 09:12 PM, Jim Meyering wrote: However, backreferences force these tools to skip the DFA-based optimization and resort to running the regexp code. In that case, there is a dichotomy. Adding a backreference to a range-including regexp would have the surprising consequence of changin

Re: Dealing with character ranges in grep

2011-06-27 Thread Paolo Bonzini
On 06/16/2011 10:44 AM, Jim Meyering wrote: > To make this proposed change go through, that configure-time option would > have to be eliminated, so that we always build with the gnulib-provided > regex code. Of course, if glibc ever changes, we can detect that and > automatically prefer it w

Re: Dealing with character ranges in grep

2011-06-17 Thread Jim Meyering
Johannes Meixner wrote: > Hello Jim, > > On Jun 16 10:44 Jim Meyering wrote (excerpt): Thus, if we go this route, we are effectively saying that people who want self-consistent regex-handling in our tools must build with --with-included-regex or end up causing subtle problems. >

Re: Dealing with character ranges in grep

2011-06-17 Thread Johannes Meixner
Hello Jim, On Jun 16 10:44 Jim Meyering wrote (excerpt): Thus, if we go this route, we are effectively saying that people who want self-consistent regex-handling in our tools must build with --with-included-regex or end up causing subtle problems. ... It goes like this (at least for gawk, gre

Re: Dealing with character ranges in grep

2011-06-16 Thread Aharon Robbins
Hi. > From: Jim Meyering > To: Bruno Haible > Cc: Paolo Bonzini , Aharon Robbins , > bug-gnulib@gnu.org, bug-grep , k...@freefriends.org > Subject: Re: Dealing with character ranges in grep > Date: Thu, 16 Jun 2011 07:58:05 +0200 > > To make this proposed

Re: Dealing with character ranges in grep

2011-06-16 Thread Aharon Robbins
Hi All. > Date: Wed, 15 Jun 2011 14:09:45 -0600 > From: Eric Blake > To: Paul Eggert > CC: Aharon Robbins , bonz...@gnu.org, bug-g...@gnu.org, > bug-gnulib@gnu.org, k...@freefriends.org > Subject: Re: Dealing with character ranges in grep > > > Doesn'

Re: Dealing with character ranges in grep

2011-06-16 Thread Philipp Thomas
* Jim Meyering (j...@meyering.net) [20110616 10:55]: > For the record, at least Fedora's grep and sed both build > --without-included-regex, so would be affected. SLES and openSUSE also build sed and grep --without-included-regex, so would also be affected. Philipp

Re: Dealing with character ranges in grep

2011-06-16 Thread Johannes Meixner
Hello, On Jun 16 15:51 Stanislav Brabec wrote: Johannes Meixner wrote: Again: I do not care if this or that special feature is supported or not because I think that consistent behaviour has topmost priority. Do you prefer "consistent behavior of regexp in all applications across the whole d

Re: Dealing with character ranges in grep

2011-06-16 Thread Stanislav Brabec
Johannes Meixner wrote: > Again: > I do not care if this or that special feature is supported or not > because I think that consistent behaviour has topmost priority. Do you prefer "consistent behavior of regexp in all applications across the whole distribution" or "consistent behavior of (GNU) g

Re: Dealing with character ranges in grep

2011-06-16 Thread Stanislav Brabec
Jim Meyering wrote: > Johannes Meixner wrote: > > recently I became openSUSE package maintainer for grep and gawk. > > > > I added Stanislav Brabec, openSUSE package maintainer for sed. > > > > In short: > > I support and appreciate everything which leads to consistence. > .. > > Thanks for the qu

Re: Dealing with character ranges in grep

2011-06-16 Thread Johannes Meixner
Hello, On Jun 16 13:51 Stanislav Brabec wrote (excerpt): grep in openSUSE uses glibc regex by default. Yes. Currently grep in openSUSE is built using "configure --without-included-regex" as it was built for openSUSE all the time. Perhaps there is a misunderstanding what I mean. What I mea

Re: Dealing with character ranges in grep

2011-06-16 Thread Johannes Meixner
Hello, recently I became openSUSE package maintainer for grep and gawk. I added Stanislav Brabec, openSUSE package maintainer for sed. In short: I support and appreciate everything which leads to consistence. On Jun 16 07:58 Jim Meyering wrote: Jim Meyering wrote: Bruno Haible wrote: Pao

Re: Dealing with character ranges in grep

2011-06-16 Thread Jim Meyering
Jim Meyering wrote: ... >> Thus, if we go this route, we are effectively saying >> that people who want self-consistent regex-handling >> in our tools must build with --with-included-regex or end >> up causing subtle problems. >> >> That's a big leap. >> I'm not saying I won't take upstream grep ov

Re: Dealing with character ranges in grep

2011-06-16 Thread Jim Meyering
Johannes Meixner wrote: > recently I became openSUSE package maintainer for grep and gawk. > > I added Stanislav Brabec, openSUSE package maintainer for sed. > > In short: > I support and appreciate everything which leads to consistence. ... Thanks for the quick reply and the support.

Re: Dealing with character ranges in grep

2011-06-15 Thread Jim Meyering
Jim Meyering wrote: > Bruno Haible wrote: >> Paolo, >> >>> > [=e=] to match "e" as well as accented versions like é, è and ê). >>> > That is the one feature that you get with glibc, and that you would >>> > sacrifice when building --with-included-regex. >>> >>> I agree.  It's up to distros to choo

Re: Dealing with character ranges in grep

2011-06-15 Thread Eric Blake
On 06/15/2011 12:36 PM, Paul Eggert wrote: > On 06/15/11 10:00, Aharon Robbins wrote: >> Can I get a clear "yes, grep and sed are going to change to Reasonable >> Range Interpretation"? > > I can't speak for grep and sed since I'm not a maintainer of > either, but to my mind the only thing that ma

Re: Dealing with character ranges in grep

2011-06-15 Thread Jim Meyering
Bruno Haible wrote: > Paolo, > >> > [=e=] to match "e" as well as accented versions like é, è and ê). >> > That is the one feature that you get with glibc, and that you would >> > sacrifice when building --with-included-regex. >> >> I agree.  It's up to distros to choose, of course. > > If you are

Re: Dealing with character ranges in grep

2011-06-15 Thread Paul Eggert
On 06/15/11 10:00, Aharon Robbins wrote: > Can I get a clear "yes, grep and sed are going to change to Reasonable > Range Interpretation"? I can't speak for grep and sed since I'm not a maintainer of either, but to my mind the only thing that makes sense is for regular expressions like [a-z] to ha

Re: Dealing with character ranges in grep

2011-06-15 Thread Aharon Robbins
Hi All. Can I get a clear "yes, grep and sed are going to change to Reasonable Range Interpretation"? I was looking into the code, in terms of not using RE_RANGES_IGNORE_LOCALES but simply always doing it based on character set ordering. Doing so lets us throw away hard_locale.[ch] also. Befor

Re: Dealing with character ranges in grep

2011-06-14 Thread Aharon Robbins
Hi. > From: Paolo Bonzini > Date: Tue, 14 Jun 2011 13:11:32 +0200 > Subject: Re: Dealing with character ranges in grep > To: Aharon Robbins > Cc: egg...@cs.ucla.edu, k...@freefriends.org, bug-g...@gnu.org, > bug-gnulib@gnu.org > > > ? In principle, I'm al

Re: Dealing with character ranges in grep

2011-06-14 Thread Philipp Thomas
* Karl Berry (k...@freefriends.org) [20110611 01:50]: > Because whatever changes they might or might not agree to make, they > obviously won't reach user systems for years. Not necessarily. Linux distributors do backports of changes they deem good to have now and then. Philipp

Re: Dealing with character ranges in grep

2011-06-14 Thread Paolo Bonzini
>   In principle, I'm all for this, but in practice, I'm going to leave gawk's >   code alone for now (there's always 4.1 :-). As long as --posix is not affecting the choice, that's fine. However, please make sure that compiling gawk --without-included-regex works (it should go without saying)!

Re: Dealing with character ranges in grep

2011-06-13 Thread Aharon Robbins
Hi All. > Date: Thu, 09 Jun 2011 10:14:01 -0700 > From: Paul Eggert > To: Paolo Bonzini > CC: Aharon Robbins , bug-grep , > bug-gnulib , k...@freefriends.org > Subject: Re: Dealing with character ranges in grep > > On 06/08/2011 10:14 PM, Aharon Robbins wrote: >

Re: Dealing with character ranges in grep

2011-06-10 Thread Karl Berry
I guess I don't follow the purpose of involving glibc now. Because whatever changes they might or might not agree to make, they obviously won't reach user systems for years. So for anyone to make use of the new options, it all has to be implemented in gnulib regex anyway. If the goal is to minim

Re: Dealing with character ranges in grep

2011-06-10 Thread Jim Meyering
Bruno Haible wrote: >> With my proposal, distros/people that use --with-included-regex would >> get understandable semantics + no equivalence classes >> ... >> locale behavior of regex are irremediably >> broken. For example, when you have a collation element, you can match >> it using ranges (e.g

Re: Dealing with character ranges in grep

2011-06-10 Thread Aharon Robbins
: Re: Dealing with character ranges in grep > > On 06/08/2011 10:14 PM, Aharon Robbins wrote: > > > So, for the upcoming gawk 4.0, I decided (as Karl put it) to cut the > > Gordian knot and make ranges behave like the C locale, the way it's long > > been documented, an

Re: Dealing with character ranges in grep

2011-06-09 Thread Paolo Bonzini
On Thu, Jun 9, 2011 at 19:14, Paul Eggert wrote: > On 06/08/2011 10:14 PM, Aharon Robbins wrote: > >> So, for the upcoming gawk 4.0, I decided (as Karl put it) to cut the >> Gordian knot and make ranges behave like the C locale, the way it's long >> been documented, and as most people expect.  Tho

Re: Dealing with character ranges in grep

2011-06-09 Thread Paul Eggert
On 06/08/2011 10:14 PM, Aharon Robbins wrote: > So, for the upcoming gawk 4.0, I decided (as Karl put it) to cut the > Gordian knot and make ranges behave like the C locale, the way it's long > been documented, and as most people expect. Those who want the POSIX > behavior can still get it using

Re: implementing extended bracket expressions in gnulib [was Re: Dealing with character ranges in grep]

2011-06-09 Thread Paolo Bonzini
On 06/09/2011 01:53 PM, Bruno Haible wrote: Paolo, My proposal wouldn't change defaults, which is why I believe that this is a separate topic. But at the same time you are pushing for the use of --with-included-regex. We found out that by doing this, the equivalence classes feature gets lost,

Re: implementing extended bracket expressions in gnulib [was Re: Dealing with character ranges in grep]

2011-06-09 Thread Bruno Haible
Paolo, > My proposal wouldn't change defaults, which is why I believe that this > is a separate topic. But at the same time you are pushing for the use of --with-included-regex. We found out that by doing this, the equivalence classes feature gets lost, and the divergence between glibc and gnuli

implementing extended bracket expressions in gnulib [was Re: Dealing with character ranges in grep]

2011-06-09 Thread Paolo Bonzini
On 06/09/2011 01:12 PM, Bruno Haible wrote: What would it take to let distros/people use --with-included-regex and get understandable semantics for ranges + working equivalence classes? I would prefer that to your proposal, because it cannot be seen as a regression by people who care about equiv

Re: Dealing with character ranges in grep

2011-06-09 Thread Bruno Haible
Paolo, > With my proposal, distros/people that use --with-included-regex would > get understandable semantics + no equivalence classes > ... > locale behavior of regex are irremediably > broken. For example, when you have a collation element, you can match > it using ranges (e.g. [d-i] matches

Re: Dealing with character ranges in grep

2011-06-09 Thread Paolo Bonzini
On 06/09/2011 11:58 AM, Bruno Haible wrote: Paolo, [=e=] to match "e" as well as accented versions like é, è and ê). That is the one feature that you get with glibc, and that you would sacrifice when building --with-included-regex. I agree. It's up to distros to choose, of course. If you a

Re: Dealing with character ranges in grep

2011-06-09 Thread Bruno Haible
Paolo, > > [=e=] to match "e" as well as accented versions like é, è and ê). > > That is the one feature that you get with glibc, and that you would > > sacrifice when building --with-included-regex. > > I agree.  It's up to distros to choose, of course. If you are on the point of sacrificing a

Re: Dealing with character ranges in grep

2011-06-09 Thread Paolo Bonzini
On 06/09/2011 11:33 AM, Jim Meyering wrote: I like the idea. However a potential sticking point is the equivalence class (e.g., using [=e=] to match "e" as well as accented versions like é, è and ê). That is the one feature that you get with glibc, and that you would sacrifice when building --wit
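
For readers unfamiliar with equivalence classes, the idea behind [=e=] can be approximated as below. This is only a rough sketch in Python, not how glibc implements it: glibc derives equivalence classes from the locale's collation data, whereas this sketch swaps in Unicode decomposition (strip combining marks and compare base letters). The helper names base_letter and in_equivalence_class are hypothetical.

```python
import unicodedata

def base_letter(ch):
    # Assumption: approximate "same equivalence class" by decomposing the
    # character (NFD) and dropping combining marks such as accents.
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def in_equivalence_class(ch, representative):
    # True if ch would plausibly match [=representative=] under this approximation.
    return base_letter(ch) == base_letter(representative)

# e, e-acute, e-grave, e-circumflex, f (escapes used to keep the source ASCII)
candidates = "e\u00e9\u00e8\u00eaf"
matches = [c for c in candidates if in_equivalence_class(c, "e")]
print(matches)
```

Under this approximation the plain and accented forms of "e" are selected and "f" is not, which is the behaviour the thread says would be lost when building --with-included-regex.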

Re: Dealing with character ranges in grep

2011-06-09 Thread Jim Meyering
Paolo Bonzini wrote: > [making this public, there should be no reason not to] > > On 06/08/2011 10:14 PM, Aharon Robbins wrote: >> Hi. As we've discussed a little previously, I finally got tired of >> trying to explain to users why the character range [a-z] was matching >> most uppercase letters a

Re: Dealing with character ranges in grep

2011-06-09 Thread Paolo Bonzini
[making this public, there should be no reason not to] On 06/08/2011 10:14 PM, Aharon Robbins wrote: Hi. As we've discussed a little previously, I finally got tired of trying to explain to users why the character range [a-z] was matching most uppercase letters also. ("I've found a bug in gawk!
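
The complaint that started the thread, [a-z] matching most uppercase letters, can be illustrated with a small simulation. This is a sketch under stated assumptions, not code from grep, gawk, or glibc: it assumes a hypothetical collation order that interleaves cases (a A b B ... z Z), as many non-C locales did, and compares it with plain code-point ordering, which is what the thread calls Reasonable Range Interpretation. The function names are made up for illustration.

```python
import string

# Hypothetical collation sequence: a A b B ... z Z (an assumption for illustration).
COLLATION = [c for pair in zip(string.ascii_lowercase, string.ascii_uppercase)
             for c in pair]

def range_by_collation(lo, hi):
    # Characters matched by [lo-hi] when the range follows collation order.
    i, j = COLLATION.index(lo), COLLATION.index(hi)
    return set(COLLATION[i:j + 1])

def range_by_codepoint(lo, hi):
    # Characters matched by [lo-hi] when the range follows code-point order,
    # as in the C locale.
    return {chr(c) for c in range(ord(lo), ord(hi) + 1)}

collated = range_by_collation("a", "z")
plain = range_by_codepoint("a", "z")
uppercase_hits = sorted(c for c in collated if c.isupper())

print("collation order matches uppercase:", "".join(uppercase_hits))
print("code-point order matches only lowercase:", plain == set(string.ascii_lowercase))
```

With the assumed ordering, the collation-based range picks up every uppercase letter that sorts before "z", while the code-point range contains only the 26 lowercase letters.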