Re: documentation bug re character range expressions

2011-06-13 Thread Andre Majorel
On 2011-06-09 12:40 -0700, Marcel (Felix) Giannelia wrote: > Guess it's time I really learned how to navigate texinfo... You can spare yourself the pain with something along the lines of #!/bin/sh info --subnodes -o- "$1" | less (Which won't help you in this particular case as neither bash nor

Re: documentation bug re character range expressions

2011-06-09 Thread Marcel (Felix) Giannelia
On 09/06/11 11:31, Chet Ramey wrote: [...] No, it doesn't. It's not part of any standard, and it's not part of pattern matching, so I implemented it with the traditional C semantics because that seemed the most straightforward. Pity the implementor of character range expressions didn't have th

Re: documentation bug re character range expressions

2011-06-09 Thread Chet Ramey
On 6/8/11 5:45 PM, Marcel (Felix) Giannelia wrote: > On 07/06/11 13:45, Chet Ramey wrote: >> [...] >> I'm not going to add much to this discussion except to note that I believe >> `sorts' is correct. Consider the following script: >> >> unset LANG LC_ALL LC_COLLATE >> >> export LC_COLLATE=de_DE.UT

Re: documentation bug re character range expressions

2011-06-08 Thread Marcel (Felix) Giannelia
On 07/06/11 13:45, Chet Ramey wrote: [...] I'm not going to add much to this discussion except to note that I believe `sorts' is correct. Consider the following script: unset LANG LC_ALL LC_COLLATE export LC_COLLATE=de_DE.UTF-8 printf "%s\n" {A..Z} {a..z} | sort | tr $'\n' ' ' echo That's real
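The collation effect that the quoted script demonstrates can be reproduced on a smaller sample. The following is only a sketch: `sorted_letters` is a made-up helper name, and `de_DE.UTF-8` (the locale named in the quoted script) is assumed to be installed; where it is not, the second line may simply match the C ordering.

```shell
#!/bin/sh
# sorted_letters is a hypothetical helper, not something from the thread.
# It sorts a small mixed-case sample under the locale given in $1.
sorted_letters() {
    printf '%s\n' b A a B | LC_ALL=$1 sort | tr '\n' ' '
}

# Byte-order collation: every uppercase letter sorts before any lowercase one.
echo "C:           $(sorted_letters C)"

# Locale-aware collation usually interleaves the two cases.
# Assumption: de_DE.UTF-8 is installed; otherwise this may match the C line.
echo "de_DE.UTF-8: $(sorted_letters de_DE.UTF-8 2>/dev/null)"
```

If the two lines differ on a given system, a range such as [A-Z] that follows the collation order can cover different characters there than it does in the C locale.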

Re: documentation bug re character range expressions

2011-06-07 Thread Chet Ramey
On 6/2/11 9:12 PM, Marcel (Felix) Giannelia wrote: > Hello, > > I realize the issue of character range expressions not working as expected > (because of locale settings) has been done to death, but I thought I should > point this out. > > The bash man page says: > > "A pair of characters separat

Re: documentation bug re character range expressions

2011-06-03 Thread Eric Blake
On 06/03/2011 11:36 AM, Marcel (Felix) Giannelia wrote: > It sounds to me like what you're saying is, the *only* uses of bracket > range expressions guaranteed to be "portable" are things like [[:upper:]] > and [[:lower:]]. But I put "portable" in quotation marks just then, > because to my mind the

Re: documentation bug re character range expressions

2011-06-03 Thread Marcel (Felix) Giannelia
On Fri, June 3, 2011 10:03, Greg Wooledge wrote: > On Fri, Jun 03, 2011 at 09:12:07AM -0700, Marcel (Felix) Giannelia wrote: > > [...] > > In HP-UX's en_US.iso88591 locale, the characters are in a COMPLETELY > different order. You can't easily figure out what that order is, because > it's not docu

Re: documentation bug re character range expressions

2011-06-03 Thread Greg Wooledge
On Fri, Jun 03, 2011 at 09:12:07AM -0700, Marcel (Felix) Giannelia wrote: > And yours looks broken -- how does > echo Hello World | tr A-Z a-z > result in a bunch of non-ASCII characters? I explain it in a bit on http://mywiki.wooledge.org/locale In a bit more depth: in ASCII, the characters A-Z
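For the `tr A-Z a-z` case discussed here, two spellings that do not depend on how the current locale orders letters can be sketched as follows. The helper names `to_lower_class` and `to_lower_c` are invented for illustration; the `tr` character classes and the `LC_ALL=C` prefix are standard.

```shell
#!/bin/sh
# to_lower_class and to_lower_c are hypothetical helper names.

# Character classes ask for "uppercase" and "lowercase" directly,
# without relying on a range.
to_lower_class() {
    tr '[:upper:]' '[:lower:]'
}

# Forcing the C locale pins A-Z and a-z to their ASCII ranges.
to_lower_c() {
    LC_ALL=C tr 'A-Z' 'a-z'
}

echo 'Hello World' | to_lower_class
echo 'Hello World' | to_lower_c
```

For plain ASCII input such as this, both calls print the same lowercase text.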

Re: documentation bug re character range expressions

2011-06-03 Thread Greg Wooledge
On Fri, Jun 03, 2011 at 09:15:55AM -0700, Marcel (Felix) Giannelia wrote: > Alright -- assuming that for the moment, how does one specify > [ABCDEFGHIJKL] using [[:upper:]]? This is something that I haven't seen > documented, and I'm genuinely curious. You can't. Either write out [ABCDEFGHIJKL]

Re: documentation bug re character range expressions

2011-06-03 Thread Eric Blake
On 06/03/2011 10:15 AM, Marcel (Felix) Giannelia wrote: > Alright -- assuming that for the moment, how does one specify > [ABCDEFGHIJKL] using [[:upper:]]? This is something that I haven't seen > documented, and I'm genuinely curious. [ABCDEFGHIJKL] If you ever want a subset of [[:upper:]], the _

Re: documentation bug re character range expressions

2011-06-03 Thread Marcel (Felix) Giannelia
On 2011-06-03 05:00, Greg Wooledge wrote: On Fri, Jun 03, 2011 at 12:06:32AM -0700, Marcel (Felix) Giannelia wrote: Is it really a programmer mistake, though, to assume that [A-Z] is only capital letters? Yes, it is. You should be using [[:upper:]], or you should be setting LC_COLLATE=C if yo

Re: documentation bug re character range expressions

2011-06-03 Thread Marcel (Felix) Giannelia
On 2011-06-03 05:09, Greg Wooledge wrote: Oh, look, there's more! [...] See? Both tr(1) and ls(1) do it too! Right; forgot about ls (because "alias ls='LC_COLLATE=C ls'" has been in my .bashrc for so long that I completely forgot it was there :) ), and didn't think to try tr -- but tr appe

Re: documentation bug re character range expressions

2011-06-03 Thread Greg Wooledge
Oh, look, there's more! On Fri, Jun 03, 2011 at 12:06:32AM -0700, Marcel (Felix) Giannelia wrote: > [[:alpha:]] is too difficult to type to make it useful for the kind of > quick pattern-matching that character ranges are used for on the > interactive shell. Try it. Open-bracket, colon is an awk

Re: documentation bug re character range expressions

2011-06-03 Thread Greg Wooledge
On Fri, Jun 03, 2011 at 12:06:32AM -0700, Marcel (Felix) Giannelia wrote: > Is it really a programmer mistake, though, to assume that [A-Z] is only > capital letters? Yes, it is. You should be using [[:upper:]], or you should be setting LC_COLLATE=C if you insist on using [A-Z].
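The two options named in this reply can be sketched as shell pattern tests. This is an illustration only: the function names are made up, and the second function sets LC_ALL rather than LC_COLLATE so that no other locale variable can override it.

```shell
#!/bin/sh
# is_upper_class and is_upper_c_range are hypothetical helper names.

# Option 1: use the [:upper:] class, which means "uppercase letter"
# in whatever locale is active.
is_upper_class() {
    case $1 in
        [[:upper:]]) return 0 ;;
        *) return 1 ;;
    esac
}

# Option 2: keep [A-Z] but evaluate it in the C locale.
# The body runs in a subshell so the LC_ALL change does not leak out.
is_upper_c_range() (
    LC_ALL=C
    case $1 in
        [A-Z]) exit 0 ;;
        *) exit 1 ;;
    esac
)

for ch in A a; do
    if is_upper_class "$ch"; then echo "$ch: upper (class)"; else echo "$ch: not upper (class)"; fi
    if is_upper_c_range "$ch"; then echo "$ch: upper (C range)"; else echo "$ch: not upper (C range)"; fi
done
```

The class form also accepts non-ASCII uppercase letters in locales that define them, while the C-locale range is limited to the 26 ASCII capitals.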

Re: documentation bug re character range expressions

2011-06-03 Thread Aharon Robbins
This is a thorny issue that plagues all POSIX-compliant utilities, not just Bash. (POSIX locales are just a blight.) For gawk 4.0, I have said "to heck with it" and changed gawk so that ranges act like they are in the C locale (unless --posix is used). I and some other people are campaigning to

Re: documentation bug re character range expressions

2011-06-03 Thread Marcel (Felix) Giannelia
Is it really a programmer mistake, though, to assume that [A-Z] is only capital letters? A through Z are a contiguous range in every representation system except EBCDIC, and they are even contiguous in modern Unicode. In the world of programming, characters are numbers, and programmers know this

Re: documentation bug re character range expressions

2011-06-02 Thread Jan Schampera
Hi, just as a side note, not meant to touch the maintainer discussion. This is not only a "Bash problem". The programmer/user mistake of using [A-Z] to mean "only capital letters, capital A to capital Z" is a very common one. But I'm not sure if every official application-level documentation shoul

documentation bug re character range expressions

2011-06-02 Thread Marcel (Felix) Giannelia
Hello, I realize the issue of character range expressions not working as expected (because of locale settings) has been done to death, but I thought I should point this out. The bash man page says: "A pair of characters separated by a hyphen denotes a range expression; any character that **