nocaseglob doesn't always work as expected.

2010-10-04 Thread Alexey Vinogradov
It seems that filename expansion with a bracket range sometimes doesn't
work as expected.
Here are a couple of illustrations.

Source dir:

ale...@ubuntu64:/tmp$ ls -la
...
-rw-r--r--  1 root     root          66 2010-09-21 08:54 ahello.txt
drwxr-xr-x  2 root     root           6 2010-09-18 13:09 bigstore
-rw---      1 alexey   alexey   1611084 2010-09-29 18:19 clipboardcache
-rw---      1 alexey   alexey   1611084 2010-09-29 18:19 clipboardcache-1
-rw---      1 alexey   alexey   1611084 2010-09-29 18:19 clipboardcache-2
-rw---      1 alexey   alexey   1611624 2010-09-29 18:19 clipboardcache-3
-rw---      1 alexey   alexey         1 2010-10-04 18:31 cLnlY6
-rw---      1 alexey   alexey         6 2010-10-04 18:31 Code::Blocks-alexey
drwx--      2 alexey   alexey        27 2010-09-03 01:17 codeblocks_dbgrpt-30651-20100903T011749
-rw---      1 alexey   alexey         0 2010-09-13 15:29 codeblocksIJ8lud
srwx--      1 alexey   alexey         0 2010-10-04 18:31 CODEBLOCKS.socket
-rw---      1 alexey   alexey         0 2010-09-03 01:17 codeblocksWrgEbU
drwxr-xr-x  3 alexey   alexey        16 2010-08-29 18:02 context
drwxr-xr-x  3 alexey   alexey      4096 2010-09-27 19:55 data
...

(I've included only the files whose names begin with "b" and "c/C".)

Well, let us run:
ale...@ubuntu64:/tmp$ shopt -u nocaseglob; shopt -s nullglob; for a in [b-c]* ; do echo $a; done
bigstore
clipboardcache
clipboardcache-1
clipboardcache-2
clipboardcache-3
cLnlY6
codeblocks_dbgrpt-30651-20100903T011749
codeblocksIJ8lud
codeblocksWrgEbU
context

-- All as expected.
Another turn:

ale...@ubuntu64:/tmp$ shopt -u nocaseglob; shopt -s nullglob; for a in [B-C]* ; do echo $a; done
clipboardcache
clipboardcache-1
clipboardcache-2
clipboardcache-3
cLnlY6
Code::Blocks-alexey
codeblocks_dbgrpt-30651-20100903T011749
codeblocksIJ8lud
CODEBLOCKS.socket
codeblocksWrgEbU
context

Note: nocaseglob is unset. Both letters in the range really are Latin.
Here is the proof:
echo "BC" | hd
  42 43 0a  |BC.|

But the range match here throws out only the lowercase "bigstore" from the
listing, and still includes files beginning with either c or C.
The fact that 'c' is in the range "B-C" looks really strange. I would expect
this range to contain only the letters "B" and "C".

However, when the letters are enumerated directly, everything works:

ale...@ubuntu64:/tmp$ shopt -u nocaseglob; shopt -s nullglob; for a in [BC]* ; do echo $a; done
Code::Blocks-alexey
CODEBLOCKS.socket


bash --version
GNU bash, version 4.1.5(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 

cat /etc/issue
Ubuntu 10.04.1 LTS \n \l

uname -a
Linux ubuntu64 2.6.32-25-generic #44-Ubuntu SMP Fri Sep 17 20:05:27 UTC 2010
x86_64 GNU/Linux


Re: nocaseglob doesn't always work as expected.

2010-10-04 Thread Greg Wooledge
On Mon, Oct 04, 2010 at 08:14:32PM +0700, Alexey Vinogradov wrote:
> ale...@ubuntu64:/tmp$ shopt -u nocaseglob; shopt -s nullglob; for a in [B-C]* ; do echo $a; done
> 
> But the range match here throws out only the lowercase "bigstore" from the
> listing, and still includes files beginning with either c or C.

Locale issue.  You're using a locale that isn't C or POSIX, and in your
locale, there are characters between "B" and "C", and some of them are
probably lowercase.

http://mywiki.wooledge.org/locale for background info.
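
One way to test this diagnosis is to repeat the glob with the collation
forced to C, where ranges follow plain byte order. The following is only a
sketch: glob_bc_c_locale is a hypothetical helper name, and it uses LC_ALL=C
inside a subshell (an assumption chosen for robustness, since LC_ALL
overrides every other locale variable) so that the caller's locale is left
alone.

```shell
# Hypothetical helper: list the names in directory "$1" that start with
# an uppercase B or C, using byte-order (C locale) collation for the glob.
glob_bc_c_locale() (
    # The parentheses make the body run in a subshell, so neither the
    # directory change nor the locale change leaks into the caller.
    cd "$1" || exit 1
    LC_ALL=C
    for f in [B-C]*; do
        # Without nullglob an unmatched pattern stays literal; skip it.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    exit 0
)
```

Run against a directory holding bigstore, cLnlY6, context,
Code::Blocks-alexey and CODEBLOCKS.socket, this should print only the two
names that start with a capital C.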



Re: nocaseglob doesn't always work as expected.

2010-10-04 Thread Bob Proulx
Alexey Vinogradov wrote:
> ale...@ubuntu64:/tmp$ shopt -u nocaseglob; shopt -s nullglob; for a in [B-C]* ; do echo $a; done

Since you do not mention your locale setting, I assume that you are not
aware of how it affects ranges.  If your locale setting uses
dictionary sort ordering, then [B-C] is the same as [BcC].

  $ echo b | env -i LC_ALL=en_US.UTF-8 grep '[B-C]'
  $ echo B | env -i LC_ALL=en_US.UTF-8 grep '[B-C]'
  B
  $ echo c | env -i LC_ALL=en_US.UTF-8 grep '[B-C]'
  c
  $ echo C | env -i LC_ALL=en_US.UTF-8 grep '[B-C]'
  C

In the above you can see that lower case c exists in the range B-C but
lower case b does not.

In a locale that sets dictionary sort ordering the collating sequence
is aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ.

[a-z] is the same as [aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYz]
[A-Z] is the same as [AbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ]
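
When the intent is "any uppercase letter" rather than a specific range, a
POSIX character class avoids the collating sequence altogether, because
[[:upper:]] is defined by character classification (LC_CTYPE) and not by
sort order. A minimal sketch, where starts_upper is a hypothetical helper
name:

```shell
# Hypothetical helper: succeed if "$1" begins with an uppercase letter.
# [[:upper:]] does not depend on where letters fall in the collating
# sequence, unlike a range such as [A-Z].
starts_upper() {
    case $1 in
        [[:upper:]]*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Usage: print only the names that start with an uppercase letter.
for n in bigstore cLnlY6 Code::Blocks-alexey CODEBLOCKS.socket context; do
    if starts_upper "$n"; then
        printf '%s\n' "$n"
    fi
done
```

For just the letters B and C, listing them explicitly as [BC] remains the
simplest locale-independent form.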

You can read more about locales in the online standards documentation.

  
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html#tag_08_02

Personally I set the following in my ~/.bashrc file.

  export LANG=en_US.UTF-8
  export LC_COLLATE=C
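
Note that these settings only take effect when LC_ALL is unset or empty,
because LC_ALL overrides LC_COLLATE, which in turn overrides LANG. A small
sketch for checking which variable currently decides the collation locale
(effective_collate is a hypothetical helper name, and it only inspects the
environment variables, it does not query the C library):

```shell
# Hypothetical helper: print the locale name that governs collation,
# following the POSIX precedence LC_ALL > LC_COLLATE > LANG.
# An empty value counts as unset; with nothing set the default is POSIX.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' "POSIX"
    fi
}

effective_collate
```

With the two exports above and no LC_ALL in the environment, this should
print C.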

Bob



bash inserts semicolon into history when using here-document

2010-10-04 Thread Robert Citek
Configuration Information [Automatically generated, do not change]:
Machine: i486
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='i486'
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i486-pc-linux-gnu'
-DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash'
-DSHELL -DHAVE_CONFIG_H   -I.  -I../bash -I../bash/include
-I../bash/lib   -g -O2 -Wall
uname output: Linux lucid-unr 2.6.32-25-generic #44-Ubuntu SMP Fri Sep
17 20:26:08 UTC 2010 i686 GNU/Linux
Machine Type: i486-pc-linux-gnu

Bash Version: 4.1
Patch Level: 5
Release Status: release

Description:

When using a here-document at the command line, bash inserts a
semicolon into the history.


Repeat-By:

I type the following at a bash prompt:

$ { cat <