Re: set -x prejudiced; won't smell UTF-8 coffee

2009-04-08 Thread jidanni
> locale variables have pretty clear definitions.  obviously LC_COLLATE wouldnt 
> be relevant here, but LC_MESSAGES certainly would.
Assumptions, assumptions, those happen to be the two C's for me. So let
me override without having to tamper with them please.




Re: set -x prejudiced; won't smell UTF-8 coffee

2009-04-08 Thread Mike Frysinger
On Wednesday 08 April 2009 03:04:15 jida...@jidanni.org wrote:
> > locale variables have pretty clear definitions.  obviously LC_COLLATE
> > wouldnt be relevant here, but LC_MESSAGES certainly would.
>
> Assumptions, assumptions, those happen to be the two C's for me. So let
> me override without having to tamper with them please.

i never said you couldnt override them.  i said the *default behavior* would 
be to try and autodetect whether to enable passthru.

and LC_CTYPE or LC_MESSAGES probably are the best vars to use for
autodetection.  POSIX describes them, respectively, as:

    Determine the locale for the interpretation of sequences of bytes of text
    data as characters (for example, single-byte as opposed to multi-byte
    characters in arguments and input files).

    Determine the locale that should be used to affect the format and contents
    of diagnostic messages written to standard error.

gee, that sounds exactly like what you're trying to do ...
-mike
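
A minimal sketch of the kind of autodetection discussed above, assuming the
usual POSIX precedence for the character-type locale (LC_ALL, then LC_CTYPE,
then LANG). The function names pick_ctype and looks_utf8 are hypothetical, and
this is not how bash itself decides anything; it only illustrates the idea.

```bash
# Hypothetical helper: return the first non-empty value among the three
# arguments (LC_ALL, LC_CTYPE, LANG), falling back to "C".
pick_ctype() {
  local v
  for v in "$1" "$2" "$3"; do
    if [ -n "$v" ]; then
      printf '%s\n' "$v"
      return 0
    fi
  done
  printf 'C\n'
}

# Hypothetical helper: succeed if the locale name carries a UTF-8 codeset suffix.
looks_utf8() {
  case $1 in
    *.UTF-8*|*.utf8*|*.UTF8*|*.utf-8*) return 0 ;;
    *) return 1 ;;
  esac
}

# Effective character-type locale of the current environment.
effective=$(pick_ctype "${LC_ALL-}" "${LC_CTYPE-}" "${LANG-}")
```

With LC_ALL empty, LC_CTYPE=en_US.UTF-8 and LANG=C, pick_ctype yields
en_US.UTF-8 and looks_utf8 succeeds; an explicit override would simply bypass
this check.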




Re: Misleading syntax in manual

2009-04-08 Thread Eric Blake
According to Reuben Thomas on 4/6/2009 3:57 PM:
> The man page says:
> 
> for name [ in word ] ; do list ; done
> 
> which conflicts with the POSIX syntax definition, given in
> 
> http://www.opengroup.org/onlinepubs/95399/utilities/xcu_chap02.html#tag_02

The corresponding link in POSIX 2008 is:

http://www.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04

although the expository listing in that section is misleading.  The REAL
POSIX definition is given later in the grammar:

http://www.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_10_02

for_clause       : For name linebreak                            do_group
                 | For name linebreak in          sequential_sep do_group
                 | For name linebreak in wordlist sequential_sep do_group
do_group         : Do compound_list Done           /* Apply rule 6 */
linebreak        : newline_list
                 | /* empty */


Which means the POSIX-mandated syntax should really be represented as:

for name [in [word...] ;] do
  compound-list
done

I guess we should file a bug report to the Austin group.

> 
> The easiest fix seems to be to put the semicolon above in square
> brackets, making it optional, though this risks giving the impression
> that the syntax
> 
> for i in foo bar;; do
> 
> would be acceptable, when it's not (even by bash). So, you could give
> two explicit definitions:
> 
> for i [;] do list ; done
> 
> and
> 
> for i in word ; do list ; done

You missed word... (the ... is important).  To keep it on one line, I'd
represent the bash syntax as:

for name [ in [ name ... ] ; | ; ] do

to show that bash supports four variants: 'in ;', 'in name... ;', ';', or
blank.
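
The four variants can be tried directly in bash. This is only a sketch; the
variable names out1 through out4 and the sample words are made up for the
illustration.

```bash
# Positional parameters are what the loop iterates over when "in" is omitted.
set -- alpha beta

# Variant: nothing between the name and "do"
out1=$(for name do printf '%s ' "$name"; done)

# Variant: a lone ';' before "do"
out2=$(for name; do printf '%s ' "$name"; done)

# Variant: 'in ;' with an empty word list, so the body never runs
out3=$(for name in; do printf '%s ' "$name"; done)

# Variant: 'in name... ;' with an explicit word list
out4=$(for name in x y; do printf '%s ' "$name"; done)

printf '1=[%s] 2=[%s] 3=[%s] 4=[%s]\n' "$out1" "$out2" "$out3" "$out4"
```

The first two forms loop over "$@", the third loops over nothing, and the
fourth loops over the listed words.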

-- 
Don't work too hard, make some time for fun as well!

Eric Blake e...@byu.net




Re: Bash 4 cursor in my prompt

2009-04-08 Thread Greg Wooledge
On Tue, Apr 07, 2009 at 11:04:58PM -0700, Special Sauce wrote:
> 
> [an...@nobby-nobbs ~]$ echo $PS1
> [\[\e[28;1m\...@\h\[ \e[0m\]\w]$
                    ^^^
The space after \[ is not correct.  You're sending a space to the terminal
(or possibly more than one space -- since you didn't quote "$PS1" when
you expanded it, we can't tell), but you're telling bash that it isn't
moving the cursor (because you have \[ before it).

Whether that's causing your problems, I can't say, but it's definitely
not right.
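
To see what bash is being told, here is a rough sketch that strips everything
between \[ and \] from a prompt string, leaving only the part declared as
cursor-moving. The helper name visible_part is hypothetical, it assumes the
markers are properly paired, and \u stands in for the user-name escape that the
archive has obscured.

```bash
# Hypothetical helper: drop each \[ ... \] span and print what remains,
# i.e. the characters bash will assume take up room on the line.
visible_part() {
  local s=$1 out=
  while [[ $s == *'\['*'\]'* ]]; do
    out+=${s%%'\['*}      # keep the text before the first \[
    s=${s#*'\]'}          # continue after the first \]
  done
  printf '%s\n' "$out$s"
}

# Space placed inside \[ ... \]: it is printed but not counted.
bad='[\[\e[28;1m\]\u@\h\[ \e[0m\]\w]\$ '

# Space moved outside: only the escape sequences are marked non-printing.
good='[\[\e[28;1m\]\u@\h \[\e[0m\]\w]\$ '

visible_part "$bad"
visible_part "$good"
```

The first call prints the prompt without the space between \h and \w, which is
the one-column mismatch described above; the second keeps it.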




Re: Misleading syntax in manual

2009-04-08 Thread Eric Blake
According to Eric Blake on 4/8/2009 6:10 AM:
> The corresponding link in POSIX 2008 is:
> 
> http://www.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04
> 
> although the expository listing in that section is misleading.

I spoke too soon.  The online version rendered incorrectly, as:

The format for the for loop is as follows:
for name [ in [word ... ]]do
compound-listdone

But the .pdf rendering is correct:

The format for the for loop is as follows:
for name [ in [word ... ]]
do
compound-list
done

Notice that by placing do on a new line, then deferring to the grammar for
the cases where newline can be replaced by a semicolon, the printed
version has no error after all.

>  The REAL
> POSIX definition is given later in the grammar:
> 
> http://www.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_10_02
> 
> for_clause       : For name linebreak                            do_group
>                  | For name linebreak in          sequential_sep do_group
>                  | For name linebreak in wordlist sequential_sep do_group
> do_group         : Do compound_list Done           /* Apply rule 6 */
> linebreak        : newline_list
>                  | /* empty */

And one other important production, which shows that semicolon can only
appear before 'do' if you also had 'in':

sequential_sep   : ';' linebreak
                 | newline_list

> 
> 
> Which means the POSIX-mandated syntax should really be represented as:
> 
> for name [in [word...] ;] do
>   compound-list
> done

Hmm.  That three-line representation for POSIX still looks valid.  But I'm
not sure whether I favor the four-line or three-line version.

-- 
Don't work too hard, make some time for fun as well!

Eric Blake e...@byu.net




str1 < str2 does not respect locale

2009-04-08 Thread wooledg
Configuration Information [Automatically generated, do not change]:
Machine: hppa2.0
OS: hpux10.20
Compiler: /net/appl/gcc-3.3/bin/gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='hppa2.0' 
-DCONF_OSTYPE='hpux10.20' -DCONF_MACHTYPE='hppa2.0-hp-hpux10.20' 
-DCONF_VENDOR='hp' -DLOCALEDIR='/usr/local/share/locale' -DPACKAGE='bash' 
-DSHELL -DHAVE_CONFIG_H -DHPUX   -I.  -I. -I./include -I./lib -I./lib/intl 
-I/var/tmp/bash-4.0/lib/intl  -g -O2
uname output: HP-UX imadev B.10.20 A 9000/785 2008897791 two-user license
Machine Type: hppa2.0-hp-hpux10.20

Bash Version: 4.0
Patch Level: 10
Release Status: release

Description:
Strings compared with [[ string1 < string2 ]] are supposed to use
the current locale, but they don't.

Repeat-By:
imadev:/tmp/greg$ ls
ab  Ac
imadev:/tmp/greg$ [[ ab < Ac ]]; echo $?
1
imadev:/tmp/greg$ locale
LANG=en_US.iso88591
LC_CTYPE="en_US.iso88591"
LC_COLLATE="en_US.iso88591"
LC_MONETARY="en_US.iso88591"
LC_NUMERIC="en_US.iso88591"
LC_TIME=POSIX
LC_MESSAGES="en_US.iso88591"
LC_ALL=

Bash appears to be using "C" sorting (ASCII) here.  The current locale
sorts ab before Ac, as ls shows, but bash reverses this.  The exit
status from the [[ command should have been 0 (true).

This error also occurs in 2.05b and 3.2.48 (the only other versions of
bash I tested).
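
For comparison, a small sketch of the byte-order result in the C locale, where
'A' (0x41) sorts before 'a' (0x61). The variable names are made up for the
illustration, and it assumes a bash binary on PATH; results in other locales
depend on the bash version and the locale data installed.

```bash
# Run the comparison in a child bash started with LC_ALL=C and capture
# the exit status of [[ ... ]] as text.
c_status=$(LC_ALL=C bash -c '[[ ab < Ac ]]; echo $?')

# Sort the same two strings in the C locale; tr joins the lines with spaces.
sorted_c=$(printf '%s\n' ab Ac | LC_ALL=C sort | tr '\n' ' ')

printf 'status in C locale: %s\n' "$c_status"
printf 'C-locale sort order: %s\n' "$sorted_c"
```

In the C locale both agree that Ac comes first, so the status is 1. The report
is about the case where the locale's collation puts ab first and the status
still comes out as 1.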




Re: Misleading syntax in manual

2009-04-08 Thread Chet Ramey
Eric Blake wrote:

> You missed word... (the ... is important).  To keep it on one line, I'd
> represent the bash syntax as:
> 
> for name [ in [ name ... ] ; | ; ] do
> 
> to show that bash supports four variants: 'in ;', 'in name... ;', ';', or
> blank.

I prefer

for name [ [in [word ...] ] ; ] do

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer

Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/




Re: str1 < str2 does not respect locale

2009-04-08 Thread Chet Ramey
> Configuration Information [Automatically generated, do not change]:
> Machine: hppa2.0
> OS: hpux10.20
> Compiler: /net/appl/gcc-3.3/bin/gcc
> Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='hppa2.0' 
> -DCONF_OSTYPE='hpux10.20' -DCONF_MACHTYPE='hppa2.0-hp-hpux10.20' 
> -DCONF_VENDOR='hp' -DLOCALEDIR='/usr/local/share/locale' -DPACKAGE='bash' 
> -DSHELL -DHAVE_CONFIG_H -DHPUX   -I.  -I. -I./include -I./lib -I./lib/intl 
> -I/var/tmp/bash-4.0/lib/intl  -g -O2
> uname output: HP-UX imadev B.10.20 A 9000/785 2008897791 two-user license
> Machine Type: hppa2.0-hp-hpux10.20
> 
> Bash Version: 4.0
> Patch Level: 10
> Release Status: release
> 
> Description:
> Strings compared with [[ string1 < string2 ]] are supposed to use
> the current locale, but they don't.

It's a documentation error (but a long-standing one).  The code has
always used strcmp, not strcoll. 

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer

Chet Ramey, ITS, CWRU    c...@case.edu    http://tiswww.tis.case.edu/~chet/




Re: Misleading syntax in manual

2009-04-08 Thread Eric Blake
According to Chet Ramey on 4/8/2009 6:44 AM:
> I prefer
> 
> for name [ [in [word ...] ] ; ] do

Yes, that looks nice.  Meanwhile, I've raised the html render bug with the
Austin group:
https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=index.tpl&source=L&listname=austin-group-l&id=12056

-- 
Don't work too hard, make some time for fun as well!

Eric Blake e...@byu.net




Re: Bash 4 cursor in my prompt

2009-04-08 Thread Special Sauce
On Apr 8, 7:16 am, Greg Wooledge  wrote:
> On Tue, Apr 07, 2009 at 11:04:58PM -0700, Special Sauce wrote:
>
> > [an...@nobby-nobbs ~]$ echo $PS1
> > [\[\e[28;1m\...@\h\[ \e[0m\]\w]$
>
>                     ^^^
> The space after \[ is not correct.  You're sending a space to the terminal
> (or possibly more than one space -- since you didn't quote "$PS1" when
> you expanded it, we can't tell), but you're telling bash that it isn't
> moving the cursor (because you have \[ before it).
>
> Whether that's causing your problems, I can't say, but it's definitely
> not right.

Didn't know that, but, like I said, works fine with bash 3.xx

Here is a minimalist start:

[an...@nobby-nobbs ~/bucket/bash-4.0]$ ./bash --version
GNU bash, version 4.0.17(1)-release (x86_64-unknown-linux-gnu)
...
[an...@nobby-nobbs ~/bucket/bash-4.0]$ ./bash -v
export PS1="[\[\e[28;1m\...@\h \[\e[0m\]\w]\$ "
[an...@nobby-nobbs ~/bucket/bash-4.0]$

Now the cursor is on top of the '/' between bash and bucket.
If I cd to my home directory and try to recreate the problem the
cursor is on top of the '-' in my hostname.
Maybe it's always a fixed number of characters before the end of the
prompt... but why would that be happening?


Re: Bash 4 cursor in my prompt

2009-04-08 Thread Special Sauce
On Apr 8, 1:52 am, Mike Frysinger  wrote:
> and what `locale` settings you're using
> -mike

@Mike What do you mean by locale?


Re: str1 < str2 does not respect locale

2009-04-08 Thread Stephane CHAZELAS
2009-04-8, 08:35(-04), Chet Ramey:
>> Configuration Information [Automatically generated, do not change]:
>> Machine: hppa2.0
>> OS: hpux10.20
>> Compiler: /net/appl/gcc-3.3/bin/gcc
>> Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='hppa2.0' 
>> -DCONF_OSTYPE='hpux10.20' -DCONF_MACHTYPE='hppa2.0-hp-hpux10.20' 
>> -DCONF_VENDOR='hp' -DLOCALEDIR='/usr/local/share/locale' -DPACKAGE='bash' 
>> -DSHELL -DHAVE_CONFIG_H -DHPUX   -I.  -I. -I./include -I./lib -I./lib/intl 
>> -I/var/tmp/bash-4.0/lib/intl  -g -O2
>> uname output: HP-UX imadev B.10.20 A 9000/785 2008897791 two-user license
>> Machine Type: hppa2.0-hp-hpux10.20
>> 
>> Bash Version: 4.0
>> Patch Level: 10
>> Release Status: release
>> 
>> Description:
>> Strings compared with [[ string1 < string2 ]] are supposed to use
>> the current locale, but they don't.
>
> It's a documentation error (but a long-standing one).  The code has
> always used strcmp, not strcoll. 
[...]

For information,

pdksh, zsh, dash ([), gawk behave like bash (that's a POSIX
conformance bug for gawk).

ksh93, GNU expr behave as the OP expected.

Every POSIX utility that compares strings (sort, comm, join, RE
ranges...) is meant to be locale-dependent, so I suppose that if
[[ is ever added to POSIX, that will have to be the case as
well for consistency.

-- 
Stéphane


Re: set -x prejudiced; won't smell UTF-8 coffee

2009-04-08 Thread jidanni
Mike Frysinger  writes:
> i never said you couldnt override them.  i said the *default behavior* would 
OK, it's a deal. Now all that's left is for that Chet guy to implement it :-)




Re: Bash 4 cursor in my prompt

2009-04-08 Thread Mike Frysinger
On Wednesday 08 April 2009 10:49:06 Special Sauce wrote:
> On Apr 8, 1:52 am, Mike Frysinger  wrote:
> > and what `locale` settings you're using
>
> @Mike What do you mean by locale?

run `locale` and post the output
-mike




Re: str1 < str2 does not respect locale

2009-04-08 Thread Mike Frysinger
On Wednesday 08 April 2009 08:35:53 Chet Ramey wrote:
> > Configuration Information [Automatically generated, do not change]:
> > Machine: hppa2.0
> > OS: hpux10.20
> > Compiler: /net/appl/gcc-3.3/bin/gcc
> > Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='hppa2.0'
> > -DCONF_OSTYPE='hpux10.20' -DCONF_MACHTYPE='hppa2.0-hp-hpux10.20'
> > -DCONF_VENDOR='hp' -DLOCALEDIR='/usr/local/share/locale' -DPACKAGE='bash'
> > -DSHELL -DHAVE_CONFIG_H -DHPUX   -I.  -I. -I./include -I./lib -I./lib/intl
> > -I/var/tmp/bash-4.0/lib/intl  -g -O2
> > uname output: HP-UX imadev B.10.20 A 9000/785 2008897791 two-user license
> > Machine Type: hppa2.0-hp-hpux10.20
> >
> > Bash Version: 4.0
> > Patch Level: 10
> > Release Status: release
> >
> > Description:
> > Strings compared with [[ string1 < string2 ]] are supposed to use
> > the current locale, but they don't.
>
> It's a documentation error (but a long-standing one).  The code has
> always used strcmp, not strcoll.

how about changing it to strcoll then?
-mike




Re: Bash 4 cursor in my prompt

2009-04-08 Thread Special Sauce
On Apr 8, 4:42 pm, Mike Frysinger  wrote:
> On Wednesday 08 April 2009 10:49:06 Special Sauce wrote:
>
> > On Apr 8, 1:52 am, Mike Frysinger  wrote:
> > > and what `locale` settings you're using
>
> > @Mike What do you mean by locale?
>
> run `locale` and post the output
> -mike

[an...@nobby-nobbs ~/bucket/bash-4.0]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Whoah


idea: statically-linked "busy-bash"

2009-04-08 Thread Richard Neill

Dear All,

Here's an idea that occurred to me. I'm not sure whether it's a great 
idea, or a really really stupid one, so please feel free to shoot it 
down. Anyway, there are an awful lot of shell scripts where a huge 
number of the coreutils get repeatedly called in separate processes. 
This call-overhead makes the scripts run noticeably slower.


What I'm suggesting is to experimentally build a version of bash which
has mv/cp/ls/stat/grep/etc. all built in. This would be a rather
bigger binary (similar to busybox), but might allow much, much faster
execution of long scripts.


A very quick experiment shows that this might be worthwhile:

date;
for ((i=0;i<1000000;i++)); do echo -n ""; done;
date;
for ((i=0;i<10000;i++)); do /bin/echo -n ""; done;
date

Prints:
Thu Apr  9 07:05:19 BST 2009
Thu Apr  9 07:05:30 BST 2009
Thu Apr  9 07:05:47 BST 2009


In other words, 1E6 invocations of the builtin take about 11
seconds, while 1E4 invocations of the standalone binary
take 17 seconds. Per call, the builtin echo is therefore about
150 times faster (100 * 17 / 11 is roughly 155).
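
The same measurement can be wrapped in a small function so the loop counts are
easy to vary. This is only a sketch: bench is a hypothetical helper, the counts
are smaller than in the experiment above so it finishes quickly, and it relies
on bash's SECONDS variable, which only has whole-second resolution.

```bash
# Hypothetical helper: run the given command n times, discard its output,
# and print the elapsed whole seconds.
bench() {
  local n=$1 i start=$SECONDS
  shift
  for ((i = 0; i < n; i++)); do
    "$@" >/dev/null
  done
  echo $((SECONDS - start))
}

# Builtin echo: no new process per call.
builtin_secs=$(bench 100000 echo -n "")

# External /bin/echo: one fork and exec per call.
external_secs=$(bench 1000 /bin/echo -n "")

printf 'builtin: %ss for 100000 calls, external: %ss for 1000 calls\n' \
  "$builtin_secs" "$external_secs"
```

Raising the counts gives more stable numbers; the per-call ratio is what
matters, not the absolute times.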

What do you think?

Richard