gnu.bash.bug usenet interface not working again

2011-10-02 Thread Stephane CHAZELAS
It worked for a while after I reported the issue and then it
stopped again. That's very frustrating.

-- 
Stephane




Re: initialisation bash variables

2011-10-02 Thread Stephane CHAZELAS
2011-08-15, 17:15(+02), Francky Leyn:
> Hello,
>
> if you have a variable, say VAR,
> and you don't assign it a value,
> and afterwards you test it,
> what is the value of $VAR then?
>
> random, or an empty string?
[...]

Upon startup, the shell makes one shell variable per environment
variable whose name is compatible with shell variable names.

So for instance, if bash receives "VAR=foo" in its environment,
$VAR will expand to foo. If it's passed "1=bar", $1 will not be
affected, and it's the same for a few special variables of the
shell.

If passed "A+B=C" or "=D" for instance, that obviously won't be
mapped to shell variables. Some shells do discard variables from
the environment that can't be mapped to shell variables. Bash
doesn't.
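A quick way to see that import at work, as a sketch that uses env(1) to build the child's environment (VAR and 1=bar are just the examples from the text):

```shell
# The child bash imports VAR=foo from its environment as a shell variable.
env VAR=foo bash -c 'printf "%s\n" "$VAR"'   # prints: foo

# "1" is not a valid variable name, so "1=bar" does not set the
# positional parameter $1 in the child.
env 1=bar bash -c 'printf "[%s]\n" "$1"'     # prints: []
```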

-- 
Stephane



Re: initialisation bash variables

2011-10-02 Thread Stephane CHAZELAS
2011-08-16, 22:24(+02), Francky Leyn:
[...]
> VAR=FALSE
> # some command line procesing, that can set VAR to "TRUE"
> if [ $VAR = TRUE ]; then
> ...
> fi
>
> Must I effectively write that VAR=FALSE?
> Or will the script work fine without?

Yes, you must write it, because bash may inherit a VAR variable
from the environment like I said (especially when you consider
that all uppercase variables are by convention reserved for
environment variables).

> Also, can't I write the test as
>
> if [ $VAR ]; then
> ...
> fi
[...]

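No. That syntax is wrong: since $VAR is not quoted, it is subject
to word splitting and filename generation, so [ may receive
several arguments (or none) instead of exactly one.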

Valid syntaxes are:

if [ "$VAR" != "" ]

if [ -n "$VAR" ]

if [ "$VAR" ]

Or if you want to be extremely portable:

if [ "" != "$VAR" ]
or
if [ "x$VAR" != x ]
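To see why the quotes matter in all of those forms, here is a small demonstration (the value 'a b' is only an example). Unquoted, $VAR is split into two words and [ gets a malformed expression:

```shell
VAR='a b'

# Unquoted: [ receives the two words "a" and "b" and reports a
# syntax error, which bash's [ signals with exit status 2.
status=0
[ $VAR ] 2>/dev/null || status=$?
echo "unquoted: $status"    # prints: unquoted: 2

# Quoted: [ receives the single word "a b" and the test succeeds.
status=0
[ -n "$VAR" ] || status=$?
echo "quoted: $status"      # prints: quoted: 0
```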

Personally, I prefer:

var=false
if ... var=true ...

if "$var"; then
  ...
fi
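A complete version of that true/false idiom, as a sketch in which a hypothetical -v option sets the flag (parse_flags and -v are made up for illustration):

```shell
# Hypothetical helper: -v turns the flag on.
parse_flags() {
  local verbose=false opt OPTIND=1
  while getopts v opt; do
    case $opt in
      v) verbose=true ;;
    esac
  done
  # "$verbose" expands to "true" or "false"; running that command
  # sets the exit status that if tests.
  if "$verbose"; then
    echo verbose
  else
    echo quiet
  fi
}

parse_flags -v   # prints: verbose
parse_flags      # prints: quiet
```

Since "$var" is run as a command, this is only safe when the script itself guarantees the variable holds true or false.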

-- 
Stephane



Re: initialisation bash variables

2011-10-02 Thread Stephane CHAZELAS
2011-08-17, 08:32(+02), Francky Leyn:
> On 8/16/2011 10:53 PM, Stephane CHAZELAS wrote:
>> 2011-08-16, 22:24(+02), Francky Leyn:
>> [...]
>>> VAR=FALSE
>>> # some command line procesing, that can set VAR to "TRUE"
>>> if [ $VAR = TRUE ]; then
>>> ...
>>> fi
>>>
>>> Must I effectively write that VAR=FALSE?
>>> Or will the script work fine without?
>>
>> Yes, you must write it, because bash may inherit a VAR variable
>> from the environment like I said (especially when you consider
>> that all uppercase variables are by convention reserved for
>> environment variables).
>
> 1) So it's a bad idea to use uppercase variables in a script?

Yes, unless you want to export them to the environment of
commands you start in that script.

> 2) If VAR coincides with an environment variable, and in the
> script I change it value, is this then propagated to outside
> the script? Is the environment variable affected?

The environment is a list of strings (by convention in the format
var=value) passed upon executing a command, in exactly the same
way as the list of arguments to the command.

In other words, when you execute a command:

cmd arg1 arg2

you pass both a list of arguments ("cmd", "arg1", and "arg2")
and a list of environment variables ("PATH=/bin...",
"VAR=foo"...).

The difference is that the list of arguments is explicit on
the shell command line, while the list of environment variables
comes from the remembered list of environment variables that the
shell (or any application that uses the C library's environ,
putenv(3), setenv(3)) maintains internally (initialised from
the environment it received when it was executed).

The shell does, however, allow the syntax:
VAR=value cmd arg1 arg2
to specify environment variables on the command line.
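A short sketch of that prefix syntax (VAR and its value are just examples): the assignment goes into the environment of that one command only and leaves the current shell untouched.

```shell
unset VAR
# VAR=value is placed in the environment of this one command only...
VAR=value bash -c 'printf "child sees: %s\n" "$VAR"'   # prints: child sees: value
# ...and does not define VAR in the current shell.
printf 'shell sees: [%s]\n' "${VAR-}"                  # prints: shell sees: []
```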

Under no circumstances are variable definitions in one process
going to affect the environment of other processes (the "fish"
shell is an exception to that).
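To check that a child cannot reach back into its parent, here is a minimal sketch (the values are arbitrary):

```shell
VAR=parent
export VAR
# The child modifies and re-exports its own copy of VAR...
bash -c 'VAR=child; export VAR; printf "in child: %s\n" "$VAR"'   # prints: in child: child
# ...which has no effect on the parent's variable or environment.
printf 'in parent: %s\n' "$VAR"                                   # prints: in parent: parent
```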

-- 
Stephane



Re: initialisation bash variables

2011-10-02 Thread Stephane CHAZELAS
2011-08-18, 04:10(+02), Patrick:
> On 17.08.2011 20:53, Stephane CHAZELAS wrote:
>> 2011-08-17, 08:32(+02), Francky Leyn:
>>> On 8/16/2011 10:53 PM, Stephane CHAZELAS wrote:
>>> 2) If VAR coincides with an environment variable, and in the
>>>  script I change it value, is this then propagated to outside
>>>  the script? Is the environment variable affected?
>>
>> The environment is a list of strings (by convention of the format
>> [...]
>>
>> In no circumstance are variable definitions in one process going
>> to affect the environment of other processes (an exception to
>> that is the "fish" shell)
>>
>
> Could it be that the two of you are not talking about the same thing?
>
> Just for clarity: environment variables (henceforth "the environment") 
> of a process are (is) inherited by its children.

Everything is inherited by children. However, upon executing a
command using the execve(2) system call, all the memory of a
process is reinitialised. What the new command gets passed
along the execve(2) system call (as arguments) is a list of
arguments (argv) and a list of environment variables (envp).

And as I said, by convention, that envp is built by the
application from the envp it received when it was executed (in
that process or its parents). The C library function wrappers
around the execve(2) system call (execv, execl, system...) help
in that way, as they take care of propagating the environment.

So yes, the environment is generally inherited by commands
executed in child processes, but also by commands executed
directly in the current process (as with exec).


> Therefore, what *does* happen, is that if Stephane, as in 2), changes 
> VAR in script, the change gets propagated to the scripts *child* processes.

I think speaking of processes here only brings confusion.
Everything is propagated upon a fork() (the system call that
creates a child process), as a fork creates an exact copy of
the current process. The environment is something that concerns
command execution.
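The fork/exec distinction can be seen with an unexported variable. A sketch, assuming allexport (set -a) is not in effect:

```shell
unset VAR
VAR=not-exported   # a shell variable only, not part of the environment

# A subshell is a fork()ed copy of the shell, so it has all of its
# memory, VAR included.
( printf 'subshell: %s\n' "$VAR" )                  # prints: subshell: not-exported

# A new command only gets the envp passed to execve(), where VAR is absent.
bash -c 'printf "new command: [%s]\n" "${VAR-}"'    # prints: new command: []
```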

As a side note though, that behavior didn't occur in the Bourne
shell. In the Bourne shell, you had to explicitly export a
variable (even if the shell itself had received it in its
environment) for it to be exported to the commands executed by
the shell.

$ VAR=foo sh -c 'VAR=bar; env' | grep VAR
VAR=foo
$ VAR=foo sh -c 'VAR=bar; export VAR; env' | grep VAR
VAR=bar

[...]
> But what does of course not happen, is that the change would get 
> propagated to the *parent* process.

Or any other process.

Environment changes are propagated to children just like the
rest of the memory, and generally to commands executed by the
current process or any of those children.

> (What is the "fish" shell ???)

The friendly interactive shell.

http://en.wikipedia.org/wiki/Friendly_interactive_shell

And see http://fishshell.com/user_doc/index.html#variables for
the documentation on the scope of its variables.

-- 
Stephane



Re: Syntax Question...

2011-10-02 Thread Stephane CHAZELAS
2011-08-14, 02:43(+00), Michael Witten:
[...]
>> Please read BashFAQ/006: http://mywiki.wooledge.org/BashFAQ/006
>
> "no force in the universe can put NUL bytes into shell strings usefully"
>
> Ain't that the goddamn Truth!

No, zsh happily supports NUL bytes in its strings. NUL is even in
the default $IFS. Where NUL bytes can't go is in arguments
to commands, environment variables, filenames... But I can't
see why a shell variable couldn't contain NUL bytes. It's even a
good thing, precisely for that reason: you can use that character
to safely separate filenames, arguments, env vars... See for
instance the -0 option of many GNU utilities.
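Bash itself can't store NUL in a variable, but it can use NUL as a delimiter when reading, which is how -print0/-0 style output is usually consumed from bash. A sketch, assuming a find(1) that supports -print0 (GNU and BSD finds do); count_files and the file names are hypothetical:

```shell
# Count regular files under a directory, using NUL as the separator so
# that names containing spaces or newlines are handled correctly.
count_files() {
  local count=0 f
  # read -d '' stops at each NUL byte; IFS= and -r keep names intact.
  while IFS= read -r -d '' f; do
    count=$((count + 1))
  done < <(find "$1" -type f -print0)
  echo "$count"
}

dir=$(mktemp -d)
touch "$dir/a b" "$dir"/$'c\nd'   # one name with a space, one with a newline
count_files "$dir"                # prints: 2
rm -rf -- "$dir"
```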

-- 
Stephane



Re: initialisation bash variables

2011-10-02 Thread Stephane CHAZELAS
2011-08-18, 12:44(+02), Patrick:
[...]
>> $ VAR=foo sh -c 'VAR=bar; env' | grep VAR
>> VAR=foo
>> $ VAR=foo sh -c 'VAR=bar; export VAR; env' | grep VAR
>> VAR=bar
> Interresting! I do not have the bourne shell installed. Bash tries to 
> mimic it when called as "sh", but it does not produce the "correct" 
> result for your first example. Not that I would mind about that though. 
> Busybox btw. also yields "VAR=bar". Phew... Dash as well. There is no 
> "sh" package in the ubuntu repos. Google also has no quick answer. What 
> kind of "sh" are you actually using if I may ask ?

Yes, that was the behavior of the Bourne shell. It was changed
by the Korn shell, every other Bourne-like shell followed,
and the new behavior is now specified by POSIX.

The Bourne shell is a shell written by Steve Bourne in the late
seventies and is the ancestor of all of today's "Bourne-like
shells" (ash, dash, ksh88, ksh93, pdksh, posh, mksh, bash,
zsh...). There have been many variants of the Bourne shell with
modifications added by the various Unix vendors.

It's still found for backward compatibility in some commercial
Unices. Nowadays, "sh" refers to one implementation or another of
a shell that is able to interpret a POSIX script as specified
(just to avoid saying a "POSIX shell", which would be too much of
a shortcut). The Bourne shell is not one of them (for the reason
above and many others). The code of the Bourne shell was
released as open source only recently, as part of OpenSolaris, so
you can now find ports of it to Linux (see heirloom-sh for
instance). You can also run the Bourne shell from Unix V7 (the OS
where it was first released in 1979) in a PDP-11 emulator, and
you'll notice a few differences between the two.

See 
http://www.in-ulm.de/~mascheck/bourne/
for a reference on the Bourne shell.

-- 
Stephane



Re: Syntax Question...

2011-10-02 Thread Stephane CHAZELAS
2011-08-17, 08:24(-04), Greg Wooledge:
> On Tue, Aug 16, 2011 at 03:41:19PM -0700, Linda Walsh wrote:
>> Ken Irving wrote:
>> >Maybe this?
>> >today_snaps=( ${snap_prefix} )
>
>>   but as you mention, that will put them into an array; sorry, "imprecise
>> terminology" list for me is some number of objects in a string 
>> separated by some
>> separator.
>
> This is an extremely bad idea.  Legacy Bourne shell code from the
> 1980s kind of bad -- from the horrible days before we *had* arrays
> in shells.  How are you going to handle filenames with spaces in them?
> With newlines in them?  With commas in them?  With colons in them?  Tabs?
> DEL characters?  Those are all valid in filenames.  Any delimiter you
> can *put* in a shell string is also a valid character in a filename (or
> at least in a pathname, which eliminates the possibility of using slash).
>

In this code:

today_snaps=( ${snap_prefix} )

With the default value of IFS in bash and with globbing enabled
(the default), the problematic characters are SPC, TAB, NL, *, ?, [
and potentially more if you have extended globbing enabled.

If $snap_prefix is meant to be space delimited, then you can
make it a bit safer by doing:

IFS=" "
set -f
today_snaps=( $snap_prefix )
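Here is that recipe wrapped in a function, as a sketch (split_on_spaces and the sample value are hypothetical), showing that a name containing * survives intact:

```shell
# Hypothetical value: the second name contains a glob character.
snap_prefix='snap-1 snap*2'

split_on_spaces() {
  local IFS=' '     # split on spaces only, not tabs or newlines
  set -f            # disable filename generation so snap*2 is not expanded
  today_snaps=( $1 )
  set +f            # re-enable globbing (assumes it was on before)
}

split_on_spaces "$snap_prefix"
printf '<%s>\n' "${today_snaps[@]}"
# prints:
# <snap-1>
# <snap*2>
```

An alternative with the same effect for single-line values is IFS=' ' read -ra today_snaps <<< "$snap_prefix", since read never performs filename generation.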

NL is a good delimiter because it's rare in filenames (though it
is allowed, so it's not an option if the data is foreign and
security is a concern), and you can also pass the list to
line-based (text) utilities:

var='a1
a2 b2'

IFS='
'
set -f
set -- $var

Or a_vars=$(printf '%s\n' "$var" | grep '^a')

-- 
Stephane



[OT] Re: accents

2011-10-02 Thread Stephane CHAZELAS
2011-08-25, 12:19(-07), Linda Walsh:
[...]
> Greg Wooledge wrote:
>> On Wed, Aug 24, 2011 at 06:51:32PM -0700, Linda Walsh wrote:
>>   
>>> BTW, Thomas -- what is the Character that comes after 'De' in your
>>> name?  I read it as hex '0xc282c2'  which doesn't seem to be valid unicode.
>>> 
>>
>> RFC 2822 (section 2.2) says that Header Fields in an email must be
>> composed of US-ASCII characters, so there's no telling what sort of
>> problems the multi-byte character in his From: header may be triggering
>> as it passes through various mail transfer agents.
>>   
> Well, on one level, I would agree,
> But on another, RFC 2822 is obviously messed up, since domain names can 
> contain UTF-8 characters..
>
>
> So...um...how does that work?
[...]

See RFC 5335 (Internationalized Email Headers).

-- 
Stephane



Re: Using TMOUT as default for read builtin is unwise

2011-10-02 Thread Stephane CHAZELAS
2011-09-14, 09:46(+01), Wheatley, Martin R:
[...]
> Description:
>   The contents of the environment variable TMOUT are used are the
>   default timeout for the read builtin command when reading from
>   a TTY or pipe AND as a shell command input idle time.
>
>   This can lead to random failures of shell scripts
[...]
>   I think the TMOUT should not be overloaded and its use as a default 
> value
>   for the read builtin - especially for sub-shell pipelines is dangerous
>   and should be discontinued otherwise all bash scripts that use the read 
> builtin
>   need to be modified to include TMOUT=0.

That's not the only problematic variable. See also
http://groups.google.com/group/comp.unix.shell/browse_thread/thread/cf7d5147dd829cf9/ef5b5b49a676b99d#ef5b5b49a676b99d

And here is what Geoff Clare from the Austin Group (the body
behind POSIX) had to say when I raised it some time ago:
http://groups.google.com/group/comp.unix.shell/browse_thread/thread/60c3e67919c36d0a/25ab970d275ecdb7#25ab970d275ecdb7

In short: if someone sets a TMOUT environment variable, the easy
fix is to tell them: DON'T!
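A sketch of the problem from the report and of a defensive workaround, assuming a bash whose read builtin takes TMOUT as its default timeout when reading from a pipe, as the report describes (the sleep and timeout values are arbitrary):

```shell
# The child bash inherits TMOUT=1, so its read gives up after about a
# second, before the data arrives.
status=$( { sleep 2; echo late; } 2>/dev/null |
          TMOUT=1 bash -c 'read -r line; echo "$?"' )
echo "with TMOUT=1: read status $status"   # a status above 128 means it timed out

# Defensive variant: discard any inherited TMOUT before a blocking read.
line=$( { sleep 2; echo late; } 2>/dev/null |
        TMOUT=1 bash -c 'unset TMOUT; read -r line; printf "%s\n" "$line"' )
echo "after unset TMOUT: $line"            # prints: after unset TMOUT: late
```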

-- 
Stephane



Re: Bug fix for $((x**y)) algorithm on 64+ bits machines.

2011-10-02 Thread Stephane CHAZELAS
2011-09-16, 17:17(-07), William Park:
> 145557834293068928043467566190278008218249525830565939618481
> is awfully big number! :-)

3**2**62 is 3**(2**62), 3**4611686018427387904, not a number you
can represent with 64bits, nor any reasonable number of bits, 
not (3**2)**62.

Certainly not a number that bash arithmetic expansion can handle,
nor even zsh's floating-point arithmetic.

With zsh:
$ echo $((exp((2**62)*log(3))))
inf.

$ echo 'e((2^62)*l(3))' | bc -l
Runtime warning (func=e, adr=123): scale too large, set to 2147483647
Fatal error: Out of memory for malloc.

-- 
Stephane



Re: Bug fix for $((x**y)) algorithm on 64+ bits machines.

2011-10-02 Thread Stephane CHAZELAS
2011-09-17, 13:06(+00), Stephane CHAZELAS:
> 2011-09-16, 17:17(-07), William Park:
>> 145557834293068928043467566190278008218249525830565939618481
>> is awfully big number! :-)
>
> 3**2**62 is 3**(2**62), 3**4611686018427387904, not a number you
> can represent with 64bits, nor any reasonable number of bits, 
> not (3**2)**62.
[...]

Sorry, my bad,

3**2**62 is indeed (3**2)**62 in bash and in zsh contrary to
most other places (ksh93, bc, python, gawk, perl, ruby...).

-- 
Stephane



Re: How to match regex in bash? (any character)

2011-10-02 Thread Stephane CHAZELAS
2011-10-1, 14:39(-08), rogerx@gmail.com:
[...]
> I took some time to examine the three regex references:
>
> 1) 
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04
> Written more like a technical specification of regex.  Great if your're
> going to be modifying the regex code.  Difficult to follow if you're new,
> looking for info.

One thing to bear in mind is that bash calls a system library to
perform the regexp matching (but see [*] below), so it can't
really document how it's going to work because it just can't know:
it may differ from system to system. The only thing that is more
or less guaranteed is that all those various implementations
should comply with that specification.

Above is the specification of POSIX extended regular
expressions, so a bash script writer should refer to that
document if he wants to write a script for all the systems where
bash might be used.
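As a sketch of what sticking to that specification looks like in practice: the pattern below only uses POSIX ERE features (anchors, bracket expressions, intervals, grouping), and it is kept in a variable and expanded unquoted on the right of =~, a common idiom (not from this thread) that hands the pattern to the library without quote removal applying to its contents. date_parts is a hypothetical helper:

```shell
# Only POSIX ERE features: ^ $ [0-9] {n} and ( ) groups.
re='^([0-9]{4})-([0-9]{2})-([0-9]{2})$'

date_parts() {
  if [[ $1 =~ $re ]]; then
    # BASH_REMATCH[0] is the whole match; [1], [2]... are the groups.
    printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    echo "no match"
  fi
}

date_parts 2011-10-02   # prints: 2011 10 02
date_parts 2011-10-2    # prints: no match
```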

> 2) regex(7)
> Although it looks good, upon further examination, I start to see run-on
> sentences.  It's more like a reference, which is what a man file should 
> be.
> At the bottom, "AUTHOR - This page was taken from Henry Spencer's regex
> package"

On the few systems where that man page is available, it may or
may not document the extended regular expressions that are
used when calling the regex(3) API (on my system, it doesn't).
Those regular expressions may or may not have extensions over
POSIX, and that document may or may not point out which
ones are extensions and which ones are not, so a script writer may
be able to refer to that document if he wants his script to work
on that particular system (but see [*]).

> 3) grep(1)
> Section "REGULAR EXPRESSIONS".  At about half the size of regex(7), the
> section clearly explains regex and seems to be easily understandable for a
> person new to regex.

That's another utility, which may or may not use the same API,
and may or may not use it the same way as bash. You get no
guarantee whatsoever that the regexps covered there will be the
same as bash's.

[*] actually, bash does some (undocumented) preprocessing on the
regexps, so even the regex(3) reference is misleading here.

For instance, on my system the regex(3) extended REs support \1
for backreferences and \b for word boundaries, but for
[[ aa =~ (.)\1 ]], bash changes it to [[ aa =~ (.)1 ]], so bash
won't behave as regex(3) documents on my system (note that (.)\1
is not a portable ERE anyway, as its behavior is unspecified by
POSIX).

Also (and that could be considered a bug), "[\a]" is meant to
match either "\" or "a", but in bash, because of that
preprocessing, it doesn't:

$ bash -c '[[ "\\" =~ [\a] ]]' || echo no
no
$ bash -c '[[ "\\" =~ [\^] ]]' && echo yes
yes

Once that bug is fixed, bash should probably refer to POSIX EREs
(since its preprocessing would disable any extension introduced
by system libraries) rather than regex(3), as that would be more
accurate.

The situation with zsh:
  - it uses the same API as bash (unless the RE_MATCH_PCRE
option is set in which case it uses PCRE regexps)
  - it doesn't do the same preprocessing as bash because...
  - it doesn't implement that confusing business inherited from
ksh whereby quoted RE characters are taken literally.

  So, in zsh
  - [[ aa =~ '(.)\1' ]] works as documented in regex(3) on my
system (but may work differently on other systems as the
behavior is unspecified as per POSIX).
  - [[ '\' =~ '[\a]' ]] works as POSIX specifies
  - after "setopt RE_MATCH_PCRE", one gets a more portable
behavior as there is only one PCRE library (though with different
versions).

The situation with ksh93:
  - Not POSIX either but a bit more consistent:
$ ksh -c '[[ "\\" =~ [\a] ]]' || echo no
no
$ ksh -c '[[ "\\" =~ [\^] ]]' || echo no
no
  - it implements its own regexps with its own many extensions
which therefore can be and are documented in its man page
but are not common to any other regex (though are mostly a
superset of the POSIX ERE).

-- 
Stephane



Re: Bug fix for $((x**y)) algorithm on 64+ bits machines.

2011-10-02 Thread Stephane CHAZELAS
2011-09-19, 09:27(-04), Chet Ramey:
> On 9/16/11 4:39 PM, Nicolas ARGYROU wrote:
>
>> Bash Version: 4.0
>> Patch Level: 33
>> Release Status: release
>> 
>> Description:
>> The algorithm used to calculate x to the power of y: x**y
>> takes O(y) time which is way too long on systems using 64 bits.
>> Calculating for exemple $((3**2**62)) freezes the shell at
>> argument parsing time.
>> 
>> Repeat-By:
>> bash -c 'echo $((3**2**62))'
>> 
>> Fix:
>> This fix uses an alorithm that takes O(log(y)) time, which is way
>> faster. But it is still about 30 times slower with random numbers
>> than a single multiplication, on 64 bits systems. The fix is written
>> as a C++ template working on any unsigned integer type, and doesn't
>> need any external resource:
>
> Thanks for the report.  This looks like an independent reimplementation of
> the "exponentiation by squaring" method.  I did a little looking around,
> and it's the best algorithm out there.  I used a slightly different but
> equivalent implementation.
[...]

FYI, ksh93 uses pow(3). So does zsh, but only in floating point
mode.

Probably better and more efficient than reinventing the wheel.
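For illustration only, the "exponentiation by squaring" method mentioned above can be sketched with bash integer arithmetic (pow_by_squaring is a hypothetical name; this is not bash's actual C code). It needs O(log y) multiplications instead of O(y):

```shell
# Sketch: exponentiation by squaring, exponent >= 0 only.
# Results larger than bash's intmax_t overflow, just as $((x**y)) does.
pow_by_squaring() {
  local base=$1 exp=$2 result=1
  while (( exp > 0 )); do
    if (( exp & 1 )); then
      result=$(( result * base ))   # low bit set: multiply this power in
    fi
    exp=$(( exp >> 1 ))             # move on to the next bit of the exponent
    base=$(( base * base ))         # base now holds x**(2**k) for the next bit
  done
  echo "$result"
}

pow_by_squaring 3 5    # prints: 243
pow_by_squaring 2 10   # prints: 1024
```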

-- 
Stephane



Re: List of background processes in a command group, in a pipeline, executed sequentially under certain conditions.

2011-10-02 Thread Stephane CHAZELAS
2011-10-01, 06:54(-05), Dan Douglas:
[...]
> f() {
> local -i x y
> while read -rN1 "x[y++]"; do
> printf '%d ' "${1}" >&2# keep track of which job this is.
> done
> printf "${#x[@]} " # Print the total number of reads by each 
> job.

If you add an echo >&2 "[done $1]" here,
> }
>
> g() {  # Used in ex 6
> f 1 <${1} &
> f 2 <${1}
> }
>
> # This works as I expect, f is backgrounded and two readers of one pipe each 
> get about half the input:
> exincr # 1
>
> read -ra x < <({ f 1 & f 2; } < <(zeros))
> printf '%b\n' "\n${x[@]}\n"
>
> # Equivalent to above, except with piped output. Now f is not backgrounded. 
> One reader consumes all the input:
> exincr # 2
>
> { f 1 & f 2; } < <(zeros) | {

You'll notice that f 1 terminates straight away. And if you run
it under strace, you'll notice that bash does a
dup2(open("/dev/null"), 0), that is, it redirects "f 1"'s stdin
to /dev/null.

~$ bash -c '{ cat; } < c | cat'
test
~$ bash -c '{ cat & } < c | cat'
~$ bash -c '{ lsof -ac lsof -d0; } < c | cat'
COMMAND  PID     USER   FD TYPE DEVICE SIZE/OFF     NODE NAME
lsof    5005 chazelas   0r REG   253,2        5 58785638 /home/chazelas/c
~$ bash -c '{ lsof -ac lsof -d0 & } < c | cat'
COMMAND  PID     USER   FD TYPE DEVICE SIZE/OFF     NODE NAME
lsof    5010 chazelas   0r CHR     1,3      0t0      973 /dev/null

That behavior is required by POSIX and occurs for ash and pdksh
and its derivatives as well:

POSIX> command1 & [command2 & ... ]
POSIX> 
POSIX>   The standard input for an asynchronous list, before any
POSIX>   explicit redirections are performed, shall be considered to
POSIX>   be assigned to a file that has the same properties as
POSIX>   /dev/null. If it is an interactive shell, this need not
POSIX>   happen. In all cases, explicit redirection of standard input
POSIX>   shall override this activity.

However, I don't know why bash does it only in the "pipe" case.

~$ ash -c '{ lsof -ac lsof -d0 & } < c'
COMMAND  PID     USER   FD TYPE DEVICE SIZE/OFF     NODE NAME
lsof    5188 chazelas   0r CHR     1,3      0t0      973 /dev/null
~$ bash -c '{ lsof -ac lsof -d0 & } < c'
COMMAND  PID     USER   FD TYPE DEVICE SIZE/OFF     NODE NAME
lsof    5191 chazelas   0r REG   253,2        5 58785638 /home/chazelas/c

To work around it, this <&0 trick seems to work:

~$ bash -c '{ lsof -ac lsof -d0 <&0 & } < c | cat'
COMMAND  PID     USER   FD TYPE DEVICE SIZE/OFF     NODE NAME
lsof    5247 chazelas   0r REG   253,2        5 58785638 /home/chazelas/c
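The same workaround can be checked without lsof. A sketch using a temporary file in place of c:

```shell
tmp=$(mktemp)
printf 'test\n' > "$tmp"

# The explicit <&0 on the background job keeps the group's "< file"
# redirection, instead of the /dev/null an asynchronous list gets by default.
bash -c '{ cat <&0 & wait; } < "$1" | cat' sh "$tmp"   # prints: test

rm -f -- "$tmp"
```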

-- 
Stephane



Re: Bug fix for $((x**y)) algorithm on 64+ bits machines.

2011-10-02 Thread Stephane CHAZELAS
2011-09-17, 13:39(+00), Stephane CHAZELAS:
> 2011-09-17, 13:06(+00), Stephane CHAZELAS:
>> 2011-09-16, 17:17(-07), William Park:
>>> 145557834293068928043467566190278008218249525830565939618481
>>> is awfully big number! :-)
>>
>> 3**2**62 is 3**(2**62), 3**4611686018427387904, not a number you
>> can represent with 64bits, nor any reasonable number of bits, 
>> not (3**2)**62.
> [...]
>
> Sorry, my bad,
>
> 3**2**62 is indeed (3**2)**62 in bash and in zsh contrary to
> most other places (ksh93, bc, python, gawk, perl, ruby...).

Sorry again,

I was right in the first place,

3**2**62 is 3**(2**62) in bash and zsh like in other shells.

I think I need more sleep...
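A small example that makes the right-associativity of ** visible with numbers small enough to check by hand:

```shell
# ** groups from the right: 2**3**2 is 2**(3**2) = 2**9.
echo $(( 2**3**2 ))     # prints: 512
# Explicit parentheses give the left-grouped reading (2**3)**2 = 8**2.
echo $(( (2**3)**2 ))   # prints: 64
```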

-- 
Stephane



Re: Bug fix for $((x**y)) algorithm on 64+ bits machines.

2011-10-02 Thread Chet Ramey
On 9/19/11 2:35 PM, Stephane CHAZELAS wrote:

>> Thanks for the report.  This looks like an independent reimplementation of
>> the "exponentiation by squaring" method.  I did a little looking around,
>> and it's the best algorithm out there.  I used a slightly different but
>> equivalent implementation.
> [...]
> 
> FYI, ksh93 uses pow(3). So does zsh, but only in floating point
> mode.

Bash doesn't use floating point.  It does all of its arithmetic in
intmax_t.
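A quick illustration of what integer-only arithmetic means in practice:

```shell
# Division truncates toward zero, as with C integer division.
echo $(( 7 / 2 ))    # prints: 3
echo $(( -7 / 2 ))   # prints: -3
# A floating-point literal is an arithmetic syntax error in bash.
bash -c 'echo $(( 1.5 + 1 ))' 2>/dev/null || echo "bash rejected 1.5"
```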

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: How to match regex in bash? (any character)

2011-10-02 Thread Chet Ramey
On 10/2/11 3:43 PM, Stephane CHAZELAS wrote:

> [*] actually, bash does some (undocumented) preprocessing on the
> regexps, so even the regex(3) reference is misleading here.

Not really.  The words are documented to undergo quote removal, so
they undergo quote removal.  That turns \1 into 1, for instance.

Chet