gnu.bash.bug usenet interface not working again
It worked for a while after I reported the issue and then it stopped again. That's very frustrating. -- Stephane
Re: initialisation bash variables
2011-08-15, 17:15(+02), Francky Leyn:
> Hello,
>
> if you have a variable, say VAR,
> and you don't assign it a value,
> and afterwards you test it,
> what is the value of $VAR then?
>
> random, or an empty string?
[...]

Upon startup, the shell makes one shell variable per environment variable whose name is compatible with shell variable names.

So for instance, if bash receives "VAR=foo" in its environment, $VAR will expand to foo. If it's passed "1=bar", $1 will not be affected, and it's the same for a few special variables of the shell. If passed "A+B=C" or "=D" for instance, that obviously won't be mapped to shell variables. Some shells do discard variables from the environment that can't be mapped to shell variables. That's not the case with bash.

-- Stephane
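As an illustration of the mapping described above, here is a minimal sketch. It assumes that env(1) and bash are on the PATH and that env accepts a name such as "1" (GNU env does); the variable names imported and positional are only there for the demonstration.

```shell
# A name that is a valid shell variable name is imported as a shell variable.
imported=$(env VAR=foo bash -c 'printf "%s" "$VAR"')

# "1=bar" can be placed in the environment, but it cannot become the
# positional parameter $1, which stays unset since no argument was passed.
positional=$(env 1=bar bash -c 'printf "[%s]" "${1-}"')

printf 'VAR=%s positional=%s\n' "$imported" "$positional"
```

The second command prints an empty pair of brackets: the "1=bar" string is in the environment of bash, but it has no effect on $1.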
Re: initialisation bash variables
2011-08-16, 22:24(+02), Francky Leyn:
[...]
> VAR=FALSE
> # some command line processing, that can set VAR to "TRUE"
> if [ $VAR = TRUE ]; then
> ...
> fi
>
> Must I effectively write that VAR=FALSE?
> Or will the script work fine without?

Yes, you must write it, because bash may inherit a VAR variable from the environment like I said (especially when you consider that all uppercase variables are by convention reserved for environment variables).

> Also, can't I write the test as
>
> if [ $VAR ]; then
> ...
> fi
[...]

No. That syntax is wrong. Valid syntaxes are:

if [ "$VAR" != "" ]
if [ -n "$VAR" ]
if [ "$VAR" ]

Or if you want to be extremely portable:

if [ "" != "$VAR" ]

or

if [ "x$VAR" != x ]

Personally, I prefer:

var=false
if ...
  var=true
...
if "$var"; then
  ...
fi

-- Stephane
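To make the two points above concrete, here is a small sketch. The argument list (--verbose file.txt) and the sample value 'a = b' are made up for the demonstration; the first half uses the var=false / var=true idiom, the second half shows one way the unquoted test can go wrong.

```shell
# Always initialise the flag: never rely on what the environment may contain.
var=false
for arg in --verbose file.txt; do      # hypothetical command line
    case $arg in
        --verbose) var=true ;;
    esac
done
# "true" and "false" are commands, so the flag can be run directly.
if "$var"; then state=on; else state=off; fi

# Unquoted, the value is split into three words and the test becomes
# [ a = b ], a string comparison that is false although VAR is not empty.
VAR='a = b'
if [ $VAR ]; then unquoted=nonempty; else unquoted=empty; fi
if [ -n "$VAR" ]; then quoted=nonempty; else quoted=empty; fi

printf 'state=%s unquoted=%s quoted=%s\n' "$state" "$unquoted" "$quoted"
```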
Re: initialisation bash variables
2011-08-17, 08:32(+02), Francky Leyn: > On 8/16/2011 10:53 PM, Stephane CHAZELAS wrote: >> 2011-08-16, 22:24(+02), Francky Leyn: >> [...] >>> VAR=FALSE >>> # some command line procesing, that can set VAR to "TRUE" >>> if [ $VAR = TRUE ]; then >>> ... >>> fi >>> >>> Must I effectively write that VAR=FALSE? >>> Or will the script work fine without? >> >> Yes, you must write it, because bash may inherit a VAR variable >> from the environment like I said (especially when you consider >> that all uppercase variables are by convention reserved for >> environment variables). > > 1) So it's a bad idea to use uppercase variables in a script? Yes, unless you want to export them to the environment of commands you start in that script. > 2) If VAR coincides with an environment variable, and in the > script I change it value, is this then propagated to outside > the script? Is the environment variable affected? The environment is a list of strings (by convention of the format var=value) passed upon executing a command in a fashion exactly similar to the list of arguments to the command. In other words, when you execute a command: cmd arg1 arg2 you pass both a list of arguments ("cmd", "arg1", and "arg2") and a list of environment variables ("PATH=/bin...", "VAR=foo"...). The difference being that the list of arguments is explicit on the shell command line while the list of environment variables comes from the remembered list of environment variables that the shell (or any application that uses the C library and *environ, putenv(3), setenv(3)) maintains internally (and initialised from the environment it received when it was executed). Though the shell allows the syntax: VAR=value cmd arg1 arg2 to specify environment variables on the command line. In no circumstance are variable definitions in one process going to affect the environment of other processes (an exception to that is the "fish" shell) -- Stephane
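The following sketch illustrates the answer to question 2: an executed command receives a copy of the environment, and nothing it does to that copy comes back. The variable names other than VAR are hypothetical and only hold the results.

```shell
VAR=outer
export VAR

# The executed sh gets "VAR=outer" in its envp and changes its own copy only.
changed_in_child=$(sh -c 'VAR=inner; printf "%s" "$VAR"')
after_child=$VAR

# VAR=value cmd: the assignment applies to the environment of that one command.
one_shot=$(VAR=temporary sh -c 'printf "%s" "$VAR"')
after_one_shot=$VAR

printf '%s %s %s %s\n' "$changed_in_child" "$after_child" "$one_shot" "$after_one_shot"
```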
Re: initialisation bash variables
2011-08-18, 04:10(+02), Patrick:
> On 17.08.2011 20:53, Stephane CHAZELAS wrote:
>> 2011-08-17, 08:32(+02), Francky Leyn:
>>> On 8/16/2011 10:53 PM, Stephane CHAZELAS wrote:
>>> 2) If VAR coincides with an environment variable, and in the
>>> script I change it value, is this then propagated to outside
>>> the script? Is the environment variable affected?
>>
>> The environment is a list of strings (by convention of the format
>> [...]
>>
>> In no circumstance are variable definitions in one process going
>> to affect the environment of other processes (an exception to
>> that is the "fish" shell)
>>
>
> Could it be that the two of you are not talking about the same thing?
>
> Just for clarity: environment variables (henceforth "the environment")
> of a process are (is) inherited by its children.

Everything is inherited by children. However, upon executing a command using the execve(2) system call, all the memory of a process is reinitialised. What the new command gets passed along with the execve(2) system call (as arguments) is a list of arguments (argv) and a list of environment variables (envp).

And as I said, by convention (helped by the C library wrappers around the execve(2) system call (execv, execl, system...) that take care of propagating the environment), that envp is built by the application from the envp it received when it was executed (in that process or its parents).

So yes, the environment is generally inherited by commands executed in child processes, but also by commands executed by the current process.

> Therefore, what *does* happen, is that if Stephane, as in 2), changes
> VAR in script, the change gets propagated to the scripts *child* processes.

I think it brings confusion to speak of processes here. Everything is propagated upon a fork() (the system call that creates a child process): a fork creates an exact copy of the current process. The environment is something that concerns command execution.

As a side note though, that behavior didn't occur in the Bourne shell. In the Bourne shell, you had to explicitly export a variable (even if the shell itself received it in its environment) for it to be exported to the commands executed by the shell.

$ VAR=foo sh -c 'VAR=bar; env' | grep VAR
VAR=foo
$ VAR=foo sh -c 'VAR=bar; export VAR; env' | grep VAR
VAR=bar

[...]
> But what does of course not happen, is that the change would get
> propagated to the *parent* process. Or any other process.

Environment changes are propagated to children just like the rest of the memory, and generally to commands executed by the current process or any of those children.

> (What is the "fish" shell ???)

The friendly interactive shell.

http://en.wikipedia.org/wiki/Friendly_interactive_shell

And see
http://fishshell.com/user_doc/index.html#variables
for the documentation on the scope of its variables.

-- Stephane
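The distinction between a fork and a command execution can be sketched as follows. The name demo_var is hypothetical; the point is that a subshell (fork only) sees every shell variable, while an executed command only sees what was put in its environment.

```shell
unset demo_var
demo_var=local_only      # a plain shell variable, deliberately not exported

# Command substitution forks a subshell: the whole memory is copied.
in_subshell=$(printf '%s' "$demo_var")

# sh -c forks and then executes a new command: only the environment is passed.
in_command=$(sh -c 'printf "%s" "${demo_var-unset}"')

# Once exported, the variable is part of the envp built for executed commands.
export demo_var
in_command_exported=$(sh -c 'printf "%s" "${demo_var-unset}"')

printf '%s %s %s\n' "$in_subshell" "$in_command" "$in_command_exported"
```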
Re: Syntax Question...
2011-08-14, 02:43(+00), Michael Witten:
[...]
>> Please read BashFAQ/006: http://mywiki.wooledge.org/BashFAQ/006
>
> "no force in the universe can put NUL bytes into shell strings usefully"
>
> Ain't that the goddamn Truth!

No, zsh supports NUL bytes in its strings happily. It's even in the default $IFS.

Where NUL bytes can't go is in arguments to commands, environment variables, filenames... But I can't see why a shell variable couldn't contain NUL bytes. It's even a good thing for that very reason, as you can use that character to safely separate filenames, arguments, env vars... See for instance the -0 option of many GNU utilities.

-- Stephane
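As a sketch of why NUL makes a safe separator, the example below passes three made-up "filenames", one of which contains a newline, first NUL-delimited and then newline-delimited. It assumes an xargs that supports the -0 option (GNU and BSD xargs do).

```shell
# NUL-delimited: xargs -0 hands exactly three arguments to the inner sh.
nul_count=$(printf '%s\0' 'a b' 'c
d' 'e' | xargs -0 sh -c 'printf "%s" "$#"' sh)

# Newline-delimited: the embedded newline makes the list look like four items.
nl_count=$(printf '%s\n' 'a b' 'c
d' 'e' | wc -l | tr -d ' ')

printf 'NUL-delimited: %s items, newline-delimited: %s lines\n' "$nul_count" "$nl_count"
```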
Re: initialisation bash variables
2011-08-18, 12:44(+02), Patrick:
[...]
>> $ VAR=foo sh -c 'VAR=bar; env' | grep VAR
>> VAR=foo
>> $ VAR=foo sh -c 'VAR=bar; export VAR; env' | grep VAR
>> VAR=bar
> Interesting! I do not have the bourne shell installed. Bash tries to
> mimic it when called as "sh", but it does not produce the "correct"
> result for your first example. Not that I would mind about that though.
> Busybox btw. also yields "VAR=bar". Phew... Dash as well. There is no
> "sh" package in the ubuntu repos. Google also has no quick answer. What
> kind of "sh" are you actually using if I may ask ?

Yes, that was the behavior of the Bourne shell. It was changed by the Korn shell, every other Bourne-like shell followed, and the new behavior is now specified by POSIX.

The Bourne shell is a shell written by Steve Bourne in the late seventies and is the ancestor of all of today's "Bourne-like shells" (ash, dash, ksh88, ksh93, pdksh, posh, mksh, bash, zsh...). There have been many variants of the Bourne shell with modifications added by the various Unix vendors. It's still found for backward compatibility in some commercial Unices.

Nowadays, "sh" refers to one implementation or another of a shell that is able to interpret a POSIX script as specified (just to avoid saying a "POSIX shell", which would be too much of a shortcut). The Bourne shell is not one of them (for the reason above and many others).

The code of the Bourne shell was released as open source /recently/ as part of OpenSolaris, so you can now find ports of it to Linux (see heirloom-sh for instance). You can also run the Bourne shell from Unix V7 (the OS where it was first released in 1979) in a PDP-11 emulator, and you'll notice a few differences between the two.

See http://www.in-ulm.de/~mascheck/bourne/ for a reference on the Bourne shell.

-- Stephane
Re: Syntax Question...
2011-08-17, 08:24(-04), Greg Wooledge:
> On Tue, Aug 16, 2011 at 03:41:19PM -0700, Linda Walsh wrote:
>> Ken Irving wrote:
>> >Maybe this?
>> >today_snaps=( ${snap_prefix} )
>
>> but as you mention, that will put them into an array... sorry, "imprecise
>> terminology": list for me is some number of objects in a string
>> separated by some
>> separator.
>
> This is an extremely bad idea. Legacy Bourne shell code from the
> 1980s kind of bad -- from the horrible days before we *had* arrays
> in shells. How are you going to handle filenames with spaces in them?
> With newlines in them? With commas in them? With colons in them? Tabs?
> DEL characters? Those are all valid in filenames. Any delimiter you
> can *put* in a shell string is also a valid character in a filename (or
> at least in a pathname, which eliminates the possibility of using slash).
>

In this code:

today_snaps=( ${snap_prefix} )

With the default value of IFS in bash and without globbing disabled, the problematic characters are SPC, TAB, NL, *, ?, [ and potentially more if you have extended globbing enabled.

If $snap_prefix is meant to be space delimited, then you can make it a bit safer by doing:

IFS=" "
set -f
today_snaps=( $snap_prefix )

NL is a good delimiter because it's rare in filenames (but it is allowed, so if the data is foreign and security is a concern, it's not an option) and you can also pass the list to line-based (text) utilities:

var='a1
a2
b2'
IFS='
'
set -f
set -- $var

Or

a_vars=$(printf '%s\n' "$var" | grep '^a')

-- Stephane
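The newline-splitting approach above can be wrapped in a function, as in the sketch below. The function name count_fields and the sample list are hypothetical; the body runs in a subshell so that the IFS and set -f changes don't leak into the rest of the script.

```shell
list='one
*
two words'

# Split $1 on newlines only, with globbing disabled, and report the field count.
count_fields() (
    IFS='
'
    set -f          # without this, the "*" line would be expanded as a glob
    set -- $1       # intentionally unquoted: this is where the splitting happens
    printf '%s\n' "$#"
)

n=$(count_fields "$list")
printf 'fields: %s\n' "$n"
```

With IFS set to a newline, the space in "two words" no longer splits, and with set -f the "*" stays a literal asterisk instead of becoming a list of filenames.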
[OT] Re: accents
2011-08-25, 12:19(-07), Linda Walsh: [...] > ` Greg Wooledge wrote: >> On Wed, Aug 24, 2011 at 06:51:32PM -0700, Linda Walsh wrote: >> >>> BTW, Thomas -- what is the Character that comes after 'De' in your >>> name? I read it as hex '0xc282c2' which doesn't seem to be valid unicode. >>> >> >> RFC 2822 (section 2.2) says that Header Fields in an email must be >> composed of US-ASCII characters, so there's no telling what sort of >> problems the multi-byte character in his From: header may be triggering >> as it passes through various mail transfer agents. >> > Well, on one level, I would agree, > But on another, RFC 2822 is obviously messed up, since domain names can > contain UTF-8 characters.. > > > So...um...how does that work? [...] See RFC 5335 -- Stephane
Re: Using TMOUT as default for read bultin is unwise
2011-09-14, 09:46(+01), Wheatley, Martin R:
[...]
> Description:
> The contents of the environment variable TMOUT are used as the
> default timeout for the read builtin command when reading from
> a TTY or pipe AND as a shell command input idle time.
>
> This can lead to random failures of shell scripts
[...]
> I think the TMOUT should not be overloaded and its use as a default
> value
> for the read builtin - especially for sub-shell pipelines is dangerous
> and should be discontinued otherwise all bash scripts that use the read
> builtin
> need to be modified to include TMOUT=0.

That's not the only problematic variable. See also
http://groups.google.com/group/comp.unix.shell/browse_thread/thread/cf7d5147dd829cf9/ef5b5b49a676b99d#ef5b5b49a676b99d

And here is what Geoff Clare from the Austin Group (the body behind POSIX) had to say when I raised it some time ago:
http://groups.google.com/group/comp.unix.shell/browse_thread/thread/60c3e67919c36d0a/25ab970d275ecdb7#25ab970d275ecdb7

In short: if one sets a TMOUT environment variable, the easy fix is to tell them: DON'T!

-- Stephane
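For scripts that cannot control the environment they are started in, the defensive TMOUT=0 mentioned in the report can be sketched like this. The two-second producer is made up for the demonstration, and the example assumes bash is on the PATH; it deliberately starts bash with TMOUT=1 in its environment to simulate the problematic setup.

```shell
# The inner script resets TMOUT before reading, so the slow producer
# (2 seconds, longer than the inherited 1 second timeout) is still read.
result=$(TMOUT=1 bash -c '
    TMOUT=0
    { sleep 2; echo data; } | { read -r line; printf "%s" "$line"; }
')

printf 'read got: %s\n' "$result"
```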
Re: Bug fix for $((x**y)) algorithm on 64+ bits machines.
2011-09-16, 17:17(-07), William Park:
> 145557834293068928043467566190278008218249525830565939618481
> is awfully big number! :-)

3**2**62 is 3**(2**62), 3**4611686018427387904, not (3**2)**62. That's not a number you can represent with 64 bits, nor with any reasonable number of bits. Certainly not a number that bash arithmetic expansion can handle, not even in floating point mode.

With zsh:

$ echo $((exp((2**62)*log(3))))
inf.
$ echo 'e((2^62)*l(3))' | bc -l
Runtime warning (func=e, adr=123): scale too large, set to 2147483647
Fatal error: Out of memory for malloc.

-- Stephane
Re: Bug fix for $((x**y)) algorithm on 64+ bits machines.
2011-09-17, 13:06(+00), Stephane CHAZELAS: > 2011-09-16, 17:17(-07), William Park: >> 145557834293068928043467566190278008218249525830565939618481 >> is awfully big number! :-) > > 3**2**62 is 3**(2**62), 3**4611686018427387904, not a number you > can represent with 64bits, nor any reasonable number of bits, > not (3**2)**62. [...] Sorry, my bad, 3**2**62 is indeed (3**2)**62 in bash and in zsh contrary to most other places (ksh93, bc, python, gawk, perl, ruby...). -- Stephane
Re: How to match regex in bash? (any character)
2011-10-1, 14:39(-08), rogerx@gmail.com:
[...]
> I took some time to examine the three regex references:
>
> 1)
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04
> Written more like a technical specification of regex. Great if you're
> going to be modifying the regex code. Difficult to follow if you're new,
> looking for info.

One thing to bear in mind is that bash calls a system library to perform the regexp matching (except that [*]), so it can't really document how it's going to work because it just can't know: it may differ from system to system. The only thing that is more or less guaranteed is that all those various implementations should comply with that specification. The above is the specification of the POSIX extended regular expressions, so a bash script writer should refer to that document if they want to write a script for all the systems where bash might be used.

> 2) regex(7)
> Although it looks good, upon further examination, I start to see run-on
> sentences. It's more like a reference, which is what a man file should
> be.
> At the bottom, "AUTHOR - This page was taken from Henry Spencer's regex
> package"

On the few systems where that man page is available, it may or may not document the extended regular expressions that are used when calling the regex(3) API (on my system, it doesn't). Those regular expressions may or may not have extensions over the POSIX API, and that document may or may not point out which ones are extensions and which ones are not, so a script writer may be able to refer to that document if they want their script to work on that particular system (except that [*]).

> 3) grep(1)
> Section "REGULAR EXPRESSIONS". At about half the size of regex(7), the
> section clearly explains regex and seems to be easily understandable for a
> person new to regex.

That's another utility that may or may not use the same API, in the same way as bash or not.
You get no warranty whatsoever that the regexps covered there will be the same as bash's.

[*] Actually, bash does some (undocumented) preprocessing on the regexps, so even the regex(3) reference is misleading here.

For instance, on my system the regex(3) extended REs support \1 for backreference and \b for word boundary, but when calling

[[ aa =~ (.)\1 ]]

bash changes it to

[[ aa =~ (.)1 ]]

so bash won't behave as regex(3) documents on my system (note that (.)\1 is not a portable regex, as the behavior is unspecified).

Also (and that could be considered a bug), "[\a]" is meant to match either "\" or "a", but in bash, because of that preprocessing, it doesn't:

$ bash -c '[[ "\\" =~ [\a] ]]' || echo no
no
$ bash -c '[[ "\\" =~ [\^] ]]' && echo yes
yes

Once that bug is fixed, bash should probably refer to POSIX EREs (since its preprocessing would disable any extension introduced by system libraries) rather than regex(3), as that would be more accurate.

The situation with zsh:

- it uses the same API as bash (unless the RE_MATCH_PCRE option is set, in which case it uses PCRE regexps)
- it doesn't do the same preprocessing as bash because...
- it doesn't implement that confusing business inherited from ksh whereby quoted RE characters are taken literally.

So, in zsh:

- [[ aa =~ '(.)\1' ]] works as documented in regex(3) on my system (but may work differently on other systems as the behavior is unspecified as per POSIX).
- [[ '\' =~ '[\a]' ]] works as POSIX specifies
- after "setopt RE_MATCH_PCRE", one gets a more portable behavior as there is only one PCRE library (though different versions).

The situation with ksh93:

- Not POSIX either but a bit more consistent:

  $ ksh -c '[[ "\\" =~ [\a] ]]' || echo no
  no
  $ ksh -c '[[ "\\" =~ [\^] ]]' || echo no
  no

- it implements its own regexps with its own many extensions which therefore can be and are documented in its man page but are not common to any other regex (though are mostly a superset of the POSIX ERE).

-- Stephane
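Given the caveats above, a conservative approach is to restrict oneself to constructs from the POSIX ERE specification. The sketch below does that; keeping the pattern in a variable (here the hypothetical name re) is a common idiom and not something the message above prescribes. It is run through bash explicitly since [[ ... =~ ... ]] is not available in plain sh.

```shell
result=$(bash <<'EOF'
# Only POSIX ERE constructs: anchors, bracket expressions with character
# classes, and the + repetition operator.
re='^[[:alpha:]]+-[[:digit:]]+$'
if [[ abc-123 =~ $re ]]; then echo match; else echo nomatch; fi
if [[ abc_123 =~ $re ]]; then echo match; else echo nomatch; fi
EOF
)

printf '%s\n' "$result"
```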
Re: Bug fix for $((x**y)) algorithm on 64+ bits machines.
2011-09-19, 09:27(-04), Chet Ramey: > On 9/16/11 4:39 PM, Nicolas ARGYROU wrote: > >> Bash Version: 4.0 >> Patch Level: 33 >> Release Status: release >> >> Description: >> The algorithm used to calculate x to the power of y: x**y >> takes O(y) time which is way too long on systems using 64 bits. >> Calculating for exemple $((3**2**62)) freezes the shell at >> argument parsing time. >> >> Repeat-By: >> bash -c 'echo $((3**2**62))' >> >> Fix: >> This fix uses an alorithm that takes O(log(y)) time, which is way >> faster. But it is still about 30 times slower with random numbers >> than a single multiplication, on 64 bits systems. The fix is written >> as a C++ template working on any unsigned integer type, and doesn't >> need any external resource: > > Thanks for the report. This looks like an independent reimplementation of > the "exponentiation by squaring" method. I did a little looking around, > and it's the best algorithm out there. I used a slightly different but > equivalent implementation. [...] FYI, ksh93 uses pow(3). So does zsh, but only in floating point mode. Probably better and more efficient than reinventing the wheel. -- Stephane
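For readers unfamiliar with the "exponentiation by squaring" method mentioned above, here is a minimal sketch in POSIX shell arithmetic. It is not the code from the patch nor from bash itself; the function name ipow is hypothetical, and it assumes a non-negative integer exponent and results that fit in the shell's integer type.

```shell
# Compute base**exp with O(log(exp)) multiplications instead of O(exp).
ipow() (
    base=$1 exp=$2 result=1
    while [ "$exp" -gt 0 ]; do
        # If the lowest bit of the exponent is set, fold the current base in.
        if [ $((exp % 2)) -eq 1 ]; then
            result=$((result * base))
        fi
        base=$((base * base))   # square the base for the next bit
        exp=$((exp / 2))        # shift the exponent right by one bit
    done
    printf '%s\n' "$result"
)

p1=$(ipow 3 13)
p2=$(ipow 2 10)
p3=$(ipow 7 0)
printf '%s %s %s\n' "$p1" "$p2" "$p3"
```

The loop runs once per bit of the exponent, so an exponent such as 2**62 needs 63 iterations rather than 2**62 of them (the result would of course overflow long before that).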
Re: List of background processes in a command group, in a pipeline, executed sequentially under certain conditions.
2011-10-01, 06:54(-05), Dan Douglas:
[...]
> f() {
> local -i x y
> while read -rN1 "x[y++]"; do
> printf '%d ' "${1}" >&2 # keep track of which job this is.
> done
> printf "${#x[@]} " # Print the total number of reads by each
> job.

if you add a

echo >&2 "[done $1]"

here.

> }
>
> g() { # Used in ex 6
> f 1 <${1} &
> f 2 <${1}
> }
>
> # This works as I expect, f is backgrounded and two readers of one pipe each
> get about half the input:
> exincr # 1
>
> read -ra x < <({ f 1 & f 2; } < <(zeros))
> printf '%b\n' "\n${x[@]}\n"
>
> # Equivalent to above, except with piped output. Now f is not backgrounded.
> One reader consumes all the input:
> exincr # 2
>
> { f 1 & f 2; } < <(zeros) | {

You'll notice that f 1 terminates straight away.

And if you do a strace, you'll notice that bash does a dup2(open("/dev/null"), 0), that is redirecting "f 1"'s stdin to /dev/null.

~$ bash -c '{ cat; } < c | cat'
test
~$ bash -c '{ cat & } < c | cat'
~$ bash -c '{ lsof -ac lsof -d0; } < c | cat'
COMMAND  PID     USER FD TYPE DEVICE SIZE/OFF    NODE NAME
lsof    5005 chazelas 0r  REG 253,25        5 8785638 /home/chazelas/c
~$ bash -c '{ lsof -ac lsof -d0 & } < c | cat'
COMMAND  PID     USER FD TYPE DEVICE SIZE/OFF NODE NAME
lsof    5010 chazelas 0r  CHR    1,3      0t0  973 /dev/null

That behavior is required by POSIX and occurs for ash and pdksh and its derivatives as well:

POSIX> command1 & [command2 & ... ]
POSIX>
POSIX> The standard input for an asynchronous list, before any
POSIX> explicit redirections are performed, shall be considered to
POSIX> be assigned to a file that has the same properties as
POSIX> /dev/null. If it is an interactive shell, this need not
POSIX> happen. In all cases, explicit redirection of standard input
POSIX> shall override this activity.

However, I don't know why bash does it only in the "pipe" case.

~$ ash -c '{ lsof -ac lsof -d0 & } < c'
COMMAND  PID     USER FD TYPE DEVICE SIZE/OFF NODE NAME
lsof    5188 chazelas 0r  CHR    1,3      0t0  973 /dev/null
~$ bash -c '{ lsof -ac lsof -d0 & } < c'
COMMAND  PID     USER FD TYPE DEVICE SIZE/OFF    NODE NAME
lsof    5191 chazelas 0r  REG 253,25        5 8785638 /home/chazelas/c

To work around, this <&0 trick seems to work:

~$ bash -c '{ lsof -ac lsof -d0 <&0 & } < c | cat'
COMMAND  PID     USER FD TYPE DEVICE SIZE/OFF    NODE NAME
lsof    5247 chazelas 0r  REG 253,25        5 8785638 /home/chazelas/c

-- Stephane
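The <&0 workaround can also be checked without lsof, as in the sketch below. It assumes bash and mktemp are available; the temporary file and the "payload" content are made up for the demonstration. Since the explicit redirection of standard input overrides the /dev/null assignment, the background cat should see the file.

```shell
tmp=$(mktemp)
printf 'payload\n' > "$tmp"

# Background cat inside a pipeline element, with stdin explicitly re-duplicated.
with_fix=$(bash -c '{ cat <&0 & wait; } < "$1" | cat' bash "$tmp")

rm -f "$tmp"
printf 'background cat read: %s\n' "$with_fix"
```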
Re: Bug fix for $((x**y)) algorithm on 64+ bits machines.
2011-09-17, 13:39(+00), Stephane CHAZELAS: > 2011-09-17, 13:06(+00), Stephane CHAZELAS: >> 2011-09-16, 17:17(-07), William Park: >>> 145557834293068928043467566190278008218249525830565939618481 >>> is awfully big number! :-) >> >> 3**2**62 is 3**(2**62), 3**4611686018427387904, not a number you >> can represent with 64bits, nor any reasonable number of bits, >> not (3**2)**62. > [...] > > Sorry, my bad, > > 3**2**62 is indeed (3**2)**62 in bash and in zsh contrary to > most other places (ksh93, bc, python, gawk, perl, ruby...). Sorry again, I was right in the first place, 3**2**62 is 3**(2**62) in bash and zsh like in other shells. I think I need more sleep... -- Stephane
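The associativity question is easier to check with small numbers than with 3**2**62. The sketch below assumes bash is on the PATH (the ** operator is not part of POSIX sh arithmetic) and uses explicit parentheses for comparison.

```shell
# Without parentheses, bash groups from the right: 2**(3**2) = 2**9.
right_assoc=$(bash -c 'echo $((2**3**2))')

# Explicit left grouping for comparison: (2**3)**2 = 8**2.
left_grouped=$(bash -c 'echo $(((2**3)**2))')

printf '2**3**2=%s (2**3)**2=%s\n' "$right_assoc" "$left_grouped"
```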
Re: Bug fix for $((x**y)) algorithm on 64+ bits machines.
On 9/19/11 2:35 PM, Stephane CHAZELAS wrote:
>> Thanks for the report. This looks like an independent reimplementation of
>> the "exponentiation by squaring" method. I did a little looking around,
>> and it's the best algorithm out there. I used a slightly different but
>> equivalent implementation.
> [...]
>
> FYI, ksh93 uses pow(3). So does zsh, but only in floating point
> mode.

Bash doesn't use floating point. It does all of its arithmetic in intmax_t.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
Re: How to match regex in bash? (any character)
On 10/2/11 3:43 PM, Stephane CHAZELAS wrote:
> [*] actually, bash does some (undocumented) preprocessing on the
> regexps, so even the regex(3) reference is misleading here.

Not really. The words are documented to undergo quote removal, so they undergo quote removal. That turns \1 into 1, for instance.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
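The effect of quote removal described above can be observed with the sketch below, which assumes a bash recent enough to treat quoted regex characters literally (3.2 or later). After quote removal, (.)\1 becomes (.)1, that is "any character followed by a literal 1", not a backreference.

```shell
result=$(bash <<'EOF'
# "a1" is a character followed by a literal 1, so it matches.
[[ a1 =~ (.)\1 ]] && echo "a1 matches"
# "aa" would only match if \1 were still a backreference.
[[ aa =~ (.)\1 ]] || echo "aa does not match"
EOF
)

printf '%s\n' "$result"
```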