builtin exit status on write failure

2007-04-29 Thread Eric
Configuration Information [Automatically generated, do not change]:
Machine: i686
OS: cygwin
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash.exe' -DCONF_HOSTTYPE='i686' 
-DCONF_OSTYPE='cygwin' -DCONF_MACHTYPE='i686-pc-cygwin' -DCONF_VENDOR='pc' 
-DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H 
-DRECYCLES_PIDS   -I.  -I/home/eblake/bash-3.2.15-14/src/bash-3.2 
-I/home/eblake/bash-3.2.15-14/src/bash-3.2/include 
-I/home/eblake/bash-3.2.15-14/src/bash-3.2/lib   -O2 -pipe 
uname output: CYGWIN_NT-5.1 LOUNGE 1.5.24(0.156/4/2) 2007-01-31 10:57 i686 
Cygwin
Machine Type: i686-pc-cygwin

Bash Version: 3.2
Patch Level: 15
Release Status: release

Description:
POSIX requires any application that writes to stdout to detect
write failures, exit with non-zero status, and write a
diagnostic to stderr.

Repeat-By:
One example of a failure to follow this rule, using Linux's
/dev/full to provoke a write failure:

$ cd /bin
$ cd
$ cd - >/dev/full
$ echo $?
0
$ pwd
/bin

Oops - there was a write failure; yet no error message
printed, the exit status remained zero, and the working
directory changed.

Fix:
All of the bash builtins that write to stdout need to check
ferror(stdout) before completing, and adjust the exit status
and print a diagnostic accordingly.
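A quick way to see the desired behavior, using the same /dev/full trick (a sketch assuming a Linux system where /dev/full exists):

```shell
# /dev/full makes every write fail with ENOSPC.  An external utility
# like cat detects the failed write, prints a diagnostic to stderr,
# and exits nonzero -- exactly what the report asks of bash builtins.
printf 'hello\n' | cat >/dev/full
echo "cat exit status: $?"   # nonzero, with a diagnostic on stderr
```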


___
Bug-bash mailing list
Bug-bash@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-bash


Re: [Patch] .gitignore TAGS and tags

2021-03-15 Thread Eric Blake
On 3/15/21 3:42 PM, Chet Ramey wrote:
> On 3/15/21 3:57 PM, Mike Jonkmans wrote:
>> On Mon, Mar 15, 2021 at 11:23:46AM -0400, Chet Ramey wrote:
>>> On 3/15/21 3:29 AM, Mike Jonkmans wrote:
>>>> I assume that the TAGS and tags files will not go into the repo.
>>>
>>> Why not? This is only the devel branch; they don't go into releases.
>>
>> Adding tags/TAGS to the repo would increase its size for almost no use.
>> Creating the tags file takes less than a second.
> 
> The size is inconsequential.
> 
>> Drawback of not having these in the repo and not in .gitignore
>> is that a 'git status' complains about untracked files.
> 
> OK, this is a good reason.

But even if the upstream repo doesn't want to ignore a file in the
(checked-in) .gitignore, you can always edit your (local-only)
.git/info/exclude to exclude your extra files locally.
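For example (a minimal sketch; the file uses ordinary .gitignore syntax):

```shell
# .git/info/exclude is local to this clone and never committed, so it
# doesn't affect the upstream repo or anyone else's checkout.
printf '%s\n' 'tags' 'TAGS' >> .git/info/exclude
git status --short    # tags/TAGS no longer listed as untracked
```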

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




zsh style associative array assignment bug

2021-03-27 Thread Eric Cook

Hey,

When doing an assignment with an odd number of elements, bash currently
treats the last element as a key and silently assigns it an empty string.

$ typeset -A ary=(this feature came from zsh); typeset -p ary
declare -A ary=([came]="from" [this]="feature" [zsh]="" )

In zsh this is an error:
% typeset -A ary=(this feature came from zsh); typeset -p ary
zsh: bad set of key/value pairs for associative array

Could bash be adjusted to align with zsh in this case?




Re: zsh style associative array assignment bug

2021-03-27 Thread Eric Cook

On 3/28/21 12:25 AM, Oğuz wrote:

> Why? I think it's better this way.

1) For consistency's sake with the shell the idea was mostly borrowed from.
2) Prior to this extension bash required specifying both the key and the
value for AA assignments, so it seems weird to silently ignore that a value
wasn't given now.
2.5) I subjectively think passing an odd number of elements to declare is
more often than not a mistake that the user would be interested in knowing
about.

With the way it is now, you could save a few characters when building a
'seen' array:
$ while read -r key; ... seen+=("$key") ...
but not really much else.
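A sketch of that idiom (assuming bash 5.1 or later, where the key/value compound assignment form exists):

```shell
# The "seen" idiom: with the new compound assignment, a lone element
# becomes a key with an empty value, which is all a membership test
# needs.  (Requires bash >= 5.1.)
declare -A seen=()
while read -r key; do
    seen+=("$key")              # shorthand for seen[$key]=''
done <<'EOF'
alpha
beta
alpha
EOF
echo "unique keys: ${#seen[@]}"   # duplicates collapse into one key
```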



Re: zsh style associative array assignment bug

2021-03-28 Thread Eric Cook

On 3/28/21 7:02 AM, Oğuz wrote:

> As it should be. `[bar]' doesn't qualify as an assignment without an
> equals sign; the shell thinks you're mixing two forms of associative
> array assignment there.
>
> In the new form, that a key is listed inside a compound assignment alone
> implies that it was meant to be assigned a value. In my mind,
> `a=(foo 123 bar)' translates to `a=([foo]=123 [bar]=)'. It makes sense.


That is the point that I am making: in typeset -A ary=([key]=) an explicit
empty string is the value, but typeset -A ary=([key]) was historically an
error. So why should a key without a value now be acceptable?





Re: zsh style associative array assignment bug

2021-03-29 Thread Eric Cook

On 3/29/21 5:18 PM, Chet Ramey wrote:

> If you look at
>
> a=( k1 v1 k2 v2 k3 v3 )
>
> as more or less syntactic sugar for
>
> a=( [k1]=v1 [k2]=v2 [k3]=v3 )
>
> it's reasonable that
>
> a=( k1 v1 k2 )
>
> is equivalent to
>
> a=( [k1]=v1 [k2]= ). And that's what bash does.




It's just that when populating that array dynamically from another array,
if that second array didn't contain `v1' hypothetically, the array gets
shifted to

a=( [k1]=k2 [v2]=k3 [v3]= )

which I would imagine to be unexpected for the author of the code, who
would rather it error out instead of chugging along.



Re: zsh style associative array assignment bug

2021-03-30 Thread Eric Cook
On 3/30/21 10:54 AM, Chet Ramey wrote:
> On 3/29/21 6:40 PM, Eric Cook wrote:

>> Its just when populating that array dynamically with another array
>> if that second array didn't contain `v1' hypothetically, the array gets 
>> shifted to
>
> OK, how would you do that? What construct would you use in this scenario?
>
Sample input:
$ exiftool -j *.flac | jq -r '.[]| {Artist, Track, Genre, Title}|to_entries[]| 
.key + "|" + .value'
Artist|AK420
Track|
Genre|lofi
Title|A2 - Northern Lights

--
typeset -A tags=(); set --
while IFS='|' read -ra ary; do
  set -- "$@" "${ary[@]}"
done < <(
  exiftool -j *.flac |
  jq -r '.[]| {Artist, Track, Genre, Title}|to_entries[]| .key + "|" + .value'
)
eval 'tags=('"${*@Q}"\)
typeset -p tags
declare -A tags=([lofi]="Title" [Track]="Genre" [Artist]="AK420" ["A2 - 
Northern Lights"]="" )
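One hedged alternative that sidesteps the positional re-pairing entirely is to assign each key/value pair directly as it is read (the here-document below stands in for the exiftool | jq pipeline):

```shell
# Assign each key and value straight into the associative array, so an
# empty value (like Track's) stays attached to its key instead of
# shifting every later element by one position.
typeset -A tags=()
while IFS='|' read -r key value; do
    tags[$key]=$value
done <<'EOF'
Artist|AK420
Track|
Genre|lofi
Title|A2 - Northern Lights
EOF
typeset -p tags    # Track remains a key with an empty value
```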

>> a=( [k1]=k2 [v2]=k3 [v3]= )
>> which i would imagine to be unexpected for the author of the code and would 
>> rather
>> it error out instead of chugging along.
>
> Wouldn't this be a programming error? If this were a concern, since any
> array can have elements with empty values, I would recommend a different
> strategy to copy it.
>

Yeah, it is a programming error that could've used better validation.
I just find it weird that, under this new syntax, an assignment with an
odd number of elements is always assumed to be missing its final value,
which gets filled in, when any of the keys or values could've been
missing during the assignment.



Re: zsh style associative array assignment bug

2021-03-31 Thread Eric Cook

On 3/30/21 3:44 PM, Chet Ramey wrote:

> Is this a serious piece of code, or just one to demonstrate a programming
> error?

The latter

> There is only one field, terminated by `|', which becomes one array
> element. This is where you `lose' the null elements, not when you attempt
> to copy. Nothing you do after it matters.



I wasn't trying to imply that there ever was an element that was `lost'
(or invite code golf from the bystanders trying to optimize), just that
the author of a script can make an incorrect assumption about input and
eventually pass that along to an AA assignment that bash happily accepts,
making an opinionated assumption of its own and guessing at the author's
intent. bash's assumption can just as easily be wrong; erroring out would
instead tell the user that something is wrong here.



> Your point that the bash method of key-value pair assignment doesn't
> protect you from programming errors is valid.



Thank you.



Re: I've found a vulnerability in bash

2021-11-19 Thread Eric Blake
On Fri, Nov 19, 2021 at 03:56:21PM +, Kerin Millar wrote:
> On Fri, 19 Nov 2021 10:05:39 -0500
> Marshall Whittaker  wrote:
> 
> > Fair. I'm not saying anyone has to change it, but I will call out what I
> > think is a design flaw.  But this is going to turn into some philosophical
> > discussion as to whether it should have been done this way from the start.
> > That I don't know, and hold no responsibility for, as I'm not a bash dev,
> > I'm an exploit dev.  Maybe an asshole too.
> 
> You appear to be missing the implication; it has nothing in particular to do 
> with bash. Consider the following Perl program. At no point is a shell 
> involved.
> 
> @args = glob('*');
> system('rm', '-f', @args); # bad

I had to double-check you via 'strace -f -e execve ...', but you are
right, for this particular example.  But according to 'perldoc -f
system', there ARE instances where perl's system() involves a shell:

   Note that argument processing varies depending on the
number of arguments. If there is more than one argument in LIST,
or if LIST is an array with more than one value, starts the
program given by the first element of the list with arguments
given by the rest of the list. If there is only one scalar
argument, the argument is checked for shell metacharacters, and
if there are any, the entire argument is passed to the system's
command shell for parsing (this is "/bin/sh -c" on Unix

although /bin/sh is not always bash.  But that brings up a bug in perl(1):

$ strace -f -e execve perl -e 'system("echo \$HOME")'
execve("/usr/bin/perl", ["perl", "-e", "system(\"echo \\$HOME\")"], 
0x7ffc3e642e58 /* 72 vars */) = 0
strace: Process 1248831 attached
[pid 1248831] execve("/bin/sh", ["sh", "-c", "echo $HOME"], 0x55d3099d69d0 /* 
72 vars */) = 0
/home/eblake
[pid 1248831] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1248831, 
si_uid=14986, si_status=0, si_utime=0, si_stime=0} ---
+++ exited with 0 +++

According to POSIX, perl should REALLY be passing a "--" argument
between "-c" and the scalar string given by the user; see
https://www.austingroupbugs.net/view.php?id=1440
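The effect of the missing "--" can be sketched directly in shell:

```shell
# POSIX option parsing: "--" marks the end of options, so a command
# string that happens to begin with "-" cannot be mistaken for more
# options to sh itself.
sh -c -- 'echo untrusted string handled safely'

# Without "--", a string such as '-foo' passed where the command
# string goes is parsed as the options -f -o -o and sh errors out,
# rather than being treated as the command to run.
```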

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Integer overflow of i in string_extract_verbatim

2023-04-28 Thread Eric Li
From: Eric Li 
To: bug-bash@gnu.org
Subject: Integer overflow of i in string_extract_verbatim

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -Og
uname output: Linux fedora 6.2.12-200.fc37.x86_64 #1 SMP
PREEMPT_DYNAMIC Thu Apr 20 23:38:29 UTC 2023 x86_64 x86_64 x86_64
GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.2
Patch Level: 15
Release Status: release

Description:
Bash runs into a segmentation fault when spawning a process whose
argument list is larger than 2GB. Debugging with GDB shows that
subst.c:1204 (string_extract_verbatim, "while (c = string[i])")
crashes because i = -2147483648, so string[i] points to invalid
memory.

Repeat-By:
1. Put the following shell script to a.sh:

A=''
A="$A$A$A$A"
A="$A$A$A$A"
A="$A$A$A$A"
A="$A$A$A$A"
A="$A$A$A$A"
A="$A$A$A$A"
A="$A$A$A$A"
A="$A$A$A$A"
A="$A$A$A$A"
A="$A$A$A$A"
A="$A$A$A$A"
set -o pipefail
echo $A$A$A$A$A$A$A$A$A$A$A$A$A$A$A$A$A$A$A$A$A$A$A$A | wc
echo $?
echo done

2. Run "./bash a.sh"
3. See

a.sh: line 15: ... Segmentation fault  (core dumped)

4. Use the following command to debug with GDB

gdb ./bash --ex 'set follow-fork-mode child' --ex 'r a.sh'

5. See GDB output similar to following:

Thread 2.1 "bash" received signal SIGSEGV, Segmentation fault.
... in string_extract_verbatim (...) at subst.c:1204
1204  while (c = string[i])

6. Using GDB, can see that i = -2147483648.

Fix:
In string_extract_verbatim, change "int i" to "size_t i".
Other places also need to change, including:
* Argument sindex of string_extract_verbatim
* Variable sindex of get_word_from_string
* Argument sindex of string_extract_single_quoted
* ...




RFC: changing printf(1) behavior on %b

2023-08-31 Thread Eric Blake
In today's Austin Group call, we discussed the fact that printf(1) has
mandated behavior for %b (escape sequence processing similar to XSI
echo) that will eventually conflict with C2x's desire to introduce %b
to printf(3) (to produce 0b000... binary literals).

For POSIX Issue 8, we plan to mark the current semantics of %b in
printf(1) as obsolescent (it would continue to work, because Issue 8
targets C17 where there is no conflict with C2x), but with a Future
Directions note that for Issue 9, we could remove %b entirely, or
(more likely) make %b output binary literals just like C.  But that
raises the question of whether the escape-sequence processing
semantics of %b should still remain available under the standard,
under some other spelling, since relying on XSI echo is still not
portable.

One of the observations made in the meeting was that currently, both
the POSIX spec for printf(1) as seen at [1], and the POSIX and C
standard (including the upcoming C2x standard) for printf(3) as seen
at [3] state that both the ' and # flag modifiers are currently
undefined when applied to %s.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html
"The format operand shall be used as the format string described in
XBD File Format Notation[2] with the following exceptions:..."

[2] 
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap05.html#tag_05
"The flag characters and their meanings are: ...
# The value shall be converted to an alternative form. For c, d, i, u,
  and s conversion specifiers, the behavior is undefined.
[and no mention of ']"

[3] https://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html
"The flag characters and their meanings are:
' [CX] [Option Start] (The <apostrophe>.) The integer portion of the
  result of a decimal conversion ( %i, %d, %u, %f, %F, %g, or %G )
  shall be formatted with thousands' grouping characters. For other
  conversions the behavior is undefined. The non-monetary grouping
  character is used. [Option End]
...
# Specifies that the value is to be converted to an alternative
  form. For o conversion, it shall increase the precision, if and only
  if necessary, to force the first digit of the result to be a zero
  (if the value and precision are both 0, a single 0 is printed). For
  x or X conversion specifiers, a non-zero result shall have 0x (or
  0X) prefixed to it. For a, A, e, E, f, F, g, and G conversion
  specifiers, the result shall always contain a radix character, even
  if no digits follow the radix character. Without this flag, a radix
  character appears in the result of these conversions only if a digit
  follows it. For g and G conversion specifiers, trailing zeros shall
  not be removed from the result as they normally are. For other
  conversion specifiers, the behavior is undefined."

Thus, it appears that both %#s and %'s are available for use for
future standardization.  Typing-wise, %#s as a synonym for %b is
probably going to be easier (less shell escaping needed).  Is there
any interest in a patch to coreutils or bash that would add such a
synonym, to make it easier to leave that functionality in place for
POSIX Issue 9 even when %b is repurposed to align with C2x?
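For reference, the escape-sequence behavior under discussion is specific to %b's argument; %s passes it through untouched. (The proposed %#s synonym is not yet widely implemented, so only %b is shown here.)

```shell
# %s prints the argument literally; %b interprets backslash escapes
# (such as \n) inside the argument itself.
printf '%s\n' 'one\ntwo'    # prints the literal backslash-n
printf '%b\n' 'one\ntwo'    # prints "one", newline, "two"
```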

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: bug#65659: RFC: changing printf(1) behavior on %b

2023-08-31 Thread Eric Blake
On Thu, Aug 31, 2023 at 03:10:58PM -0400, Chet Ramey wrote:
> On 8/31/23 11:35 AM, Eric Blake wrote:
> > In today's Austin Group call, we discussed the fact that printf(1) has
> > mandated behavior for %b (escape sequence processing similar to XSI
> > echo) that will eventually conflict with C2x's desire to introduce %b
> > to printf(3) (to produce 0b000... binary literals).
> > 
> > For POSIX Issue 8, we plan to mark the current semantics of %b in
> > printf(1) as obsolescent (it would continue to work, because Issue 8
> > targets C17 where there is no conflict with C2x), but with a Future
> > Directions note that for Issue 9, we could remove %b entirely, or
> > (more likely) make %b output binary literals just like C.
> 
> I doubt I'd ever remove %b, even in posix mode -- it's already been there
> for 25 years.

But the longer that printf(3) supports "%b" to output binary values,
the more surprised new shell coders will be that printf(1) %b does not
behave the same.  What's more, other languages have already started
using %b for binary output (python, for example), so it is definitely
gaining in mindshare.

That said, I also agree with your desire to keep the functionality in
place.  The current POSIX says that %b was added so that on a non-XSI
system, you could do:

my_echo() {
  printf %b\\n "$*"
}

and then call my_echo everywhere that a script used to depend on XSI
echo (perhaps by 'alias echo=my_echo' with aliases enabled), for a
much quicker portability hack than a tedious search-and-replace of
every echo call that requires manual inspection of its arguments for
translation of any XSI escape sequences into printf format
specifications.  In particular, code like [var='...\c'; echo "$var"]
cannot be changed to use printf by a mere s/echo/printf %s\\n/.  Thus,
when printf was invented and standardized for the shell, the solution
at the time was to create [printf %b\\n "$var"] as a drop-in
replacement for XSI [echo "$var"], even for platforms without XSI
echo.
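A small illustration of why \c makes %b irreplaceable by a plain %s substitution:

```shell
# \c inside a %b argument stops printf's output entirely at that
# point -- mirroring XSI echo's \c, with no %s equivalent.
printf '%bEND\n' 'abc\cdef'    # prints just "abc"; \c suppresses the rest
```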

Nowadays, I personally have not seen very many scripts like this in
the wild (for example, autoconf scripts prefer to directly use printf,
rather than trying to shoe-horn behavior into echo).  But assuming
such legacy scripts still exist, it is still much easier to rewrite
just the my_echo wrapper to now use %#s\\n instead of %b\\n, than it
would be to find every callsite of my_echo.

Bash already has shopt -s xpg_echo; I could easily see this being a
case where you toggle between the old or new behavior of %b (while
keeping %#s always at the old behavior) by either this or some other
shopt in bash, so that newer script writers that want binary output
for %b can do so with one setting, while scripts that must continue to
run under old semantics can likewise do so.

> 
> > But that
> > raises the question of whether the escape-sequence processing
> > semantics of %b should still remain available under the standard,
> > under some other spelling, since relying on XSI echo is still not
> > portable.
> > 
> > One of the observations made in the meeting was that currently, both
> > the POSIX spec for printf(1) as seen at [1], and the POSIX and C
> > standard (including the upcoming C2x standard) for printf(3) as seen
> > at [3] state that both the ' and # flag modifiers are currently
> > undefined when applied to %s.
> 
> Neither one is a very good choice, but `#' is the better one. It at least
> has a passing resemblance to the desired functionality.

Indeed, that's what the Austin Group settled on today after I first
wrote my initial email, and what I wrote up in a patch to GNU
Coreutils (https://debbugs.gnu.org/65659)

> 
> Why not standardize another character, like %B? I suppose I'll have to look
> at the etherpad for the discussion. I think that came up on the mailing
> list, but I can't remember the details.

Yes, https://austingroupbugs.net/view.php?id=1771 has a good
discussion of the various ideas.

%B is out for the same reason as %b: although the current C2x draft
wording says that %<uppercase letter> is reserved for implementation use,
other than [AEFGX], which already have a history of use by C (as it was,
when C99 added %A, that caused problems for some folks), it goes on to
_highly_ encourage any implementation that adds %b for "0b0" binary
output also add %B for "0B0" binary output (to match the x/X
dichotomy).  Burning %B to retain the old behavior while repurposing
%b to output lower-case binary values is thus a non-starter, while
burning %#s (which C says is undefined) felt nicer.

The Austin Group also felt that standardizing bash's behavior of %q/%Q
for outputting quoted text, while too late for Issue 8, has a good
chance of success, even though

Re: [PATCH] printf: add %#s alias to %b

2023-08-31 Thread Eric Blake
On Thu, Aug 31, 2023 at 04:01:17PM -0500, Rob Landley wrote:
> On 8/31/23 13:31, Eric Blake wrote:
> > POSIX Issue 8 will be obsoleting %b (escape sequence interpolation) so
> > that future Issue 9 can change to having %b (binary literal output)
> > that aligns with C2x.
> 
> I.E. you sent an RFC to that effect to the posix list earlier today, and so 
> far
> the only reply on the posix list was the bash maintainer, who said "I doubt 
> I'd
> ever remove %b, even in posix mode -- it's already been there for 25 years."

The RFC to the POSIX list was started earlier than today
(https://austingroupbugs.net/view.php?id=1771 was filed on Aug 7, not
by me; and by Aug 8 we had already identified the future conflict with
C2x %b).  But you are right that today was the first time I widened
the audience by mailing coreutils and bash (rather than just the few
developers that follow the POSIX mailing list).  There are also plans
to ask the same question of other shell developers (dash, BSD,
busybox, ...); but I figured I'd start with the people and code I know
best.

It's not hard to see why POSIX is choosing to have Issue 8 obsolete
(not remove) %b's old semantics; in the short term, nothing about %b
changes, so your dusty-deck shell scripts will continue to work as
they have before; but you now have enough time to update your scripts.
The question is whether Issue 9 (several years down the road) will be
able to repurpose %b to mean binary literal output (only possible if
all shell authors agree that C2X compatibility is worth it), or must
instead just undocument %b altogether (shells can always provide
extensions that POSIX doesn't document - and the obvious extensions in
that case would be a shell's choice of %b with the old semantics or %b
to do binary literals).

But if POSIX _is_ able to repurpose %b (because enough shell authors
agree that binary output is more useful these days than XSI echo
compatibility), the followon question is whether there should be a
portable way to access the old functionality.  Since %#s is currently
unspecified, we are trying to gauge feedback on how many
implementations are willing to add that alias now, which in turn will
affect whether Issue 9 can mandate that behavior (because everyone
liked it) or must continue to leave it undefined.

But nothing is stopping coreutils from adding %#s as an extension now,
regardless of what input other shell authors provide to the ongoing
POSIX discussion.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: bug#65659: RFC: changing printf(1) behavior on %b

2023-09-01 Thread Eric Blake
On Fri, Sep 01, 2023 at 08:59:19AM +0100, Stephane Chazelas wrote:
> 2023-08-31 15:02:22 -0500, Eric Blake via austin-group-l at The Open Group:
> [...]
> > The current POSIX says that %b was added so that on a non-XSI
> > system, you could do:
> > 
> > my_echo() {
> >   printf %b\\n "$*"
> > }
> 
> That is dependent on the current value of $IFS. You'd need:
> 
> xsi_echo() (
>   IFS=' '
>   printf '%b\n' "$*"
> )

Let's read the standard in context (Issue 8 draft 3 page 2793 line 92595):

"
The printf utility can be used portably to emulate any of the traditional 
behaviors of the echo
utility as follows (assuming that IFS has its standard value or is unset):
• The historic System V echo and the requirements on XSI implementations in 
this volume of
  POSIX.1-202x are equivalent to:
printf "%b\n" "$*"
"

So yes, the standard does mention the requirement to have a sane IFS,
and I failed to include that in my one-off implementation of
my_echo().  Thank you for pointing out a more robust version.

> 
> Or the other alternatives listed at
> https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo/65819#65819
> 
> [...]
> > Bash already has shopt -s xpg_echo
> 
> Note that in bash, you need both
> 
> shopt -s xpg_echo
> set -o posix
> 
> To get a XSI echo. Without the latter, options are still
> recognised. You can get a XSI echo without those options with:
> 
> xsi_echo() {
>   local IFS=' ' -
>   set +o posix
>   echo -e "$*\n\c"
> }
> 
> The addition of those \n\c (noop) avoids arguments being treated as
> options if they start with -.

As an extension, Bash (and Coreutils) happen to honor \c always, and
not just for %b.  But POSIX only requires \c handling for %b.

And while Issue 8 has taken steps to allow implementations to support
'echo -e', it is still not standardized behavior; so your xsi_echo()
is bash-specific (which is not necessarily a problem, as long as you
are aware it is not portable).

> [...]
> > The Austin Group also felt that standardizing bash's behavior of %q/%Q
> > for outputting quoted text, while too late for Issue 8, has a good
> > chance of success, even though C says %q is reserved for
> > standardization by C. Our reasoning there is that lots of libc over
> > the years have used %qi as a synonym for %lli, and C would be foolish
> > to burn %q for anything that does not match those semantics at the C
> > language level; which means it will likely never be claimed by C and
> > thus free for use by shell in the way that bash has already done.
> [...]
> 
> Note that %q is from ksh93, not bash and is not portable across
> implementations and with most including bash's gives an output
> that is not safe for reinput in arbitrary locales (as it uses
> $'...' in some cases), not sure  it's a good idea to add it to
> the standard, or at least it should come with fat warnings about
> the risk in using it.

%q is NOT being added to Issue 8, but $'...' is.  Bug 1771 asked if %q
could be added to Issue 8, but it came in past the deadline for
feature requests, so the best we could do is add a FUTURE DIRECTIONS
blurb that mentions the idea.  But since FUTURE DIRECTIONS is
non-normative, we can always change our mind in Issue 9 and delete
that text if it turns out we can't get consensus to standardize some
form of %q/%Q after all.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: bug#65659: RFC: changing printf(1) behavior on %b

2023-09-01 Thread Eric Blake
On Fri, Sep 01, 2023 at 07:19:13AM +0200, Phi Debian wrote:
> Well, after reading yet another thread regarding libc_printf() I have to
> admit that even %B is crossed out (yet already chosen by ksh93).
> 
> The other thread also speaks of libc_printf() documenting %# as
> undefined for things other than a, A, e, E, f, F, g, and G, yet the same
> thread also mentions %A coming late to the dance (citing C99), meaning
> what is undefined today becomes defined tomorrow, so %#b is no safer.
>

Caution: The proposal here is for %#s (an alternative string), not %#b
(which C2x wants to be similar to %#x, in that it outputs a '0b'
prefix for all values except bare '0').

Yes, there is a slight risk that C may decide to define %#s.  But as
the Austin Group includes a member of WG14, we are able to advise the
C committee that such an addition is not wise.

> My guess is that printf(1) is now doomed to follow its route, keeping its
> old format exceptions, and then maybe implement something like c_printf
> (like printf but with the format string following libc semantics), or
> maybe a -C option to printf(1)...

Adding an option to printf is also a possibility, if there is
wide-spread implementation practice to standardize.  If someone wants
to implement 'printf -C' right now, that could help feed such a future
standardization.  But it is somewhat orthogonal to the request in this
thread, which is how to allow users to still access the old %b
behavior even if %b gets repurposed in the future; if we can get
multiple implementations to add a %#s alias now, it makes the future
decisions easier (even if it is too late for Issue 8 to add any new
features, or for that matter, to make any normative changes other than
marking %b obsolescent as a way to be able to revisit it in the future
for Issue 9).


> 
> Well, in any case %b cannot change semantics in bash scripts, since it has
> been there for so long; even if it departs from python, perl, and libc, it
> is unfortunate but that's the way it is. Nobody wants a semantic change,
> and on the next routers update, see the whole internet falling apart :-)

How many scripts in the wild actually use %b, though?  And if there
are such scripts, anything we can do to make it easy to do a drop-in
replacement that still preserves the old behavior (such as changing %b
to %#s) is going to be easier to audit than the only other
currently-portable alternative of actually analyzing the string to see
if it uses any octal or \c escapes that have to be re-written to
portably function as a printf format argument.

POSIX is not mandating %#s at this time, so much as suggesting that if
implementations are willing to implement it now, it will make Issue 9
easier to reason about.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH] printf: add %#s alias to %b

2023-09-06 Thread Eric Blake
On Wed, Sep 06, 2023 at 09:03:29AM -0400, Chet Ramey wrote:
> On 9/5/23 10:13 PM, William Bader wrote:
> > Has bash ever had a change before that would break valid scripts?
> 
> Yes, but I try to keep those to a minimum.
> 
> > Could the printf format change be settable by a variable or by an option
> > like the -e/-E in echo?
> 
> It could, sure. Anything is possible.
> 
> > Is it necessary for bash printf to match C printf?
> 
> No. That's the heart of it.
> 
> > I suppose that it is already decided.
> 
> The austin group has decided what they'd like to do, and what they'd like
> implementors to do. The question is whether or not people go along with it.

The Austin Group decided merely:

If we do nothing now for Issue 8, then Issue 9 WILL have a conflict
between printf(1) and printf(3).  If we reach out to all developers
now, we can start the discussion, and then by the time Issue 9 comes
around (several years from now), we may have enough consensus to do
any number of things:

- Do nothing; printf(1) and printf(3) have incompatible %b
- Declare that %b has implementation-defined behavior (shell authors
  have the choice on whether %b has old or new behavior)
- Declare that %b is no longer standardized (but implementations can
  still provide it as an extension, using their choice of behavior)
- Standardize %#s to do the same thing as %b used to do
- Standardize 'printf -c %b 1' which parses its format string
  according to C23 rules (output "0b1"), while 'printf %b 1' remains
  the old way (output "1")
- Your suggestion here (if enough shell writers agree on what to do,
  then Issue 9 can do that)

But for that work, Issue 8 has to do something - it marks %b
obsolescent, merely so that we have the option (not the mandate) to
change its behavior in the future.  It may turn out that there is
enough resistance that the answer is no change to behavior, and we
could even remove the obsolescent tag in Issue 9 (that is, make it
formal that printf(1) and printf(3) intentionally diverge on %b).  But
marking something obsolescent in Issue 8 doesn't require any current
shell to change, while still encouraging the discussion in case they
do want to change.

Adding %#s as a synonym to %b seems easy enough to do, regardless of
what Issue 9 decides to do to %b, so the Austin Group mentioned that
as a non-normative idea in the wording for Issue 8.  But they are not
requiring shell authors to implement it (even though GNU Coreutils has
already expressed willingness to do it in /bin/printf).  Meanwhile,
implementing 'printf -c' to mean "interpret my format string according
to C23 semantics" is also a viable idea, but one not mentioned in the
current incarnation of the Austin Group bug.  But that's why the bug
has a 30-day review period, to collect feedback comments on how it can
be better worded before Issue 8 is finalized.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH] printf: add %#s alias to %b

2023-09-06 Thread Eric Blake
On Wed, Sep 06, 2023 at 10:45:09AM +0700, Robert Elz wrote:
> 
> However, my "read of the room" at the minute is that this simply won't
> happen, and printf(1) %b will remain as it is, and not be removed any
> time soon (or probably, ever).   If printf(1) ever really needs a method
> to output in binary, some other mechanism is likely to be found - most
> likely one which gives a choice of output bases, not just base 2.

You (anyone reading this, not just kre) are welcome to join tomorrow's
Austin Group meeting if you would like to add input on how to reword
the changes that will land in Issue 8 as a result of
https://austingroupbugs.net/view.php?id=1771; it is a Zoom call
(details at
https://www.mail-archive.com/austin-group-l@opengroup.org/msg11758.html).
Or you can add comments to the bug directly. I will be on the call,
and if nothing else, my role in the meeting tomorrow will include
summarizing some of the "read of the room" on the feedback received in
this thread (namely, enough shell authors are insistent that printf(1)
and printf(3) should diverge in %b behavior in Issue 9 that trying to
plan otherwise by marking it obsolescent in Issue 8 isn't going to
minimize any pain down the road)

> 
> There's no current harm implementing %#s as an alias for %b - but I see
> no point in anyone using it, it will certainly be far less portable than
> %b for a LONG time.   There's also no guarantee that the C people might not
> find a use in printf(3) for %#s (currently the # there has no meaning) and
> entirely possible that whatever that use is, if it happens, might be more
> useful for printf(1) to follow, than having it mean what %b currently
> means - so going that route really is not a safe long term choice (it
> would be a gamble).

Of course, the gamble is easier to win if we have multiple independent
implementations that have all coordinated to do it the same way, so we
can push back on WG14 to tell them they would be foolish to commandeer
%#s for anything other than what existing practice has.  Coreutils is
willing to do it, but I have not actually committed that patch yet,
waiting to see how this thread pans out.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH] printf: add %#s alias to %b

2023-09-07 Thread Eric Blake
On Thu, Sep 07, 2023 at 11:53:54PM +0700, Robert Elz wrote:
> And for those who have been following this issue, the new text for
> the forthcoming POSIX version has removed any mention of obsoleting
> %b from printf(1) - instead it will simply note that there will be
> a difference between printf(1) and printf(3) once the latter gets its
> version of %b specified (in C23, and in POSIX, in the next major version
> that follows the coming one, almost certainly) - and to encourage
> implementors to consider possible solutions.
> 
> I've considered, and I don't see a problem needing solving, so I'm
> intending to do precisely nothing, unless someone actually finds a
> need for binary output from printf(1), which seems unlikely to
> ever happen to me (I mean a real need, not just to be the same as printf(3)
> "just because").
> 
> So, we can all go back to sleep now - and Chet, I'd undo %#s before it
> appears in a release, there's no need, and having it might eventually
> just cause more backward compat issues.

Indeed, at this point, even though I proposed a patch for %#s in
coreutils, I'm inclined to NOT apply it there.

The ksh extension of %..2d to output in binary does sound worth
replicating; I wonder if glibc would consider putting that in their
printf(3); and I could see adding it to Coreutils (whether or not bash
adds it - because ksh already has it).

And thanks for pointing out the existing discrepancy with %c; that was
really helpful in today's Austin Group meeting in realizing that
conflicts in Issue 9 regarding %b is not covering new ground.

> 
> And wrt:
>   | I don't know what potential uppercase/lowercase pairs of format specifiers
>   | are free from use in any existing POSIX-like shell, but my suggestion 
> would
> 
> There are none, printf(3) belongs to the C committee, and they can make
> use of anything they like, at any time they like.
> 
> The best we can do is use formats that make no sense for printf(1) to
> support (like %p, which in printf(3) prints a pointer value, but in
> printf(1) there are no (meaningful) pointers that it could ever make
> sense to print, so %p is useless for its printf(3) purpose in printf(1).
> 
> Similarly all the size modifier chars are meaningless for printf(1), as
> all the numeric values it is passed are actually strings - what internal
> format they're converted into is unrelated to anything the printf(1) user
> can control, so none of those size modifiers mean anything to printf(1)
> either (but it seems that many of those have been usurped by various
> printf(1) implementations already, so finding something free that everyone
> could use, isn't easy).

Here, I slightly disagree with you.

Right now, both bash and coreutils' 'printf %hhd 257' outputs "257",
but printf("%hhd", 257) in C outputs 1.  I would LOVE to have a mode
(possibly spelled 'printf -C %hhd 257') where I can ensure that width
modifiers are applied to the integer value obtained from the correct
subsequent argument to printf.
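
To make that concrete, here is the current shell behavior next to the C result reproduced with shell arithmetic (a sketch only; the -C option discussed above is hypothetical and does not exist):

```shell
# bash and coreutils printf accept the hh length modifier but apply no
# 8-bit truncation, so the full value appears:
bash -c 'printf "%hhd\n" 257'    # prints 257

# C's printf("%hhd", 257) converts through signed char first; the
# equivalent masking, done with shell arithmetic:
echo $(( 257 & 0xff ))           # prints 1
```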

[Side note: since bash also supports 'printf a%n b >/dev/null' as a
convoluted way of accomplishing 'b=1', I wonder if it would be
possible to port https://github.com/carlini/printf-tac-toe which
performs an interactive game of tic-tac-toe in a single printf(3)
statement invoked in a while loop into a single printf(1) command line
invocation. The lack of %hhd implicitly masking with 256 makes it
harder]

That is, if we are thinking about adding 'printf -c' or 'printf -C' as
a way to say "treat my format string as closely to C as possible", we
would be addressing MULTIPLE things at once: %b, %c, %hhd, and any
other (useful) conversion specifier in C.  And given that, I prefer
naming such an extension option -C or -c (implying C-like), rather
than your suggestion of -b (implying binary, but where the implication
only benefits %b) as a better option name for such a printf extension
option.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH] printf: add %#s alias to %b

2023-09-07 Thread Eric Blake
On Thu, Sep 07, 2023 at 02:42:16PM +0700, Robert Elz wrote:
> Date: Wed, 6 Sep 2023 11:32:32 -0500
> From:    Eric Blake 
> Message-ID:  
> 
> 
>   | You (anyone reading this, not just kre) are welcome to join tomorrow's
>   | Austin Group meeting
> 
> Thanks, but I don't expect its time of day will coincide with
> mine this week, at best I would be a half asleep zombie.
> 
>   |  it is a Zoom call
> 
> As best I understand it, zoom does not support NetBSD - which
> is the only platform I use, which has been true for decades now
> (previously I also used SunOS (not Solaris) and Ultrix).
> 
> While it probably works on android (ie: phone), meeting use that
> way would not be convenient for anyone - certainly not for me
> staring at it all the time, and assuming that it works with
> video enabled, not for anyone else with an image moving around
> randomly... (my phone has no stand, I haven't been able to
> find one which fits it).

The meeting is now over, but for clarification, the Austin Group does
audio-only meetings.  Some weeks we use Zoom, some we use Webex
(depends on who is available to run the meeting), but no one is
on-screen, so a POTS dial-in always works at no disadvantage to someone
unable or unwilling to run Zoom software (whether for lack of a native
port or because Zoom's license is not to your liking).  Speak up if you
ever think the Austin Group is unfairly crippling someone's ability to
participate by putting participation behind a paywall.

> 
>   | Or you can add comments to the bug directly.
> 
> I have done that already, and probably will add one more.
> 
>   | Of course, the gamble is easier to win if we have multiple independent
>   | implementations that have all coordinated to do it the same way, so we
>   | can push back on WG14 to tell them they would be foolish to commandeer
>   | %#s for anything other than what existing practice has.
> 
> Which worked how well with %b ?

As Geoff commented on 1771, if someone had raised the issue about %b
conflicting 6 months sooner, and pointed out the ksh extension of
%..d as an alternative, we may have had time to do so.
https://austingroupbugs.net/view.php?id=1771#c6453

But because the Austin Group learned about the conflict so late in the
game, we were already too late to push back on C2x at the time,
putting us instead into the camp of seeing what consensus we could get
from shell developers.  This thread (and others like it) have been
helpful - we DID get consensus (namely, that printf(1) and printf(3)
have always diverged, so diverging further on %b is okay), and today's
Austin Group meeting we updated what will go into Issue 8 based on
that feedback.

I consider that to be a successful outcome, even if you may have felt
heartburn through the intermediate stages of it all.

> 
> Further, upon reflection, I think a better use of %#s in printf(1)
> (no point in printf(3)) would be to explicity output a string of
> bytes (what %s used to do, before it was reinterpreted to output
> characters instead).   While the two might seem to be mostly the
> same, that depends upon the implementation - if an implementation
> treats strings of characters as arrays of wchar_t, and converts
> from byte encoding to wchar_t on input, there's no guarantee that
> the output (converted back from wchar_t to byte encoding) will be
> identical to the input string.   Sometimes that might not be
> desirable and a method to simply copy the input string to the
> output, as uninterpreted bytes might be useful to have.  To me
> that is a better use of %#s than as a %b clone - particularly
> as %b needs the same kind of variant (%#b).   This also deals
> with the precision issue, %.1s is 1 character from the arg
> string, %#.1s is one byte instead.

That is indeed a cool idea, but one for the libc folks to take up.  At
any rate, I agree that burning %#s to be a synonym for %b precludes
this useful idea (and it may be even more important in shell contexts,
now that Issue 8 has taken efforts to make it clear that sometimes the
shell deals with characters, and sometimes with bytes; in particular,
environment variables can hold bytes that need not always form
characters in the current locale).

> 
> If there were to be anything worthy of announcing as deprecated
> from posix printf(1) it would be %c - then we could make %c be
> compat with its printf(3) meaning, where it takes a codepoint
> as an int (just 8 bits in printf(3) but we don't neet to retain
> that restriction) and outputs the associated character, rather
> than just being an (almost) alias for %.1s -- where the almost
> is because given '' as the arg string, %c is permitted to output
> \0 or nothing, wher

Re: Idea: jobs(1) -i to print only :%ID:s

2023-11-10 Thread Eric Pruitt
On Fri, Nov 10, 2023 at 01:22:54PM -0500, Greg Wooledge wrote:
> It most definitely is *not* everywhere.  It's part of GNU coreutils,
> and is generally not present on any system that doesn't use those (BSDs
> and commercial Unixes for example).

From _seq(1)_ on FreeBSD:

> The seq command first appeared in Version 8 AT&T UNIX. A seq command
> appeared in NetBSD 3.0, and was ported to FreeBSD 9.0. This command
> was based on the command of the same name in Plan 9 from Bell Labs and
> the GNU core utilities. The GNU seq command first appeared in the 1.13
> shell utilities release.

From _seq(1)_ on OpenBSD:

> A seq command appeared in Version 8 AT&T UNIX. This version of seq
> appeared in NetBSD 3.0 and was ported to OpenBSD 7.1.



Re: Bash Bug - Incorrect Printing of Escaped Characters

2023-12-25 Thread Eric Pruitt
On Mon, Dec 25, 2023 at 05:00:37PM -0500, Seth Sabar wrote:
> I'm reaching out to report what I believe to be a bug with the
> *--pretty-print* feature in bash-5.2.

Tangentially, this option doesn't seem to be documented outside of "bash
--help":

$ git clone https://git.savannah.gnu.org/git/bash.git
Cloning into 'bash'...
remote: Counting objects: 41221, done.
remote: Compressing objects: 100% (5024/5024), done.
remote: Total 41221 (delta 36225), reused 41045 (delta 36106)
Receiving objects: 100% (41221/41221), 259.98 MiB | 15.65 MiB/s, done.
Resolving deltas: 100% (36225/36225), done.
$ cd bash/doc/
doc$ fgrep -r pretty
texinfo.tex:% above.  But it's pretty close.
texinfo.tex:  % and a tt hyphen is pretty tiny.  @code also disables ?` !`.
doc$

Eric



completion very slow with gigantic list

2024-01-10 Thread Eric Wong
Hi, I noticed bash struggles with gigantic completion lists
(100k items of ~70 chars each)

It's reproducible with both LANG+LC_ALL set to en_US.UTF-8 and C,
so it's not just locales slowing things down.

This happens on the up-to-date `devel' branch
(commit 584a2b4c9e11bd713030916d9d832602891733d7),
but I first noticed this on Debian oldstable (5.1.4)

strcoll and strlen seem to be at the top of profiles, and
mregister_free when building devel with default options...
ltrace reveals it's doing strlen repeatedly on the entire list
(100k items * 70 chars each = ~7MB)

Sidenote: I'm not really sure what one would do with ~100K
completion candidates, but I managed to hit that case when
attempting completion for an NNTP group + IMAP mailbox listing.

Standalone reproducer here:
---8<--
# bash struggles with giant completion list (100K items of ~70 chars each)
# Usage:
#   . giant_complete.bash
#   giant_complete a # watch CPU usage spike
#
# derived from lei-completion.bash in https://80x24.org/public-inbox.git
# There could be something wrong in my code, too, since I'm not
# familiar with writing completions...

_giant_complete() {
# generate a giant list:
local wordlist="$(awk 

Re: completion very slow with gigantic list

2024-01-10 Thread Eric Wong
"Dale R. Worley"  wrote:
> A priori, it isn't surprising.  But the question becomes "What
> algorithmic improvement to bash would make this work faster?" and then
> "Who will write this code?"

I'll try to take a look at it in a few months if I run out of
things to do and nobody beats me to it.  I've already got a lot
on my plate and hit this on my way to other things.



static vs. dynamic scoping

2010-11-09 Thread Eric Blake
On the Austin Group mailing list, David Korn (of ksh93 fame)
complained[1] that bash's 'local' uses dynamic scoping, but that ksh's
'typeset' uses static scoping, and argued that static scoping is saner
since it matches the behavior of declarative languages like C and Java
(dynamic scoping mainly matters in functional languages like lisp):

[1]
https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=14951

I'm trying to standardize the notion of local variables for the next
revision of POSIX, but before I can do so, I need some feedback on two
general aspects:

1. Implementation aspect:
  How hard would it be to add static scoping to bash?
  Is it something that can be added in addition to dynamic scoping, via
the use of an option to select the non-default mode (for example, 'local
-d' to force dynamic, 'local -s' to force static, and 'local' to go with
default scoping)?
  If both scoping forms are supported, is it worth making the default
scoping dependent on posix compliance (for example, 'local' means
dynamic scoping for 'set +o posix' but static scoping for 'set -o
posix'), or should it be the same default for both modes?

2. User aspect:
  Is anyone aware of a script that intentionally uses the full power of
dynamic scoping available through 'local' which would break if scoping
switched to static?  In particular, I know that the bash-completion
project has fought with local variable scoping issues; would it help or
hurt to switch to static scoping?

Here's a sample shell script that illustrates the difference between the
two scoping methods.

$ ksh -c 'function f1 { typeset a=local; f2; echo $a; };
  function f2 { echo $a; a=changed; };
  a=global; f1; echo $a'
global
local
changed

$ bash --posix -c 'function f1 { typeset a=local; f2; echo $a; };
  function f2 { echo $a; a=changed; };
  a=global; f1; echo $a'
local
changed
global

In static scoping, function f2 does not shadow a declaration of a, so
references to $a within f2 refer to the global variable.  The local
variable a of f1 can only be accessed within f1; the behavior of f2 is
the same no matter how it was reached.

In dynamic scoping, function f2 looks up its call stack for the closest
enclosing scope of a variable named a, and finds the local one declared
in f1.  Therefore, the behavior of f2 depends on how f2 is called.
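
The same contrast can be reduced to plain `local`, without ksh's typeset (a minimal sketch of the dynamic behavior bash implements today):

```shell
f1() { local a=local; f2; }
f2() { echo "$a"; }

a=global
f2    # prints "global": no caller has shadowed 'a'
f1    # prints "local": f2 dynamically finds f1's local 'a'
```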

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: bash: Correct usage of F_SETFD

2010-11-22 Thread Eric Blake
On 11/22/2010 03:16 PM, Chet Ramey wrote:
>> include/filecntl.h in bash-4.1 has following:
>>
>> #define SET_CLOSE_ON_EXEC(fd)  (fcntl ((fd), F_SETFD, FD_CLOEXEC))
>>
>> Is that really the correct/intended usage of F_SETFD ?
> 
>  F_SETFDSet the close-on-exec flag associated with fildes to
> the low order bit of arg (0 or 1 as above).
> 
>> If kernel ever adds a new flag to the fd, this would end up clearing the
>> other new flag right ?
>>
>> Shouldn't bash use F_GETFD to get the current flags and set/clear just
>> the FD_CLOEXEC bit ?
> 
> I suppose it would matter if there are systems that have more than one
> flag value.

In practice, there aren't any such systems; but POSIX warns that current
practice is no indicator of future systems, and that read-modify-write
is the only way to use F_SETFD.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: bash: Correct usage of F_SETFD

2010-11-23 Thread Eric Blake
On 11/23/2010 07:42 AM, Matthew Wilcox wrote:
> The POSIX definition can be found here:
> http://www.opengroup.org/onlinepubs/009695399/functions/fcntl.html
> 
>> | In practice, there aren't any such systems; but POSIX warns that current
>> | practice is no indicator of future systems, and that read-modify-write
>> | is the only way to use F_SETFD.
>>
>> Yes, that seems to make more sense.
> 
> I think future flags will be created such that they default to off,
> and bash would have to affirmatively set them in order to use them.

Not true.  An implementation can reasonably define a new flag to off for
backwards-compatible behavior, and on for POSIX-compatible behavior, if
there is a case where traditional and POSIX behavior differ.  POSIX
permits additional bits to be on, and in fact requires that applications
leave those additional bits unchanged, in the very case where those
additional bits are essential for maintaining a POSIX-compatible
environment.

> 
> So if bash is the one creating its file descriptors, there's no need to
> use R/M/W since it knows what the state of them are.

No, bash cannot reasonably know what the implementation's default bit
state is, and blindly setting all other bits to zero is very possibly a
bug, and easy enough to avoid by using the full R/M/W.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: Why `echo -n hello | while read v; do echo $v; done' prints nothing?

2010-12-02 Thread Eric Blake
On 12/02/2010 04:04 AM, Clark J. Wang wrote:
> Following command also prints nothing, confused :(
> 
> for ((i = 0; i < 10; ++i)); do echo -n " $i"; done | while read v; do echo
> $v; done

http://www.faqs.org/faqs/unix-faq/shell/bash/
FAQ E4.
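
For readers without the FAQ at hand, E4's point is that each element of a pipeline runs in a subshell; a minimal sketch of the symptom and the usual workaround:

```shell
# Symptom: the while loop runs in a subshell, so its updates to 'n'
# are lost when the pipeline finishes.
n=0
printf 'one\n' | while read -r v; do n=$((n+1)); done
echo "$n"    # prints 0

# Workaround: keep the loop in the current shell and feed it with a
# here-document instead of a pipe.
n=0
while read -r v; do n=$((n+1)); done <<EOF
a
b
EOF
echo "$n"    # prints 2
```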

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: Why `echo -n hello | while read v; do echo $v; done' prints nothing?

2010-12-02 Thread Eric Blake
On 12/02/2010 07:02 PM, Clark J. Wang wrote:
>> The output from the first command in the pipeline does not end with a
>> newline.  Therefore, 'read' in the second command returns 'failure'
>> (non-zero) when it reads the first line of input, and your loop never
>> iterates.
> 
> But is that reasonable? I think read should return success in this case
> which makes more sense to me. Does the POSIX standards require that?

POSIX requires that the input to read be a text file (and by the
definition of text file in POSIX, it must either be empty or end in a
newline).  By violating POSIX and passing something that does not end in
a newline, you are no longer bound by the rules of POSIX.  Therefore, it
would be a reasonable bash extension that read could return 0 status if
it read data that did not end in a newline, but it would not be a
standard-compliant script that relied on such an extension.  You're
better off supplying the trailing newline, and guaranteeing a compliant
usage.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: argument precedence, output redirection

2010-12-03 Thread Eric Blake
On 12/03/2010 07:46 AM, Payam Poursaied wrote:
> 
> Hi all,
> I'm not sure this is a bug or please let me know the concept:
> What is the difference between:
> ls  -R /etc/ 2>&1 1>/dev/null
> and
> ls -R /etc/ 1>/dev/null 2>&1

POSIX requires that redirections are evaluated from left to right.

The first line duplicates fd 2 from 1 (that is, stderr is now shared
with stdout), then changes fd 1 onto /dev/null (so you've silenced
stdout, and errors from ls will show up on your stderr).

The second line changes fd 1 onto /dev/null, then duplicates fd 2 from 1
(that is, stderr is now shared with /dev/null, and you've silenced all
output to either stream).
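
A self-contained way to observe both orderings (using a helper function instead of ls, so the output is deterministic):

```shell
emit() { echo out; echo err >&2; }

# Left to right: fd 2 is duplicated from fd 1 (still the original
# stdout) BEFORE fd 1 is pointed at /dev/null, so "err" survives:
a=$(emit 2>&1 1>/dev/null)

# Here fd 1 goes to /dev/null first, and fd 2 is then duplicated
# from it, so both streams are discarded:
b=$(emit 1>/dev/null 2>&1)

echo "a=$a b=$b"    # prints "a=err b="
```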

> the second one redirects everything to /dev/null but the first one still
> prints errors (running as a non-root user would unveil the problem)
> is the order of arguments important? If yes, what is the idea/concept behind
> this behavior?

Yes the order is important, and the idea behind the behavior is that
left-to-right evaluation order can be easily documented and relied on.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: Consume only up to 8 bit octal input for backslash-escaped chars (echo, printf)

2010-12-07 Thread Eric Blake
[adding the Austin Group]

On 12/07/2010 06:19 PM, Chet Ramey wrote:
> On 12/7/10 11:12 AM, Roman Rakus wrote:
>> This one is already reported on coreutils:
>> http://debbugs.gnu.org/cgi/bugreport.cgi?msg=2;bug=7574
>>
>> The problem is with numbers higher than /0377; echo and printf consumes all
>> 3 numbers, but it is not 8-bit number. For example:
>> $ echo -e '\0610'; printf '\610 %b\n' '\610 \0610'
>> Should output:
>> 10
>> 10 10 10
>> instead of
>> �
>> � � �
> 
> No, it shouldn't.  This is a terrible idea.  All other shells I tested
> behave as bash does*, bash behaves as Posix specifies, and the bash
> behavior is how C character constants work.  Why would I change this?
> 
> (*That is, consume up to three octal digits and mask off all but the lower
> 8 bits of the result.)

POSIX states for echo:

"\0num Write an 8-bit value that is the zero, one, two, or three-digit
octal number num."

It does not explicitly say what happens if a three-digit octal number is
not an 8-bit value, so it is debatable whether the standard requires at
most an 8-bit value (two characters, \0061 followed by 0) or whether the
overflow is silently ignored (treated as one character \0210), or some
other treatment.

The C99 standard states (at least in 6.4.4.4 of the draft N1256 document):

"The value of an integer character constant containing more than one
character (e.g., 'ab'), or containing a character or escape sequence
that does not map to a single-byte execution character, is
implementation-defined."

leaving '\610' as an implementation-defined character constant.

The Java language specifically requires "\610" to parse as "\061"
followed by "0", and this can be a very useful property to rely on in
this day and age where 8-bit bytes are prevalent.

http://austingroupbugs.net/view.php?id=249 is standardizing $'' in the
shell, and also states:

"\XXX yields the byte whose value is the octal value XXX (one to three
octal digits)"

and while it is explicit that $'\xabc' is undefined (as to whether it
maps to $'\xab'c or to $'\u0abc' or to something else), it does not have
any language talking about what happens when an octal escape does not
fit in a byte.

Personally, I would love it if octal escapes were required to stop
parsing after two digits if the first digit is > 3, but given that C99
leaves it implementation defined, I think we need a POSIX interpretation
to resolve the issue.  Also, I think this report means that we need to
tweak the wording of bug 249 (adding $'') to deal with the case of an
octal escape where three octal digits do not fit in 8 bits (either by
explicitly declaring it unspecified, as is the case with \x escapes; or
by requiring implementation-defined behavior, as in C99; or by requiring
explicit end-of-escape after two digits, as in Java).
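
For reference, the mask-to-8-bits behavior Chet describes can be observed directly (this sketch assumes a bash binary and od are available):

```shell
# \610 is octal 610 = decimal 392; bash consumes all three digits and
# keeps only the low 8 bits: 392 & 255 = 136 = 0x88.
bash -c "printf '%s' \$'\610'" | od -A n -t x1    # shows the single byte 88
```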

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: cd with multiple arguments?

2010-12-09 Thread Eric Blake
On 09/23/2010 10:47 AM, Keith Thompson wrote:
> I'm not sure whether this is a bug (the documentation doesn't address
> this case), but it's at least mildly annoying.
> 
> If you invoke the "cd" commands with extra arguments after the directory
> name, all the extra arguments are silently ignored.

I would really love it if bash copied the zsh behavior here:

cd a b

tries to find a directory matching the extglob

a+([$' \t'])b

and fails if there are multiple matches or no matches.  Certainly less
typing than remembering to quote the argument, via cd 'a b'.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: bash 'let' can give error

2010-12-10 Thread Eric Blake
On 12/09/2010 10:52 AM, Dominic Raferd wrote:
> Description:
> 
> $ val=0; let val++; echo $val,$?; unset val
> 1,1

Not a bug.

> 
> see the error code 1. Setting any other start value (except undefined)
> for val does not produce this error, the problem occurs for let val++
> and let val-- if the start value is 0.

let intentionally returns status 1 if the value was 0; and status > 1 if
there was an error.  Why?  So you can do loops such as:

countdown=10
while let countdown--; do ... ; done

> Why does this happen? Is it 'by design'?

Yes.  The same as for 'expr' which is standardized by POSIX to have the
same behavior.
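
A quick demonstration of the status convention (bash-specific, since let is not POSIX):

```shell
bash -c '
  val=0
  let val++; echo "status=$? val=$val"   # status=1: the expression was 0
  let val++; echo "status=$? val=$val"   # status=0: the expression was 1
'
```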

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: bash 'let' can give error

2010-12-10 Thread Eric Blake
On 12/10/2010 08:49 AM, Marc Herbert wrote:
>> let intentionally returns status 1 if the value was 0; and status > 1 if
>> there was an error.  Why?  So you can do loops such as:
>>
>> countdown=10
>> while let countdown--; do ... ; done
>>
>>> Why does this happen? Is it 'by design'?
>>
>> Yes.  The same as for 'expr' which is standardized by POSIX to have the
>> same behavior.
> 
> This is a design mistake: it trades a few characters for a lot of confusion.

It's required for 'expr'.  But since 'let' is a bash extension, bash is
free to change the semantics of 'let' to behave differently.  However,
doing so now would break backwards compatibility with existing scripts
that have come to depend on this behavior, so unfortunately we're stuck
with it.

Remember, non-zero status is NOT always 'failure'; it is the
documentation of each command that will tell you which status values
imply failure.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: Precedence of "C operators" and "bash operators"

2010-12-18 Thread Eric Blake
On 12/18/2010 09:22 AM, 12bric wrote:
> 
> Bash 4-1 manual indicates 
>  - "The operators and their precedence, associativity, and values are
> the same as in the C language" (pp. 27).
>  - the precedence of - and + operators is different than the precedence
> of ! and ~.
> 
> But in the book "C a reference manual" (Harbison & Steele), the four
> operators + - ! ~ have the same precedence.

The unary operators + and - have the same precedence as the unary ! and
~; and that level of precedence is higher than that of the binary + and -.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: exit status question

2010-12-20 Thread Eric Blake
On 12/20/2010 11:25 AM, Curtis Doty wrote:
> Not exactly sure if this is a bug. But I don't understand why only the
> first time running ((i++)) returns an error exit status.

Because it follows the same semantics as 'expr', where status 1 is
reserved for a successful run with value 0, and status 2 and above are
reserved for errors.  This was just brought up on the list earlier this
month:
http://lists.gnu.org/archive/html/bug-bash/2010-12/msg00087.html

and seems to be a recurring question:
http://lists.gnu.org/archive/html/bug-bash/2010-07/msg00121.html

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: read builtin does not return success code (0) in spite of it successfully read line

2010-12-27 Thread Eric Blake
On 12/26/2010 01:29 PM, Stephane CHAZELAS wrote:
> Bash behavior is the same as every other shell, is as documented
> and as specified by POSIX.

POSIX requires that the input to read be a text file.  Since you aren't
passing a text file, the behavior is undefined.  POSIX does NOT require
bash to return failure in this case, but neither does it require bash to
return success.  You should not rely on the behavior of read when the
input does not end in a newline.
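
That said, scripts that must cope with a missing final newline commonly use the following idiom (a sketch relying on behavior bash and most shells share, not something POSIX mandates):

```shell
# read returns nonzero at EOF, but it has still stored the fragment
# (e.g. "status=1 v=partial" in bash):
printf 'partial' | { read -r v; echo "status=$? v=$v"; }

# Loop that also processes a final line lacking a trailing newline:
printf 'a\nb' | while read -r v || [ -n "$v" ]; do echo "$v"; done
```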

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: read builtin does not return success code (0) in spite of it successfully read line

2010-12-27 Thread Eric Blake
On 12/27/2010 10:59 AM, Stephane CHAZELAS wrote:
> 2010-12-27, 09:43(-07), Eric Blake:
> [...]
>> On 12/26/2010 01:29 PM, Stephane CHAZELAS wrote:
>>> Bash behavior is the same as every other shell, is as documented
>>> and as specified by POSIX.
>>
>> POSIX requires that the input to read be a text file.  Since you aren't
>> passing a text file, the behavior is undefined.  POSIX does NOT require
>> bash to return failure in this case, but neither does it require bash to
>> return success.  You should not rely on the behavior of read when the
>> input does not end in a newline.
> [...]
> 
> From
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/read.html
> 
> SUSv4> EXIT STATUS
> SUSv4>
> SUSv4>  The following exit values shall be returned:
> SUSv4>
> SUSv4>   0
> SUSv4>  Successful completion.
> SUSv4>  >0
> SUSv4>  End-of-file was detected or an error occurred.
> SUSv4>

Also from the standard:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/read.html
STDIN

The standard input shall be a text file.

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap01.html
1.4 Utility Description Defaults

When an input file is described as a "text file", the utility produces
undefined results if given input that is not from a text file, unless
otherwise stated.

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html
3.395 Text File

A file that contains characters organized into zero or more lines. The
lines do not contain NUL characters and none can exceed {LINE_MAX} bytes
in length, including the <newline> character.

> 
> So I think you can expect a non-zero status here.

All you can expect is that you have undefined results.  Undefined
results include zero status.



Re: Command substitution (backtick) and tab completion

2010-12-31 Thread Eric Blake
On 12/31/2010 09:49 AM, Chris F.A. Johnson wrote:
> Fri, 31 Dec 2010 11:49:26 -0500 (EST) linux system
>>> with default installation settings.
>>>
>>> In an earlier version of bash (3.2), the following works
>>> ls `pwd`/ (expands pwd).
>>>
>>> In bash 4.1 it does not. Am I missing a setting or something?
>>> Thank you.
>>
>> Anybody that can at least verify this? Thanks.
> 
>Yes, it works in 3.2 but not in 4.[012].

Personally, I find bash 3.2 behavior buggy - I _don't_ want `` (or $())
expanded by the mere act of tab-completion, as the command may have side
effects that I am unwilling to have happen more than once when I hit the
final Enter to accept the entire command line.  Yes, that means that
tab-completion is inherently limited when it cannot determine the
context of which directory to look in because the left-hand side of a
path name is hidden by a command substitution, but I'd rather live with
that safety than have arbitrary commands running during completion.



Re: read builtin and readonly variables

2011-01-04 Thread Eric Blake
On 01/03/2011 11:41 PM, Jan Schampera wrote:
> Hello list,
> 
> 
> the read builtin command, when trying to assign to a readonly variable
> after reading the data, spits an error message. This is fine.
> 
> But the return status is 0. It "always" (down to 2.04 was tested) has
> been like that, and it's like that in upcoming 4.2.
> 
> For me, this doesn't make sense. The read may have been successful, but
> the data is gone. It would make sense to return !=0 here, IMHO.

getopts also suffers from a difference in behavior between shells on
readonly arguments:

$ ksh -c 'readonly foo; getopts a: foo -a blah; echo $?'
ksh[1]: ksh: foo: is read only
$ echo $?
2
$ bash -c 'readonly foo; getopts a: foo -a blah; echo $?'
bash: foo: readonly variable
1

where non-interactive ksh completely exited on an invalid assignment,
but bash merely set $?.

> 
> I also quickly cross-read POSIX, since such weirdness usually comes from
> there ;-) but I didn't see anything obvious.

I couldn't find anything either - the POSIX wording for readonly only
mentions assignment and unset as requiring errors.  I think that's an
unintentional hole in POSIX, though, so I'm going ahead and submitting a
bug report to have readonly also mention read and getopts as being
required to error out on a readonly variable (and given that ksh treats
assignment different than unset on whether a non-interactive shell
exits, the extent of the reaction for getopts and read will probably
have to allow both behaviors).



Re: read builtin and readonly variables

2011-01-04 Thread Eric Blake
[adding David Korn for a ksh bug]

On 01/04/2011 08:25 AM, Chet Ramey wrote:
>> getopts also suffers from a difference in behavior between shells on
>> readonly arguments:
>>
>> $ ksh -c 'readonly foo; getopts a: foo -a blah; echo $?'
>> ksh[1]: ksh: foo: is read only
>> $ echo $?
>> 2
>> $ bash -c 'readonly foo; getopts a: foo -a blah; echo $?'
>> bash: foo: readonly variable
>> 1
>>
>> where non-interactive ksh completely exited on an invalid assignment,
>> but bash merely set $?.
> 
> The shell should not exit on an assignment error with getopts, since
> getopts is not a special builtin.

Good point - 'unset' is different than 'getopts' or 'read' when it comes
to special builtin status, and I agree that only special builtins are
allowed to exit a non-interactive shell on an assignment error (POSIX
XBD 2.8.1 Consequences of Shell Errors).  Even worse, neither ksh nor
bash exit the shell on 'readonly foo; unset foo; echo $?', so ksh's
behavior on 'getopts' seems like a ksh bug.



Re: Command substitution (backtick) and tab completion

2011-01-05 Thread Eric Blake
On 01/05/2011 09:57 AM, Greg Wooledge wrote:
> On Wed, Jan 05, 2011 at 08:21:18AM -0800, chengiz wrote:
>> So if I get this right, the only time this is a problem is when the
>> command substitution runs more than once.
> 
> I'd actually characterize it differently: it's unsafe to run arbitrary
> commands during tab completion, because bash doesn't know what those
> commands might do.
> 
>> When does this happen? Not
>> in my "ls `pwd`/" example where the command runs once and
>> replaces itself with its output. Does it only run more than once when
>> the ticks are not complete?
> 
> You might realize you made a mistake, hit Ctrl-U, and start over.  But
> the backticked command has already been executed.
> 
> You might hit ESC # to comment out the command line because you suddenly
> realize that you need to do something else first.  Then you come back to
> it (ESC k k ...), remove the # sign, finish typing the command, and run
> it.  But the backticked command was already executed much earlier than
> you might have wished (two commands ago).

Here's another argument why I feel that completion should NEVER rewrite
your command line (not even to replace `pwd` with its value):

I have been known to do things like this for testing parallel
development trees:

cd ~/dir
run tests via a single command
HOME=/path/to/alternate/
 (to rerun cd)
 (to rerun tests, in a different directory)

As long as tab completion on the cd command preserved the ~, then this
scenario works.  But as soon as completion "helpfully" rewrites ~ into
/path/to/home, it breaks my work flow.  Okay, so that work flow of
abusing the definition of $HOME to swap between parallel work trees
might not be the most common, but I hope it points out to you why
completion should not be in the business of rewriting users commands,
but only appending completions.  And it can be applied to any other
situation (substitute ~/$HOME with your choice of shell variable - if
completion ever rewrites a command line with the expansion of $var
rather than keeping literal $var in place, then you cannot alter $var in
between repetitions of a command - even if completion had to temporarily
expand $var in order to form better context about what I was completing
after the point that $var appeared in my command line).



Re: read builtin and readonly variables

2011-01-06 Thread Eric Blake
On 01/04/2011 08:05 AM, Eric Blake wrote:
> I couldn't find anything either - the POSIX wording for readonly only
> mentions assignment and unset as requiring errors.  I think that's an
> unintentional hole in POSIX, though, so I'm going ahead and submitting a
> bug report to have readonly also mention read and getopts as being
> required to error out on a readonly variable (and given that ksh treats
> assignment different than unset on whether a non-interactive shell
> exits, the extent of the reaction for getopts and read will probably
> have to allow both behaviors).

I found some other differences between shells:

$ bash --posix -c 'cd /tmp; readonly PWD; echo $?; cd ..; echo $?-$PWD-$(pwd)' || echo abort,$?
0
bash: PWD: readonly variable
0-/tmp-/

$ bash -c 'cd /tmp; readonly PWD; echo $?; cd ..; echo $?-$PWD-$(pwd)' || echo abort,$?
0
bash: PWD: readonly variable
0-/tmp-/

$ ksh -c 'cd /tmp; readonly PWD; echo $?; cd ..; echo $?-$PWD-$(pwd)' || echo abort,$?
0
0-/-/

Bash goes ahead and changes the directory but leaves PWD untouched (PWD
is now inconsistent without warning!) in both posix and bash mode,
whereas ksh (silently) ignores the request to make PWD readonly in the
first place.

Also, both shells abort a non-interactive shell when readonly interferes
with export (but bash only aborts in posix mode):

$ ksh -c 'readonly v; export v=a; echo $?-$a' || echo abort,$?
ksh: line 1: v: is read only
abort,1

$ bash -c 'readonly v; export v=a; echo $?-$a' || echo abort,$?
bash: v: readonly variable
1-

$ bash --posix -c 'readonly v; export v=a; echo $?-$a' || echo abort,$?
bash: v: readonly variable
abort,1

I've gone ahead and filed a POSIX interpretation request:
http://austingroupbugs.net/view.php?id=367

Also, since the next version of POSIX will be mandating changes for cd
(http://austingroupbugs.net/view.php?id=253 adds the new cd -e option to
warn if PWD is inconsistent), the notion of a readonly PWD may affect
how you implement that proposal.



Re: for; do; done regression ?

2011-01-07 Thread Eric Blake
On 01/07/2011 08:39 AM, Chet Ramey wrote:
> On 1/7/11 10:03 AM, Chet Ramey wrote:
>> On 1/6/11 8:17 PM, Alexander Tiurin wrote:
>>> Hi!
>>>
>>> I ran the command 
>>>
>>> ~$ time for i in `seq 0 1`  ; do echo /o/23/4 | cut -d'/' -f2 ; done
>>> > /dev/null 
>>>
>>> 6 times in a row, and noticed to the increase in execution time:
>>>
>>  [...]
>>>
>>> how to interpret the results?
>>
>> It's hard to say without doing more investigation, but I suspect that the
>> fork time is increasing because the bash binary is growing in size.
>>
>> I'd have to build a version with profiling enabled to tell for sure.
> 
> I built a profiling version of bash-4.2 (without the bash malloc, since
> linux doesn't let you replace malloc when you're profiling), and the
> execution time was dominated by fork: around 55-60% of the time.  That's
> around 10-15 times more than any other function.

Is it time to use posix_spawn() instead of fork() in the cases where we
are spawning external processes?  It doesn't help the fact that we have
to still use fork() for subshells, but as the bash binary grows larger,
posix_spawn() becomes more of a win over fork() by reducing kernel
overhead spent in marking a larger memory footprint as copy-on-write,
when that work is later discarded by an exec().



miscompilation at gcc -O2

2011-02-09 Thread Eric Blake
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64'
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-redhat-linux-gnu'
-DCONF_VENDOR='redhat' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash'
-DSHELL -DHAVE_CONFIG_H   -I.  -I. -I./include -I./lib  -D_GNU_SOURCE
-DRECYCLES_PIDS  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic
uname output: Linux office 2.6.35.10-74.fc14.x86_64 #1 SMP Thu Dec 23
16:04:50 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-redhat-linux-gnu


Bash Version: 4.1
Patch Level: 7
Release Status: release

Description:
There is a report of bash being miscompiled for cygwin when using gcc
4.3.4 -O2, but succeeding when compiled with -O1:
http://cygwin.com/ml/cygwin/2011-02/msg00230.html

Compiling with -Wextra reveals the culprit:
execute_cmd.c: In function ‘execute_function.clone.2’:
execute_cmd.c:4007:23: warning: variable ‘bash_source_a’ might be
clobbered by ‘longjmp’ or ‘vfork’
execute_cmd.c:4007:39: warning: variable ‘bash_lineno_a’ might be
clobbered by ‘longjmp’ or ‘vfork’
execute_cmd.c: In function ‘execute_in_subshell’:
execute_cmd.c:1296:12: warning: variable ‘tcom’ might be clobbered by
‘longjmp’ or ‘vfork’

POSIX is clear that the value of an automatic variable changed between
setjmp() and the subsequent longjmp() is unspecified unless the variable
is marked volatile, but bash is violating this constraint and modifying
several variables that cannot reliably be restored.  Depending on what
code transformations the compiler makes, this can lead to crashes; in
cygwin's case, it appears that mere execution of a trap return handler
can cause bash to corrupt its own stack.

Repeat-By:
make
rm execute_cmd.o
make CFLAGS='-Wextra -O2'

Fix:
--- execute_cmd.c.orig  2011-02-09 11:53:13.470850670 -0700
+++ execute_cmd.c   2011-02-09 11:53:48.422939088 -0700
@@ -1293,7 +1293,7 @@
   int user_subshell, return_code, function_value, should_redir_stdin,
invert;
   int ois, user_coproc;
   int result;
-  COMMAND *tcom;
+  COMMAND *volatile tcom;

   USE_VAR(user_subshell);
   USE_VAR(user_coproc);
@@ -4004,7 +4004,7 @@
   char *debug_trap, *error_trap, *return_trap;
 #if defined (ARRAY_VARS)
   SHELL_VAR *funcname_v, *nfv, *bash_source_v, *bash_lineno_v;
-  ARRAY *funcname_a, *bash_source_a, *bash_lineno_a;
+  ARRAY *funcname_a, *volatile bash_source_a, *volatile bash_lineno_a;
 #endif
   FUNCTION_DEF *shell_fn;
   char *sfile, *t;




Re: bash 4.2, parameter expansion problem

2011-02-14 Thread Eric Blake
On 02/14/2011 10:51 AM, Juergen Daubert wrote:
> Hello,
> 
> I stumbled over the following while trying to build xterm from sources 
> with bash 4.2:
> 
>  $:~> /bin/sh --version | head -n1
>  GNU bash, version 4.2.0(1)-release (i686-pc-linux-gnu)
>  $:~> /bin/sh
>  sh-4.2$ a="${b:-'/foo/bar'}"
>  sh: bad substitution: no closing `}' in ${b:-'/foo/bar'}
>  sh-4.2$ a="${b:-'bar'}"
>  sh-4.2$ a="${b:-/foo/bar}"
>  sh-4.2$
>  
> looks like bash, when called as sh, doesn't like the / character in 
> single-quoted strings.

Looks like a bug in how bash was modified for trying to obey this new
POSIX rule:

http://austingroupbugs.net/view.php?id=221
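
Until a fixed bash is available, affected scripts can sidestep the construct. A sketch of two workarounds, both also valid in older shells:

```shell
b=
# Fails in bash 4.2.0 when invoked as sh:  a="${b:-'/foo/bar'}"

# Workaround 1: leave the expansion unquoted; quote removal then
# applies to the single quotes inside the word.
a=${b:-'/foo/bar'}
echo "$a"            # /foo/bar

# Workaround 2: drop the inner quotes; this word needs none.
a="${b:-/foo/bar}"
echo "$a"            # /foo/bar
```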



Re: Do more testing before a release?

2011-02-17 Thread Eric Blake
On 02/16/2011 09:51 PM, Clark J. Wang wrote:
> I know little about open source development process (and control?). I just
> don't know where to get the bash code (like CVS, SVN respository) before
> it's released. I think it's better to make it open to more people so
> everyone can help review and test before a stable release.

Unlike most open source projects, Chet has chosen to not expose the
daily repository.  Your only option is to track release candidates, or
ask Chet to join the bash-testers list so you can also have access to
his alpha builds a month or two before the official release.

However, I do agree with your sentiment that if the daily repository
were more open to the public, that it would allow for a wider set of
contributions from other developers.



Re: typeset -r prevents local variable of same name.

2011-02-17 Thread Eric Blake
On 02/16/2011 08:13 PM, Chet Ramey wrote:
> On 2/13/11 3:17 PM, ste...@syslang.net wrote:
>> Configuration Information [Automatically generated, do not change]:
>> Machine: i386
>> OS: linux-gnu
>> Compiler: gcc
>> Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='i386' 
>> -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i386-redhat-linux-gnu' 
>> -DCONF_VENDOR='redhat' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' 
>> -DSHELL -DHAVE_CONFIG_H   -I.  -I. -I./include -I./lib  -D_GNU_SOURCE 
>> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g -pipe -Wall 
>> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
>> --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic 
>> -fasynchronous-unwind-tables
>> uname output: Linux saturn.syslang.net 2.6.27.41-170.2.117.fc10.i686.PAE #1 
>> SMP Thu Dec 10 10:48:30 EST 2009 i686 athlon i386 GNU/Linux
>> Machine Type: i386-redhat-linux-gnu
>>
>> Bash Version: 3.2
>> Patch Level: 39
>> Release Status: release
>>
>> Description:
>>  First, I already submitted this bug from work, but I didn't
>>  realize that the address I sent from would not be allowed to receive
>>  a response. This address will work fine.
>>
>> If I declare a variable at the top scope using -r, it will prevent me
>> from declaring a local copy in a subroutine. This problem happens in
>> this version of bash as well as in bash4 under Fedora 14.
> 
> This is intentional.  A variable is declared readonly for a reason, and
> readonly variables may not be assigned to.  I don't believe that you
> should be able to use a function to circumvent this.

Consensus on today's Austin Group meeting was that since we are
interested in standardizing local variables (or at least a subset of the
'typeset' special built-in's capabilities), this needs to be uniform
across implementations.  The Austin Group would favor the ability to
create a local read-write variable that shadows a global read-only
variable, which would entail a change to this bash behavior.

[Which reminds me - I still have the action item to propose wording for
getting typeset into the next revision of POSIX]



Re: typeset -r prevents local variable of same name.

2011-02-17 Thread Eric Blake
On 02/17/2011 07:48 PM, Chet Ramey wrote:
> Consider a quick, contrived example: an administrator writes a shell
> package (library, set of functions, whatever) that includes, among
> other things, ways to make sure that some other package is invoked with
> a particular set of arguments and environment.  He does this in part by
> declaring some variables readonly.  Programs invoked by this package
> change their behavior depending on the value of environment variables,
> so it's important to the correct operation of this script that the
variables don't change.  It should be harder to circumvent this, possibly
> creating a security hole, than just declaring a shell function with a
> local variable that then calls a public function that expects the variable
> to have some other value.

Ah, so we're back to the debate of static vs. dynamic scoping.  David
Korn is insistent that if POSIX standardizes typeset that only static
scoping be standardized, whereas bash currently only implements dynamic
scoping (but static scoping could be added on top of that, via
appropriate options to typeset).  Overriding statically scoped variables
is not a security risk, but overriding dynamically scoped variables is
asking for problems.  I agree with bash's current implementation
restrictions, given its current scoping rules.



Re: typeset -r prevents local variable of same name.

2011-02-17 Thread Eric Blake
On 02/17/2011 08:18 PM, Chet Ramey wrote:
> On 2/17/11 10:12 PM, Eric Blake wrote:
>> On 02/17/2011 07:48 PM, Chet Ramey wrote:
>>> Consider a quick, contrived example: an administrator writes a shell
>>> package (library, set of functions, whatever) that includes, among
>>> other things, ways to make sure that some other package is invoked with
>>> a particular set of arguments and environment.  He does this in part by
>>> declaring some variables readonly.  Programs invoked by this package
>>> change their behavior depending on the value of environment variables,
>>> so it's important to the correct operation of this script that the
>>> variables don't change.  It should be harder to cirvumvent this, possibly
>>> creating a security hole, than just declaring a shell function with a
>>> local variable that then calls a public function that expects the variable
>>> to have some other value.
>>
>> Ah, so we're back to the debate of static vs. dynamic scoping.  
> 
> Not really.  The readonly variables could be declared at the global
> scope.  Overriding a global variable can cause the same problem.

With static scoping the ONLY place that sees the local variable override
is the intermediate shell function.  If the intermediate function calls
a public function, that public function will still see the (readonly)
global variable. (Think C or ksh local variables.)

It's only when dynamic scoping is in the mix, where the grandchild
function sees the local variables of the intermediate function instead
of the global variables, where you no longer want to allow overriding
readonly variables. (Think lisp or bash local variables.)

I fail to see how overriding a global variable with a statically scoped
local can cause problems, since that local cannot be viewed outside the
function in a language with static scoping.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: Strange bug in arithmetic function

2011-02-22 Thread Eric Blake
On 02/21/2011 02:13 AM, Marcel de Reuver wrote:
> In a bash script I use: $[`date --date='this week' +'%V'`%2] to see if
> the week number is even.
> Only in week 08 the error is: bash: 08: value too great for base
> (error token is "08") the same in week 09, all others are Ok...

08 is an invalid octal number.  Try forcing decimal instead:

$[$(date --date='this week' +'10#%V')%2]
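
The same fix in the modern $(( )) arithmetic syntax (the $[ ] form above is deprecated):

```shell
w=08                      # e.g. the output of date +'%V' in week 8
# echo $(( w % 2 ))       # error: 08 is an invalid octal constant
echo $(( 10#$w % 2 ))     # 0 -- the 10# prefix forces base 10

w=09
echo $(( 10#$w % 2 ))     # 1
```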



Re: How to match pattern in bash?

2011-02-22 Thread Eric Blake
On 02/22/2011 08:24 PM, Peng Yu wrote:
> Suppose that I have a variable $x, I want to test if the content of $x
> match the pattern 'abc*'. If yes, then do something. (The operator ==
> doesn't match patterns, if I understand it correctly.)
> 
> Is there such a build-in feature in bash? Or I have to rely on some
> external program such as perl to test the pattern matching?

case $x in abc*) ... ;; esac
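
Worth noting: `case` is the portable (POSIX sh) answer, but in bash the `[[ ]]` compound command also pattern-matches with an unquoted right-hand side, and matches regular expressions with `=~`:

```shell
x=abcdef
case $x in abc*) echo 'case match' ;; esac   # portable POSIX sh
[[ $x == abc* ]] && echo 'glob match'        # bash/ksh glob match
[[ $x =~ ^abc ]] && echo 'regex match'       # bash 3.0+ ERE match
```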



Re: bash tab variable expansion question?

2011-02-24 Thread Eric Blake
On 02/24/2011 03:14 PM, Michael Kalisz wrote:
> $ echo $PWD/<TAB>
> will expand the $PWD variable to your current directory
> 
> while in bash, version 4.2.0(1)-release:
> 
> $ echo $PWD/<TAB>
> will just escape the $ in front of the $ variable i.e:
> 
> $ echo \$PWD/
> The shell-expand-line (Ctrl-Alt-e) works but before I could use just TAB
> 
> Any hints why? Any way to get the 4.1 behavior in 4.2?
> 
> Can someone confirm... Is this a bug or a feature?

I'm not the developer, but in my mind, this is a welcome feature.
TAB-completion should NOT modify what I typed, and I consider the 4.1
behavior to be the bug.  Consider if I have parallel directory
structures a/c and b/c.  If I do:

d=a
$d/c/test
d=b
$d/c/test

I want to run two different programs.  Now, instead of a one-letter
name, consider that it is something longer, like $HOME.  If typing TAB
expands the variable, instead of keeping it intact, then I can't do:

$HOME/c/t-TAB
HOME=b
UP-UP-ENTER

to repeat my test in a new directory, since tab completion wiped out
that I want to evaluate $HOME every time.  (The same goes for command
substitution - bash should never pre-maturely lock me in to a single
expansion during tab completion.)



Re: [bash-bug] no local bash_history created or written to if existing (~/.bash_history

2011-03-08 Thread Eric Blake
On 03/08/2011 11:00 AM, Andreas Schwab wrote:
> "Dr. Werner Fink"  writes:
> 
>> On Tue, Mar 08, 2011 at 12:02:53PM -0500, Chet Ramey wrote:
>>>>
>>>> Does this mean that the attached patch could also not work
>>>> on some systems? Or does this interfere with the readline
>>>> library?
>>>
>>> Since longjmp is not on the list of functions that is safe to call
>>> from a signal handler, yes, that's what it means.  OTOH, this shows
>>> promise as a solution.
>>
>> OK, that means only for systems with HAVE_POSIX_SIGSETJMP
>> defined.  At least this provides a (local) solution here
> 
> sigsetjmp is the same as setjmp.  Both will lead to deadlocks.

sigsetjmp is safe to call from a signal handler _if_ you are in the
handler because of a synchronous signal (such as one generated
internally by raise(), and not generated externally and asynchronously
by kill()), or if you can prove that it was not interrupting any other
non-async-signal-safe function (however, blocking all signals around all
calls to non-async-signal-safe functions is very inefficient).  But
deadlock is indeed possible if an asynchronous SIGHUP occurs while the
malloc() lock is held (if you try to malloc() in the cleanup code, but
siglongjmp() left the middle of earlier code that already held the
malloc() lock, then you indeed have deadlock).  Which is why POSIX does
not list siglongjmp() as an async-signal-safe function, because after
siglongjmp(), you are generally only safe to invoke async-signal-safe
functions, which is no better than invoking those same functions
directly within the signal handler itself in the first place.

Really, the only safe way to handle things like SIGHUP cleanup is to
have the signal handler record that an exception occurred, then have the
main processing loop checking that variable frequently enough to do
cleanup in a reasonable time-frame (possibly by using a pipe-to-self if
the main loop is waiting on select()), where the main loop then
re-raises the signal after doing cleanup at a point where all functions
are safe.



Re: variable name and its' value are the same characters causes recursion error

2011-03-09 Thread Eric Blake
On 03/09/2011 02:54 PM, Chet Ramey wrote:
>>
>> For example:
>>
>> unset a; declare a="a"; [[ a -lt 3 ]]; echo $?
>> bash: [[: a: expression recursion level exceeded (error token is "a")
>> 1
>>
>> Shouldn't the return code from this expression be 2, rather than 1?
> 
> What does it matter?  Failure is failure.

Except that [[ explicitly documents that 0 and 1 imply a syntactically
valid expression, reserving $?==2 for catastrophic failure.  The
argument here is that infinite recursion is catastrophic and represents
an invalid expression, and should not be confused with $?==1 meaning a
valid but false expression.  Similarly to expr(1) returning 0 and 1
depending on value for success, and > 1 on failure.



Re: Problem with open and rm

2011-03-16 Thread Eric Blake
On 03/16/2011 04:54 AM, Barrie Stott wrote:
> The script that follows is a cut down version of one that came from elsewhere.
> 
> #!/bin/bash
> 
> cp /tmp/x.html /tmp/$$.html
> ls /tmp/$$.html
> [ "$DISPLAY" ] && open /tmp/$$.html
> ls /tmp/$$.html
> rm -f /tmp/$$.html

Instead of passing Safari the name of a temporary file, why not pass it
the name of a temporary pipe?  Does this work any better:

[ "$DISPLAY" ] && open <(cat /tmp/x.html)

at which point there's no temporary file to remove.



Re: Using alias to commands disables its completion

2011-03-24 Thread Eric Blake
On 03/24/2011 10:54 AM, Gustavo Serra Scalet wrote:
> Description:
> When an alias is supplied to a command (e.g: alias c="cd") the
> complete doesn't complete like it used when using the original text,
> without
> using alias (e.g $ c <TAB>  # returns all files, not just
> directories)
> 
> It doesn't seem to have any technical issue problem to make the
> detection of
> completion also by using alias (as when <TAB> is hit the aliases are
> also
> interpreted).

The lame answer:

But you can already do this yourself!  Write a shell function around
alias, that calls both 'command alias' to do the real work, as well as
'complete -p' on the first word of new expansion then 'complete ...' on
the new alias name, so that you copy any completion properties tied to
the old name over to the new name.

As long as you define your programmable completions before your aliases
as part of your ~/.bashrc startup sequence, then this works.
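
A sketch of such a wrapper (the function name and approach are mine, not a bash or bash-completion API):

```shell
# Define an alias and clone the target command's completion spec onto
# the alias name.
alias_with_completion() {
    local name=$1 target=$2 spec
    alias "$name=$target"
    # complete -p prints a reusable command, e.g. "complete -F _cd cd";
    # swap the trailing command name for the alias name and re-run it.
    if spec=$(complete -p "$target" 2>/dev/null); then
        eval "${spec% *} $name"
    fi
}

# usage:
# complete -d cd              # directories only, as for cd
# alias_with_completion c cd  # now "c <TAB>" completes like cd
```

This relies on the command name being the last word of the `complete -p` output, which holds for the usual specs but is worth double-checking for exotic ones.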

Even better, submit that as an enhancement request to the
bash-completion project to have bash-completion provide that wrapper
around alias provided automatically as part of starting up bash-completion.

The answer you sort of wanted:

Yes, it would be nice to patch to bash's completion routines to add an
opt-in ability to check for programmed completion associated with
whatever the alias expanded to, and use that when there is no completion
already associated with the aliased name.  But someone has to write such
a patch.



Re: eval

2011-05-05 Thread Eric Blake
[adding bug-autoconf to document a NetBSD bug]

On 05/05/2011 07:23 AM, Chet Ramey wrote:
> On 5/4/11 4:40 PM, Rafael Fernandez wrote:
>> I am used to the behavior of sh on NetBSD which does not require
>> enclosing positional parameters. 
> 
> I'd call that a pretty serious incompatibility on the part of ash and its
> descendants (BSD sh, dash, etc.).  There's no good reason that
> 
> set -- a b c d e f g h i j
> echo $10
> 
> should echo `j'.

Also a POSIX violation:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02

"The parameter name or symbol can be enclosed in braces, which are
optional except for positional parameters with more than one digit or
when parameter is followed by a character that could be interpreted as
part of the name."

And worth documenting as a portability pitfall in the autoconf manual.



Re: eval

2011-05-05 Thread Eric Blake
On 05/05/2011 08:43 AM, Chet Ramey wrote:
>> "The parameter name or symbol can be enclosed in braces, which are
>> optional except for positional parameters with more than one digit or
>> when parameter is followed by a character that could be interpreted as
>> part of the name."
> 
> I agree with this interpretation, but the following sentence can be
> interpreted as placing the burden on the shell programmer:
> 
> "When a positional parameter with more than one digit is specified, the
> application shall enclose the digits in braces (see Parameter Expansion)."
> 
> Still, sh has required the braces since time immemorial.  It makes no
> sense that ash would have done it differently

Additionally from POSIX:

"If the parameter name or symbol is not enclosed in braces, the
expansion shall use the longest valid name (see XBD Name)"

Then XBD Name:
"In the shell command language, a word consisting solely of underscores,
digits, and alphabetics from the portable character set. The first
character of a name is not a digit."

In "$10", 10 is not a name, so the longest name after $ is the empty
string, and in place of a name, we use the single-character symbol 1
instead, such that this MUST be parsed as ${1}0, not as ${10}.
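This parse is easy to verify interactively (plain bash; the ten-element list is arbitrary):

```shell
set -- a b c d e f g h i j
echo "$10"     # parsed as ${1}0, so it prints: a0
echo "${10}"   # the tenth positional parameter: j
```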

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: Case insensitivity seems to ignore lower bound of interval

2011-05-05 Thread Eric Blake
On 05/05/2011 10:59 AM, Aharon Robbins wrote:
>> No need.  It's already been attempted in the past.
>> http://sourceware.org/bugzilla/show_bug.cgi?id=12045
>> http://sourceware.org/bugzilla/show_bug.cgi?id=12051
> 
> The first one looks like the main one. Has Ulrich responded to it?

12045 is a documentation request (to at least mention why CEO [collation
element ordering] appears screwy for different locales, and how to
properly write a locale file so that CEO does what you normally expect),
which, to my knowledge, Uli hasn't responded to.

12051 is a request to change away from CEO to any other scheme, and Uli
flat out rejected that, even though POSIX no longer requires CEO.  In
other words, Uli says that the burden is on the locale file writer, and
not on glibc.

But without good documentation on how to write a proper locale file, and
without prods to all of the owners of broken locale files, it's an
uphill battle to get CEO ordering to consistently be useful.  And that
only covers you if you are on a system with glibc which uses CEO
semantics, rather than on any other system where the libc semantics of
range expressions has who-knows-what behavior.

Yes, it would really be nice if all four of bash, gawk, sed, and grep
could agree on the same interpretation of non-C range semantics, and
implement that regardless of the underlying libc behavior.  But what
semantics should we settle on?

It's an age-old problem, with no nice solutions.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: Yet another quoting question

2011-05-06 Thread Eric Blake
On 05/06/2011 09:02 AM, Steven W. Orr wrote:
>  4.0.35(1)-release (x86_64-redhat-linux-gnu)
> 
> I have a bunch of arrays, and some of the arrays' values are null or
> might contain spaces.

printf %q is your friend.

$ a[0]=aaa a[1]= a[2]='bbb  ccc'
$ printf '%q ' "${a[@]}"
aaa '' bbb\ \ ccc

> I'd love to see an elegant solution. :-)

OK, so printf %q prefers '' and \ over "", but the end result is an
unambiguous array listing that can be fed back into shell code.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: Bash-4.2 patches 9 and 10

2011-05-13 Thread Eric Blake
On 05/13/2011 04:54 AM, Joachim Schmitz wrote:
> Hi folks
> 
> Just found Bash-4.2 patches 9 and 10, strange that they never got
> announced?

Yes they did:

http://lists.gnu.org/archive/html/bug-bash/2011-05/msg00014.html
http://lists.gnu.org/archive/html/bug-bash/2011-05/msg00015.html

Along with an accidental announcement for a missing patch 11:

http://lists.gnu.org/archive/html/bug-bash/2011-05/msg00028.html

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: Shell case statements

2011-05-19 Thread Eric Blake
[adding bug-bash]

On 05/16/2011 07:23 PM, Wayne Pollock wrote:
> (While cleaning up the standard for case statement, consider that it is
> currently unspecified what should happen if an error occurs during the
> expansion of the patterns; as expansions may have side-effects, when an
> error occurs on one expansion, should the following patterns be expanded
> anyway?  Does it depend on the error?  It seems reasonable to me that
> any errors should immediately terminate the case statement.)

Well, that's rather all over the place, but yes, it does seem like bash
was the buggiest of the lot, compared to other shells.  Interactively, I
tested:

readonly x=1
case 1 in $((x++)) ) echo hi1 ;; *) echo hi2; esac
echo $x.$?

bash 4.1 printed:
bash: x: readonly variable
hi1
1.0
which means it matched '1' to $((x++)) before reporting the failure to
assign to x, and the case statement succeeded.  Changing the first "1"
to any other string printed hi2  (the * case).

zsh printed:
zsh: read-only variable: x
1.0
which means it aborted the case statement before executing any clauses,
but left $? at 0.

ksh printed:
ksh: x: is read only
1.1
which means that both the case statement was aborted, and $? was impacted.

dash printed:
dash: arithmetic expression: expecting primary: "x++"
1.2
so it was like ksh other than choice of error status.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: Shell case statements

2011-05-20 Thread Eric Blake
On 05/20/2011 09:33 AM, Chet Ramey wrote:
>> Well, that's rather all over the place, but yes, it does seem like bash
>> was the buggiest of the lot, compared to other shells.  Interactively, I
>> tested:
>>
>> readonly x=1
>> case 1 in $((x++)) ) echo hi1 ;; *) echo hi2; esac
>> echo $x.$?
>>
>> bash 4.1 printed:
>> bash: x: readonly variable
>> hi1
>> 1.0
>> which means it matched '1' to $((x++)) before reporting the failure
>> assign to x, and the case statement succeeded.  Changing the first "1"
>> to any other string printed hi2  (the * case).
> 
> Thanks for the report.  This was an easy fix.  The variable assignment
> error was actually handled correctly, the expression evaluation code
> just didn't pay enough attention to the result.

How about the even simpler:

$ bash -c 'readonly x=5; echo $((x=5))'; echo $?
bash: x: readonly variable
5
0
$

Other shells abort rather than running echo:

$ ksh -c 'readonly x=5; echo $((x=5))'; echo $?
ksh: line 1: x: is read only
1
$ zsh -c 'readonly x=5; echo $((x=5))'; echo $?
zsh:1: read-only variable: x
1
$ dash -c 'readonly x=5; echo $((x=5))'; echo $?
dash: x: is read only
2
$

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: BUG? RFE? printf lacking unicode support in multiple areas

2011-05-20 Thread Eric Blake
On 05/20/2011 02:30 PM, Linda Walsh wrote:
> i.e. it's showing me a 16-bit value: 0x203c, which I thought would be the
> wide-char value for the double-exclamation.  Going from the wchar
> definition
> on NT, it is a 16-bit value.  Perhaps it is different under POSIX? but
> 0x203c taken as 32 bits with 2 high bytes of zeros would seem to specify
> the same codepoint for the Dbl-EXcl.

POSIX allows wchar_t to be either 2-byte or 4-byte, although only a
4-byte wchar_t can properly represent all of Unicode (with 2-byte
wchar_t as on windows or Cygwin, you are inherently restricted from
using any Unicode character above 0xFFFF if you want to maintain
POSIX compliance).

> 
>> Since there is no way to produce a word containing a NUL character it is
>> impossible to support %lc in any useful way.
> 
> That's annoying.   How can one print out unicode characters
> that are supposed to be 1 char long?

I think you are misunderstanding the difference between wide characters
(exactly one wchar_t per character) and multi-byte characters (1 or more
char [byte] per character).

Unicode can be represented in two different ways.  One way is with wide
characters (every character represents exactly one Unicode codepoint,
and code points < 0x100 have embedded NUL bytes if you view the memory
containing those wchar_t as an array of bytes).  The other way is with
multi-byte encodings, such as UTF-8 (every character occupies a variable
number of bytes, and the only character that can contain an embedded NUL
byte is the NUL character at codepoint 0).

Bash _only_ uses multi-byte characters for input and output.  %lc only
uses wchar_t.  Since wchar_t output is not useful for a shell that does
not do input in wchar_t, that explains why bash printf need not support
%lc.  POSIX doesn't require it, at any rate, but it also doesn't forbid
it as an extension.

> This isn't just a bash problem given how well most of the unix "character"
> utils work with unicode -- that's something that really needs to be solved
> if those character utils are going to continue to be _as useful_ in the
> future.
> Sure they will have their current functionality which is of use in many
> ways, but
> for anyone not processing ASCII text it becomes a problem, but this
> isn't really
> a bash is.

Most utilities that work with Unicode work with UTF-8 (that is, with
multi-byte-characters using variable number of bytes), and NOT with wide
characters (that is, with all characters occupying a fixed width).  But
you can switch between encodings using the iconv(1) utility, so it
shouldn't really be a problem in practice in converting from one
encoding type to another.
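A small illustration of the two representations, using U+203C (the double-exclamation mark discussed above); this assumes `od` and `iconv` are available:

```shell
# UTF-8 (multi-byte): three bytes, none of them NUL
printf '\342\200\274' | od -An -tx1
#   e2 80 bc

# The same code point in a fixed-width (wide) encoding: NUL bytes appear
printf '\342\200\274' | iconv -f UTF-8 -t UTF-32LE | od -An -tx1
#   3c 20 00 00
```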

> That said, it was my impression that a wchar was 16-bits (at least it
> is on MS.  Is it different under POSIX?

POSIX allows 16-bit wchar_t, but if you have a 16-bit wchar_t, you
cannot support all of Unicode.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: documentation bug re character range expressions

2011-06-03 Thread Eric Blake
On 06/03/2011 10:15 AM, Marcel (Felix) Giannelia wrote:
> Alright -- assuming that for the moment, how does one specify
> [ABCDEFGHIJKL] using [[:upper:]]? This is something that I haven't seen
> documented, and I'm genuinely curious.

[ABCDEFGHIJKL]

If you ever want a subset of [[:upper:]], the _only_ portable ways to
guarantee that you are getting just that subset are to use the C locale
or to spell out the range yourself.

In short, ranges are non-portable in all other locales.
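A quick way to see both approaches, pinning the locale for the range form (the letters chosen are arbitrary):

```shell
# Pinned to the C locale, [A-L] is exactly the twelve code points A..L
LC_ALL=C bash -c 'case $1 in [A-L]) echo in;; *) echo out;; esac' _ B   # in
LC_ALL=C bash -c 'case $1 in [A-L]) echo in;; *) echo out;; esac' _ b   # out

# A spelled-out set needs no locale pinning at all
case B in [ABCDEFGHIJKL]) echo in;; *) echo out;; esac                  # in
```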

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





bug in 'set -n' processing

2011-06-03 Thread Eric Blake
Bash has a bug: ${+} is syntactically invalid, as evidenced by the error
message when running the script, yet using 'set -n' was not able to flag
it as an error.

$ echo $BASH_VERSION
4.2.8(1)-release
$ bash -c 'echo ${+}'; echo $?
bash: ${+}: bad substitution
1
$ bash -cn '${+}'; echo $?
0
$ ksh -cn '${+}'; echo $?
ksh: syntax error at line 1: `+' unexpected
3

Meanwhile, a feature request: since $+ outputs a literal "$+", it is
proof that + cannot be a valid variable name.  Bash should follow ksh's
lead by having 'set -n' warn about suspicious but usable constructs, at
least when --posix is not in effect.

$ bash -c 'echo $+'
$+
$ bash -cn '$+'; echo $?
0
$ ksh -cn '$+'; echo $?
ksh: warning: line 1: $ not preceded by \
0

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: documentation bug re character range expressions

2011-06-03 Thread Eric Blake
On 06/03/2011 11:36 AM, Marcel (Felix) Giannelia wrote:
> It sounds to me like what you're saying is, the *only* uses of bracket
> range expressions guaranteed to be "portable" are things like [[:upper:]]
> and [[:lower:]]. But I put "portable" in quotation marks just then,
> because to my mind the word "portable" implies "has the same behaviour on
> all systems", whereas things like [[:upper:]] are locale-dependent; they
> change their behaviour depending on system settings.

Actually, that _is_ portable, because if you have the _same_ locale on
two different machines, you will get the _same_ locale-dependent
behaviors from those two machines.

But you do have a point - even POSIX admits that different vendors have
varying locale description files, so the "en_US" locale is not
consistent between two machines if the locale definitions were not
written by the same person.

> [0-9] presumably still works consistently across all platforms -- I hope?

[0-9] is a special case.  It is the _ONLY_ range that POSIX requires to
be invariant across all locales, so in practice, you can use this range
expression to your heart's content.  But there is still the POSIX
wording that [0-9] may have implementation-defined behavior for non-C
locales, so you might _still_ be safer using [0123456789] or [[:digit:]].

> 
> I think a good solution to this, then, is to just deprecate the use of "-"
> in bracket expressions entirely.

Which is effectively what POSIX did by stating that range expressions
outside of the C locale have unspecified effects.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: exit-on-error (set -e) option and command after "if" not found

2011-06-22 Thread Eric Blake
On 06/22/2011 06:51 AM, Dmitry Bolshakov wrote:
> Hi
> 
> set -e
> if nosuchcommand | grep blabla
> then
>   echo ERROR: blabla was found
>   exit 1
> fi
> echo it was unexpected for me, I thougt this line should not be echoed

Not a bug; this last echo is supposed to happen.

The documentation for 'set -e' is clear:

 Exit immediately if a pipeline (which may consist  of  a
 single  simple command),  a subshell command enclosed in
 parentheses, or one of the commands executed as part  of
 a  command  list  enclosed  by braces (see SHELL GRAMMAR
 above) exits with a non-zero status.  The shell does not
 exit  if  the  command that fails is part of the command
 list immediately following a  while  or  until  keyword,
 part  of  the  test  following  the  if or elif reserved
 words, part of any command executed in a && or  ||  list
 except  the  command  following  the final && or ||, any
 command in a pipeline but the last, or if the  command's
 return  value  is being inverted with !.

Since the pipeline 'nosuchcommand | grep blabla' is executed as part of
the test following an 'if', then 'set -e' does not abort the shell on
failure.
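A self-contained version of the behavior (the command and pattern names are placeholders):

```shell
bash -c '
set -e
if nosuchcommand 2>/dev/null | grep -q blabla; then
  echo "ERROR: blabla was found"
  exit 1
fi
echo "still running"    # reached: a failing if-test does not trip set -e
'
```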

'set -e' is a bear to use - it generally does not protect you from
everything that you think it ought to, and has a number of portability
bugs to boot as you migrate between versions of bash or between other
shells.  Not to mention that the POSIX folks can't even seem to get it
right; the definition of 'set -e' had to be amended even after POSIX
2008 to match historical practice, which in turn disagreed with bash
practice at the time.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: weird behavior of set -e

2011-06-24 Thread Eric Blake
On 06/24/2011 03:51 AM, Harald Dunkel wrote:
> Hi folks,
> 
> A colleague pointed me to this problem: If I run
> 
>   ( set -e; ( false; echo x ) )
> 
> in bash 4.1.5, then there is no screen output, as
> expected. If I change this to
> 
>   ( set -e; ( false; echo x ) || echo y )
> 
> then I get "x" instead of "y". How comes?

Because '(false; echo x)' is on the left hand of ||, which disables set
-e for that portion of the command line.  ksh behaves the same way, so
it is not a bash bug.

> Any helpful comment would be highly appreciated.

set -e seldom does exactly what you want - even the writers of POSIX
2008 got it wrong, and here's how they corrected it:

http://austingroupbugs.net/view.php?id=52

and that's what bash 4.1 implemented.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: Yet Another test option

2011-07-06 Thread Eric Blake
On 07/06/2011 10:37 AM, Bruce Korb wrote:
> On 07/06/11 09:03, Chet Ramey wrote:
>>> /usr/bin/test ?
>>>
>>> Do this first in the binary then migrate to bash's test?
>>
>> I was actually making an argument for an entirely separate utility to do
>> this.  That could be a shell script encapsulating the proper version
>> comparison logic.
> 
> which basically means a script wrapping "sort -V" and testing whether
> the arguments got reordered or not:
> 
>   if test "X$1" = "X$3"
>   then is_eq=true ; is_lt=false
>   else
> is_eq=false
> first=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -1)
> test "X$first" = "X$1" && is_lt=true || is_lt=false
>   fi

Oh, that's rather heavyweight - a command substitution and 3 pipeline
components.  Why not just one child process, by using sort -c and a heredoc?

is_eq=false is_lt=false
if test "x$1" = "x$2"; then
  is_eq=true
elif sort -cV <<EOF; then
$1
$2
EOF
  is_lt=true
fi

> and if that proved insufficient, then "sort -V" would need an adjustment.
> I would not expect "sort -V" and a version test to disagree.

The code that coreutils uses for 'sort -V' is part of gnulib - the
filevercmp module.  That file (filevercmp.c) is pretty stable nowadays,
with the last algorithmic change being in April 2009 and no recent
complaints about unexpected behavior (whereas glibc's strverscmp is
locked into behavior, but that behavior raises complaints).  For
reference, the documentation is:

/* Compare version strings:

   This function compares strings S1 and S2:
   1) By PREFIX in the same way as strcmp.
   2) Then by VERSION (most similarly to version compare of Debian's dpkg).
  Leading zeros in version numbers are ignored.
   3) If both (PREFIX and  VERSION) are equal, strcmp function is used for
  comparison. So this function can return 0 if (and only if) strings S1
  and S2 are identical.

   It returns number >0 for S1 > S2, 0 for S1 == S2 and number <0 for S1
< S2.

   This function compares strings, in a way that if VER1 and VER2 are
version
   numbers and PREFIX and SUFFIX (SUFFIX defined as
(\.[A-Za-z~][A-Za-z0-9~]*)*)
   are strings then VER1 < VER2 implies filevercmp (PREFIX VER1 SUFFIX,
   PREFIX VER2 SUFFIX) < 0.

   This function is intended to be a replacement for strverscmp. */

However, I don't see any reason to add extensions to coreutils' test
unless we have some agreement that we plan to add the same extension to
other places like the bash builtin test at the same time.  Since we've
already demonstrated that version comparisons are a pretty trivial
wrapper around sort, I'm not seeing much justification in favor of
bloating test to make version testing builtin.
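For the record, the heredoc approach sketched earlier in this message wraps up naturally as a function; the name `version_le` is mine, and this relies on GNU sort's `-V`:

```shell
version_le() {
  # exit 0 when $1 sorts at or before $2 in version order -- one child process
  sort -cV 2>/dev/null <<EOF
$1
$2
EOF
}

version_le 1.9 1.10 && echo ordered      # 1.9 precedes 1.10 by version sort
version_le 1.10 1.9 || echo reversed
```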

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: Problem with line buffered IO when no tty

2011-07-07 Thread Eric Blake
On 07/07/2011 06:55 AM, Steven W. Orr wrote:
> So, why is it that bash is behaving like it is always line buffered or
> unbuffered, even if there is no console?

Because POSIX requires 'sh' to behave like it is unbuffered:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html

When the shell is using standard input and it invokes a command that
also uses standard input, the shell shall ensure that the standard input
file pointer points directly after the command it has read when the
command begins execution. It shall not read ahead in such a manner that
any characters intended to be read by the invoked command are consumed
by the shell (whether interpreted by the shell or not) or that
characters that are not read by the invoked command are not seen by the
shell.

Bash meets this requirement by reading one byte at a time on
non-seekable input, rather than relying on the decision of libc on
whether stdin defaults to fully-buffered or line-buffered.
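One observable consequence: a command in the middle of a stdin script can consume the data that follows it, and the shell resumes exactly where that command stopped reading:

```shell
printf '%s\n' 'read line' 'some data' 'echo "got: $line"' | bash
# the read builtin consumes the second line as data, so this prints:
#   got: some data
```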

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: Built-in printf Sits Awkwardly with UDP.

2011-07-20 Thread Eric Blake

[adding coreutils]

On 07/20/2011 07:34 AM, Ralph Corderoy wrote:

BTW, the code for the built-in printf has a bug.  For negative
field-widths it negates a negative integer without checking it will fit.
E.g. on this 64-bit machine

 $ printf '%-9223372036854775808s.\n' foo
 foo.
 $


Coreutils' printf shares this misfortune.  Sadly, it might even be a bug 
in the underlying glibc printf(), although I haven't tried to write a 
test program to check that, yet.


--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: How to do? Possible?

2011-07-25 Thread Eric Blake

On 07/25/2011 03:45 PM, Linda Walsh wrote:

I mistyped that but it brings me to an interesting
conundrum:

GLOBAL="hi there"
{foo=GLOBAL echo ${!foo}; }


This says:

evaluate ${!foo}, and pass that expansion to 'echo', with foo=GLOBAL in 
the environment of echo.  You are invoking behavior that POSIX leaves 
undefined (that is, bash is perfectly fine evaluating ${!foo} prior to 
assigning foo, but bash would also be okay if it assigned foo prior to 
evaluating ${!foo}).  Hence, you got no output.



But:

{ foo=GLOBAL;echo ${!foo}; }

> hi there

The extra ; forces the semantics.  Here, the assignment to foo is a 
different statement than the expansion of ${!foo}.  And while ${!foo} 
is a bash extension, it still proves that this is a case where foo was 
assigned prior to its use.



Weird...


Not if you think about it properly.

--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: Fallback language for internationalized string

2011-07-27 Thread Eric Blake

On 07/27/2011 09:05 AM, Andreas Schwab wrote:

Anders Sundman  writes:


Is it possible to get bash to use a fallback language for showing
localized strings if no translation exists for the current language?
Instead of using the 'raw' msgid that is.


The msgid is supposed to be the fallback.  That's how gettext works.


Not entirely.  See the glibc documentation for the LANGUAGE environment 
variable.



   While for the `LC_xxx' variables the value should consist of exactly
one specification of a locale the `LANGUAGE' variable's value can
consist of a colon separated list of locale names.  The attentive
reader will realize that this is the way we manage to implement one of
our additional demands above: we want to be able to specify an ordered
list of languages.


--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: Fallback language for internationalized string

2011-07-27 Thread Eric Blake

On 07/27/2011 09:35 AM, Anders Sundman wrote:



Andreas Schwab  wrote:


Anders Sundman  writes:


Is it possible to get bash to use a fallback language for showing
localized strings if no translation exists for the current language?
Instead of using the 'raw' msgid that is.


The msgid is supposed to be the fallback.  That's how gettext works.



There are unfortunately two problems with this. I was hoping that a specific 
language fallback would fix them.

1. If you try to use the same msgid twice in a script you get an error when 
extracting it.


Why?  It should be possible to use the same msgid twice, if you are okay 
using the same translation twice.


> Using the same human readable string twice is however a valid use
> case. So using 'weird' (e.g. numbered) msgids make sense. But you
> don't ever want the user to see this.


If you want weird msgids that are not usable directly, then your code 
must do a comparison after the translation.  If the translation resulted 
in the msgid, then you use your sane fallback; if it resulted in a 
different string, then you got a translation.



2. If you use en strings as msgids and you later have to fix a simple spelling 
error for en, then all translation files have to be modified.


Yes, but gettext provides tools to make that modification easy.

--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: The mailing list software interfered with my content

2011-08-03 Thread Eric Blake

On 08/03/2011 04:45 PM, What, me urgent? wrote:


The mailing list software interfered with my content
=
OUCH!

In my most recent post, the mailing list software replaced the string
"address@hidden" for a section of code snippet!


Not the list software, but merely the web archiver that you are viewing 
the mail in.  If you are actually subscribed to the list, rather than 
viewing a web archiver, your post came through just fine.  Furthermore, 
not all web archivers use the same mangling; while


http://lists.gnu.org/archive/html/bug-bash/2011-08/msg00012.html

has your unfortunate "address@hidden", the same message is easier to 
understand here:


http://thread.gmane.org/gmane.comp.shells.bash.bugs/16780/focus=16890

--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: bug: return doesn't accept negative numbers

2011-08-05 Thread Eric Blake

On 08/05/2011 05:41 PM, Linda Walsh wrote:


I guess I don't use negative return codes that often in shell, but
I use them as exit codes reasonably often.

'return' barfs on "return -1"...

Since return is defined to take no options, and ONLY an integer,
as the return code, it shouldn't be hard to fix.


According to POSIX, it's not broken in the first place.  Portable shell 
is required to pass an unsigned decimal integer, no greater than 255, 
for defined behavior.

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#return



Seem to fail on any negative number, but 'exit status' is defined
as a short int -- not an unsigned value (i.e. -1 would return 255).


In bash, 'return -- -1' sets $? to 255 (note the --).  But since that is 
already an extension (POSIX does not require 'return' to support -- any 
more than it is required to support an argument of -1), I agree with 
your argument that bash would be more useful if, as an extension to 
POSIX, it would handle 'return -1' - in fact, that would match ksh 
behavior.  Conversely, since portable code already can't use it, it's no 
skin off my back if nothing changes here.


$ bash -c 'f() { return -- -1; }; f; echo $?'
255
$ bash -c 'f() { return  -1; }; f; echo $?'
bash: line 0: return: -1: invalid option
return: usage: return [n]
2
$ dash -c 'f() { return -- -1; }; f; echo $?'
return: 1: Illegal number: --
$ dash -c 'f() { return  -1; }; f; echo $?'
return: 1: Illegal number: -1
$ ksh -c 'f() { return -- -1; }; f; echo $?'
255
$ ksh -c 'f() { return  -1; }; f; echo $?'
255
$

--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: bug: return doesn't accept negative numbers

2011-08-08 Thread Eric Blake

On 08/07/2011 02:35 PM, Linda Walsh wrote:



Eric Blake wrote:

On 08/05/2011 05:41 PM, Linda Walsh wrote:

Seem to fail on any negative number, but 'exit status' is defined
as a short int -- not an unsigned value (i.e. -1 would return 255).


In bash, 'return -- -1' sets $? to 255 (note the --). But since that
is already an extension (POSIX does not require 'return' to support --
any more than it is required to support an argument of -1), I agree
with your argument that bash would be more useful if, as an extension
to POSIX, it would handle 'return -1' - in fact, that would match ksh
behavior. Conversely, since portable code already can't use it, it's
no skin off my back if nothing changes here.

---
How about portable code using:

(exit -1); return


That's not portable, either.  exit is allowed to reject -1 as invalid. 
POSIX is clear that exit and return have the same constraints - if an 
argument is provided, it must be 0-255 to be portable.


However, you are on to something - since bash allows 'exit -1' as an 
extension, it should similarly allow 'return -1' as the same sort of 
extension.  The fact that bash accepts 'exit -1' and 'exit -- -1', but 
only 'return -- -1', is the real point that you are complaining about.


--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: bug: return doesn't accept negative numbers

2011-08-09 Thread Eric Blake

On 08/08/2011 08:14 PM, Chet Ramey wrote:

On 8/8/11 9:42 PM, Mike Frysinger wrote:

On Monday, August 08, 2011 21:20:29 Chet Ramey wrote:

On 8/8/11 8:53 AM, Eric Blake wrote:

However, you are on to something - since bash allows 'exit -1' as an
extension, it should similarly allow 'return -1' as the same sort of
extension.  The fact that bash accepts 'exit -1' and 'exit -- -1', but
only 'return -- -1', is the real point that you are complaining about.


That's a reasonable extension to consider for the next release of bash.


i posted a patch for this quite a while ago.  not that it's hard to code.


Sure.  It's just removing the three lines of code that were added
between bash-3.2 and bash-4.0.  The question was always whether that's
the right thing to do, and whether the result will behave as Posix
requires.


Yes, the result will behave as POSIX requires.  POSIX requires that 
'return' and 'exit' need not support '--' (since they are special 
builtins that do not specifically require compliance with the generic 
rules on option parsing), that they need not support options, and that 
if their optional argument is present, it need not be supported if it is 
not a non-negative integer no greater than 255.  But they are _not_ 
required to reject any input outside the above constraints - therefore, 
an extension that supports '--', an extension that parses '-- -1' as 
255, and an extension that parses any option that looks like a negative 
number such as 'exit -1', are ALL valid extensions permitted by POSIX, 
and need not be disabled by --posix, but can be available always.  ksh 
does just that: 'return -1' and 'return -- -1' are always accepted and 
both result in the same behavior as the POSIX-mandated 'return 255'; ksh 
also has an extension where 'return --help' prints help, although bash 
uses 'help return' for this purpose.


--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: Is bash dying or dead (was Re: 4.1 is "$((( ))) an 'official operator, if $(( )) isn't?

2011-08-10 Thread Eric Blake
ng was tightened enough to actually be implementable by bash in a 
manner that matched ksh.




Bash is becoming very unstable -- programs that work in 3.1 won't
necessarily work in 3.2, those in 3.2 aren't compat with 4.0, 4.0 is
different than 4.1, and now 4.2 is different than 4.1.


That's because older bash has had bugs where it doesn't comply with 
POSIX, and those bugs have been fixed, but sometimes the fixes have 
consequences on the bash extensions.  But if you use the POSIX subset, 
rather than the bash extensions, you should notice that newer bash is 
better, not worse, than older bash when it comes to running portable 
scripts.



How can people write stable scripts in an environment of constant change?


By sticking to the common denominator that is known to work.


Please people, am I being 'over-reactive'? Or are these valid
concerns?


At least in my view, you are coming across as over-reactive and ranting, 
even if that was not your intent.


--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: saving bash.....

2011-08-11 Thread Eric Blake

On 08/10/2011 10:39 PM, Linda Walsh wrote:

Chet Ramey wrote:

> If not, then wouldn't
> $((( ))) be turned into $( (( )) ), meaning the arith returns a
> status,
> and not the calculation. (I've tested this, and this is the case.

Then I said:

"It sounded to me like $(( )) would be translated into "$( () )",
turning off arithmetic expansion. Did I read that _incorrectly_?


Yes, you read it incorrectly.

POSIX is saying that _if_ you want to guarantee command substitution of 
a subshell (that is, $() with a leading ( as the first item of the 
command to run), then use spaces.  But if you want to guarantee 
arithmetic expansion, then you _have_ to start with $((; and that if you 
start with $((, then arithmetic expansion takes precedence.




If not, [i.e. if I read it correctly] then wouldn't $((( ))) be turned
into $( (( )) ), meaning the arith returns a status, and not
the calculation. (I've tested this, and this is the case)."


Remember, $((body)) is treated as arithmetic expansion if body looks 
like arithmetic, and is only turned into $( (body)) otherwise (here, 
your body is '()').  POSIX' recommendation to add space is for the case 
of turning ambiguous $((body)) into definite command substitution $( 
(body)); you do not add space after the first '(' when you want arithmetic.
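A quick sketch of that disambiguation rule, typed at a bash prompt:

```shell
echo $((2 + 3))       # starts with $(( and parses as arithmetic: prints 5
echo $( (echo hi) )   # explicit space forces command substitution of a subshell
```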




;-) ? (no insult intended! honest...just asking you (and others) to take
a deep breath and not be reactive to what they think I'm saying but
try to focus on what I'm really "trying" (without great success,
but I keep trying!) to say...


Same here - I think we're all trying to be civil and helpful, and while 
emotions sometimes rise to the surface, hopefully everyone around is 
recognizing the underlying fact that no one here is intentionally trying 
to be malicious.




I don't find it useful to have (()) cause exceptions. It's not a useful
construct -- having it do so means it has to be programmed around.


(()) is a useful construct, when used correctly; best designed for the 
condition of a loop expression where the loop will terminate when the 
count finally evaluates to 0.  It does the same parsing and operation as 
$(()), except that instead of resulting in numeric output, it results in 
a change to $?.  And once you realize that (()) is the shorthand 
counterpart to 'expr', and therefore has the same effect on $? (where 0 
gives $?=1, non-zero gives $?=0, and error gives $?>1), then you can 
properly use this extension.  Or, since it is an extension, you can just 
avoid it altogether, and write scripts more portably.
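For instance, a countdown loop driven by the arithmetic status, plus the $? mapping described above (a bash sketch):

```shell
count=3
while (( count-- )); do    # loop continues while the arithmetic value is non-zero
  echo "tick $count"
done
(( 0 )); echo $?           # prints 1: value 0 maps to failure status
(( 5 )); echo $?           # prints 0: non-zero value maps to success
```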



It's not useful to have a function that is meant to return no value, --
if last calc'ed value was 0, to cause a script to fail... it may be
POSIX, but I'm looking for bash that was useful to write script in and
do some rudimentary programming with -- NOT a POSIX shell, I can get
from Microsoft or any other vendor.


If you want to use 'set -e', then you have to deal with arithmetic 
results of 0 setting $? to 1, whether the arithmetic results come from 
traditional 'expr' or extension '(())'.  That's life with 'set -e'; my 
advice has always been to avoid 'set -e' as a crutch and instead do 
error checking yourself, since 'set -e' behavior is unintuitive by design.
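The interaction is easy to reproduce; a sketch, run in a child bash so the forced exit does not kill the calling shell:

```shell
bash -c 'set -e; (( x = 0 )); echo reached'          # prints nothing: status 1 aborts the shell
bash -c 'set -e; (( x = 0 )) || true; echo reached'  # prints reached: the || guard defuses -e
```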


--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: set -e yet again (Re: saving bash....)

2011-08-12 Thread Eric Blake

On 08/12/2011 06:51 AM, Greg Wooledge wrote:

On Thu, Aug 11, 2011 at 11:56:10PM -0700, Linda Walsh wrote:

**Exception**
declare -i a
a=0
--
As a is declared to be an integer, it has the results evaluated
at assignment time.   a=0 is an integer expression that doesn't set
$?=1
Neither should:
((a=0))


a=0 is an assignment.  Assignments always return 0.


No they don't.

readonly a
a=0

sets $? to non-zero.



imadev:~$ ((a=0)); echo $?
1

And here, the same thing, but we return false, because the value was 0.
This is the thing about which you are complaining.  This is also one
of the things I mention on http://mywiki.wooledge.org/BashFAQ/105 in
which I describe how horrible and useless set -e is.


And that point has been made several times in this thread.  'set -e' is 
a historical wart - bash has it because POSIX requires it.  If you want 
to use bash extensions, then _don't_ use 'set -e', and you don't have to 
worry about how the unintuitive behavior interacts with extensions.


Greg, you missed one other useful form:

a=$((0))

This is an assignment (sets $? to 0 for all but errors like assigning to 
a readonly variable) of arithmetic expansion.  Also POSIX, and slightly 
shorter than


: $((a=0))
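Side by side, the three spellings differ only in what lands in $? (a sketch):

```shell
a=$((0));       echo $?   # 0: assignment of an expansion result succeeds
(( a = 0 ));    echo $?   # 1: the arithmetic value was 0
: $(( a = 0 )); echo $?   # 0: ':' ignores the value and always succeeds
```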

--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: conditional aliases are broken

2011-08-15 Thread Eric Blake

On 08/15/2011 01:10 PM, Sam Steingold wrote:

* Andreas Schwab  [2011-08-15 18:42:30 +0200]:

Sam Steingold  writes:


this works:

$ alias z='echo a'
$ zz(){ z b; }
$ zz
a b

however, after sourcing this file:
if true; then
   alias z='echo a'
   zz(){ z b; }
fi


Aliases are expanded during reading, but the alias command isn't
executed until after the complete compound command was read.


Cool.  Now, what does this imply?
Is this the expected behavior aka "feature"?


Yep - feature.  All shells behave that way.  They parse to an end of a 
command (in your case, the end of the compound 'if-fi' command), then 
process statements within the command.  Alias expansion affects parsing, 
so your alias cannot take effect until after the compound command has 
been parsed, and all attempts to use the alias from within the compound 
command were parsed with the pre-command expansion (i.e. no alias).


Yet another reason why aliases are mostly replaced by functions.
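The parse-time effect can be reproduced in a script (a sketch; expand_aliases is needed because non-interactive bash disables alias expansion by default):

```shell
shopt -s expand_aliases
alias z='echo a'
if true; then
  alias y='echo b'
  z one    # works: z existed before the if-fi block was read
  y two    # "command not found": y was defined too late for this parse
fi
```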

--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: conditional aliases are broken

2011-08-15 Thread Eric Blake

On 08/15/2011 04:40 PM, Sam Steingold wrote:

* Andreas Schwab  [2011-08-15 22:04:04 +0200]:

Sam Steingold  writes:


Cool.  Now, what does this imply?


"For almost every purpose, shell functions are preferred over aliases."


so, how do I write

alias a=b

as a function?
(remember that arguments may contain spaces&c)


a() { b "$@"; }

--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: conditional aliases are broken

2011-08-18 Thread Eric Blake

On 08/18/2011 08:38 AM, Sam Steingold wrote:

mkdir z
cd z
touch a b 'c d'


When doing exercises like this, I like to:

touch a b 'c  d'

Notice the double spacing - it proves whether I used enough quoting 
throughout the exercise - if 'c d' with one space shows up anywhere, 
then I missed quoting, because word splitting followed by argument 
concatenation with only one space must have happened.




how do I write a function that would print the same as
$ \ls | cat
a
b
c d



$ f1(){ for a in "$*"; do echo $a; done; }


Incorrect quoting on $a.  Also, remember the difference between $* and 
$@ inside "" - the former creates only one word, and only the latter 
splits the result into the same number of words as were originally 
arguments to the function.  You meant:


f(){ for a; do echo "$a"; done; }

or

f(){ for a in "$@"; do echo "$a"; done; }

(both are identical).
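A sketch that makes the "$*" vs. "$@" distinction visible, using positional parameters with an embedded double space:

```shell
set -- a b 'c  d'
for w in "$*"; do printf '[%s]\n' "$w"; done   # one word:    [a b c  d]
for w in "$@"; do printf '[%s]\n' "$w"; done   # three words: [a] [b] [c  d]
```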

--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: conditional aliases are broken

2011-08-18 Thread Eric Blake

On 08/18/2011 08:44 AM, Eric Blake wrote:

how do I write a function that would print the same as
$ \ls | cat


Useless use of cat.  This can be done with \ls -1.


f(){ for a in "$@"; do echo "$a"; done; }


Or skip the loop altogether:

f(){ printf %s\\n "%@"; }

--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: Is this a bug in [[ -f ]]?

2011-08-18 Thread Eric Blake

On 08/18/2011 10:35 AM, Steven W. Orr wrote:

I have a symlink file and if if I test it I get success status with -f.

831 > ls -l errio err
lrwxrwxrwx. 1 sorr fc 5 Aug 18 08:48 err -> errio
-rw-rw-r--. 1 sorr fc 3816 Aug 18 08:48 errio
832 > [[ -f errio ]]
833 > echo $? # Good answer
0
*834 > [[ -h errio ]]
835 > echo $? # Good answer
1
*836 > [[ -f err ]]
837 > echo $? # BAD answer
0


Good answer.  The man page says that symlinks are dereferenced for most 
tests (-h and -L being the exceptions).  Since err is a symlink that 
resolves, -f err is true.




Is this a bug? Is there a workaround? Am I doing something wrong?


Merely that you were expecting -f to mean 'file and not symlink' rather 
than what was documented of 'file or symlink to file'.
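If 'file and not symlink' is what you want, combine the tests; a sketch in a scratch directory:

```shell
dir=$(mktemp -d) && cd "$dir"
touch errio
ln -s errio err
[[ -f err ]] && echo "err: file (symlink followed)"          # prints
[[ -h err ]] && echo "err: symlink"                          # prints
[[ -f err && ! -h err ]] || echo "err: not a plain file"     # prints
[[ -f errio && ! -h errio ]] && echo "errio: regular file"   # prints
```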


--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: conditional aliases are broken

2011-08-18 Thread Eric Blake

On 08/18/2011 11:38 AM, Stefano Lattarini wrote:

Hi Eric.

On Thursday 18 August 2011, Eric Blake wrote:

On 08/18/2011 08:44 AM, Eric Blake wrote:

how do I write a function that would print the same as
$ \ls | cat


Useless use of cat.  This can be done with \ls -1.


f(){ for a in "$@"; do echo "$a"; done; }


Actually, echo "$a" is not portable - if you have any file names 
beginning with - or containing \, then the results can be corrupted.




Or skip the loop altogether:

f(){ printf %s\\n "%@"; }


I think you've made a typo here; it should have been:

  f () { printf %s\\n "$@"; }


Yep, slip of one key when I typed (at least on my keyboard, % and $ are 
neighbors).




I guess that's what you meant, right?

BTW, is this behaviour truly portable to other shells and/or printf
utilities?  POSIX seems to require it to portable, but you never
know ...


It's portable, but not always fast (some shells lack printf(1) as a 
builtin, and end up spawning a process).  And in the case of arbitrary 
file names, printf is always better than echo, since it handles \ and 
leading - correctly.
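A file name that happens to be a valid echo option shows the difference (this sketch assumes bash's builtin echo; other shells may behave differently):

```shell
name='-n'
echo "$name"            # bash's builtin echo eats it as an option; nothing printed
printf '%s\n' "$name"   # prints -n, exactly as given
```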


--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: conditional aliases are broken

2011-08-18 Thread Eric Blake

On 08/18/2011 08:53 AM, Roman Rakus wrote:

On 08/18/2011 04:38 PM, Sam Steingold wrote:

how do I write a function that would print the same as
$ \ls | cat

f3(){ printf "%s\n" "$@"; }


"\n" looks funny in shell; even though POSIX requires that "\n" does not 
treat the \ as an escape but as a literal character, stylistically, I 
prefer writing "\\n" or '\n' to make it clear that I intended a literal 
backslash.


--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: Is this a bug in [[ -f ]]?

2011-08-19 Thread Eric Blake

On 08/19/2011 08:45 AM, Suvayu Ali wrote:

I am trying to test if a file exists and then source it. My problem is
the test succeeds even if the variable is empty! If I pass no argument
at all, it still succeeds. To give you an example:

$ unset bla
$ [ -f $bla ]&&  echo yes
yes
$ [ -f  ]&&  echo yes
yes


Both expected behaviors, and evidence of your lack of quoting.

Remember, the behavior of [] depends on how many arguments are present.

[ -f "$bla" ] (note the "") - guarantees that there are exactly two 
arguments, so it proceeds with the two-argument test where -f is the 
operator and "$bla" is the file name.


[ -f ] (which is the same as [ -f $bla ] if $bla is empty, note the lack 
of "") - there is exactly one argument, so it proceeds with the 
one-argument test of whether the argument (the literal string -f) is 
empty (it is not).


Furthermore, [ -f $bla ] is different than [[ -f $bla ]].  [ is a POSIX 
utility, and mandated to do all argument word expansion before [ ever 
gets a chance to see what arguments it was given - if $bla is empty or 
has spaces, you changed the number of arguments that are given to [.  [[ 
is a bash (and ksh) extension that is part of the shell syntax (similar 
to how () for subshells is part of the syntax), thus it knows how many 
words, _pre-expansion_, were present, and the fact that $bla was 
unquoted is not a problem, [[ -f $bla ]] is a safe way to check if $bla 
is a file even if $bla is empty or contains spaces.
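The three cases from the report, re-run side by side (a sketch):

```shell
unset bla
[ -f "$bla" ] && echo one    # nothing: two arguments, empty file name
[ -f $bla ]   && echo two    # prints two: one argument, '-f' is a non-empty string
[[ -f $bla ]] && echo three  # nothing: [[ ]] counts words before expansion
```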


--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: variables set on command line

2011-08-24 Thread Eric Blake

On 08/24/2011 09:24 AM, Sam Steingold wrote:

CYGWIN_NT-5.2-WOW64 sds 1.7.9(0.237/5/3) 2011-03-29 10:10 i686 Cygwin
BASH_VERSION='4.1.10(4)-release'

at the bash prompt I observe this:
$ f(){ echo a=$a b=$b c=$c ; }
$ unset a b c
$ a=a b=b f
a=a b=b c=
$ f
a= b= c=
which I believe is correct (i.e., variables set in "a=a b=b f" are unset
after f terminates).


This is bash's default behavior, but it violates POSIX.



alas, when I call /bin/sh on the same machine, I see this:


That tells bash to strictly obey POSIX, so you get the POSIX behavior.



f(){ echo a=$a b=$b c=$c ; }
f
a= b= c=
a=a b=b f
a=a b=b c=
f
a=a b=b c=


Which is indeed correct under the rules for POSIX (basically, POSIX 
requires function calls to behave like special built-ins, such that 
changes to the environment persist after the function call - the bash 
developer thinks the posix rule is counterintuitive, which is why the 
default bash behavior is opposite the posix behavior).
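The difference is visible by running the same snippet in both modes (a sketch; POSIX-mode details have varied across bash versions, so treat the second line as illustrative of bash 4.x era behavior):

```shell
snippet='f(){ :; }; a=1 f; echo "a=$a"'
bash -c "$snippet"           # default bash: a=   (assignment did not persist)
bash --posix -c "$snippet"   # bash 4.x POSIX mode: a=1 (assignment persisted)
```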


Your question is not cygwin-specific.


is this the expected behavior?


Yes.

--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: variables set on command line

2011-08-24 Thread Eric Blake

On 08/24/2011 10:07 AM, Sam Steingold wrote:

* Eric Blake  [2011-08-24 09:31:45 -0600]:

f(){ echo a=$a b=$b c=$c ; }
f
a= b= c=
a=a b=b f
a=a b=b c=
f
a=a b=b c=


Which is indeed correct under the rules for POSIX


This sucks big time.


Such is life when dealing with shell portability.


So if I want to bind a variable for an eval invocation and do this:

   eval "`./libtool --tag=CC --config | grep '^archive_cmds='`"
   CC='${CC}' libobjs='$libs' deplibs='${CLFLAGS}' compiler_flags='${CFLAGS}' \
 soname='$dll' lib='$lib' output_objdir='$dyndir' \
 eval XCC_CREATESHARED=\"${archive_cmds}\"

and I want CC to have an old value after the second eval, I need to save
it and restore it by hand, like this:

   CC_save=$CC
   CC='${CC}' libobjs='$libs' deplibs='${CLFLAGS}' compiler_flags='${CFLAGS}' \
 soname='$dll' lib='$lib' output_objdir='$dyndir' \
 eval XCC_CREATESHARED=\"${archive_cmds}\"
   CC=$CC_save

however, this does not distinguish between unset CC and CC=''.
(is there a way to distinguish these two situations?)


Yes - autoconf does this all the time, using an idiom roughly like this:

CC_set=${CC+set}
CC_save=$CC
do stuff that modifies $CC
if test "$CC_set" = set; then
  CC=$CC_save
else
  unset CC
fi

Also, you can use command to suppress the ability of built-ins like eval 
(but not function calls) to affect the current environment:


$ unset foo
$ foo=bar eval :
$ echo $foo
bar
$ unset foo
$ foo=bar command eval :
$ echo $foo
$

--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: Why bash command "cd //" set path "//" ?

2011-08-26 Thread Eric Blake

On 08/26/2011 04:38 AM, Andrey Demykin wrote:

Why bash command "cd //" set path "//" ?


Because POSIX says that implementations may (but not must) treat // 
specially.  And rather than special case just the implementations that 
do treat it specially (such as cygwin), bash globally respects // on all 
platforms even where it is not special.




I found this in all version of the bash.
Excuse me , if it is not a bug.


Not a bug.

--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: Bash does not like DOS file format

2011-10-10 Thread Eric Blake
On 10/05/2011 02:37 PM, Russ Browne wrote:
> Configuration Information [Automatically generated, do not change]:
> 
> Machine: i686
> 
> OS: cygwin
> 
> Compiler: gcc-4
> 
> Compilation CFLAGS:  -DPROGRAM='bash.exe' -DCONF_HOSTTYPE='i686'
> -DCONF_OSTYPE='cygwin'

Given that you are trying to use cygwin, this may be a question better
asked on the cygwin lists.

> I thought this must surely be a simple goof that would be fixed
> in the
> 
> next release, but when I reloaded cygwin for my new PC late last
> year
> 
> the bug was still there.

This is not a bug, but a conscious design decision.  In particular, the
cygwin port of bash intentionally behaves like Unix in treating CR as
literal characters (for speed and POSIX compliance reasons) unless you
take explicit measures to request that it ignore CR (your explicit
request is what allows bash to ignore POSIX).  Read the cygwin bash
release notes for more details on the measures you can take:

http://sourceware.org/ml/cygwin-announce/2011-02/msg00027.html

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: Error in manual for >&word redirection

2011-10-12 Thread Eric Blake

On 10/12/2011 02:07 PM, Greg Wooledge wrote:

Even using a space is not sufficient to force a valid file descriptor number
to be treated as a filename:

imadev:~$ foo>&  1
stdout
stderr
imadev:~$ ls -l 1
1 not found


If you want 'word' treated as a filename, then express it as a filename. 
 It's still possible to write to a file named '1':


foo >&./1

--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: New flag option request

2011-10-20 Thread Eric Blake

On 10/20/2011 08:48 AM, Bruce Korb wrote:

You may have this in the queue already, but just in case:

POSIX now specifies that if a standard utility has extended options,
then you accomplish it with ``-W option-name[=opt-arg]''.


Not quite.  POSIX specifies only that -W is reserved for 
implementation-defined extensions.  glibc's getopt_long _happens_ to 
have the implementation-defined extension that '-W foo' is equivalent to 
'--foo', so it would make sense that bash support the same extension as 
glibc for consistency among GNU programs, but that is _not_ a POSIX 
requirement.



I wouldn't care, but I wanted to add ``--noprofile --norc''
to the command line and, for debugging purposes, I aliased "bash"
to "bash -x". Oops. Two issues:

1. I'd be nice to be able to interleave short and long options, and
2. "bash -x -W noprofile -W norc" should also work.


Bash currently doesn't use getopt_long for option parsing, but rolls its 
own parser.  A patch to make the bash parser support mixed long and 
short options would also be welcome in my mind.


--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: What is the correct way to set up login environment in crontab?

2011-11-09 Thread Eric Blake

On 11/09/2011 10:14 AM, Peng Yu wrote:

variable assignment VAR=blah.


That sets up a bash-local variable.  If you want it to be exported to 
the environment visible to child processes, then you _also_ need to use 
export, as in either:


VAR=blah
export VAR

or

export VAR=blah



However, VAR is not seen in env in /path/programtorun.sh (called from cron).


Right - bash maintains two sets of variables, and only the exported ones 
are visible to children.
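A quick sketch of the two variable sets:

```shell
VAR=blah                               # shell-local: children cannot see it
bash -c 'echo "child sees: [$VAR]"'    # child sees: []
export VAR                             # move it into the exported environment
bash -c 'echo "child sees: [$VAR]"'    # child sees: [blah]
```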


--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: Customize the command resolution in bash?

2011-11-11 Thread Eric Blake
On 11/11/2011 03:23 PM, Peng Yu wrote:
> Hi,
> 
> bash by default searchs in paths specified in the environment variable
> PATH (separated by ":"). I'm not aware if there is any cache mechanism
> to save the run time (but even so, different terminals still can not
> see the same cache, hence each terminal has the overhead to create the
> cache). When there are many files in PATH, it is going to slow down
> the performance.
> 
> One simple remedy is to instead search in a file where the abspaths of
> all the commands are saved (of course, this database file can be
> generated by using the command 'find' to search for all the
> directories in $PATH, which process can be scheduled to run
> periodically using cron). To make this work, I'm wondering if there is
> an easy way to customize the way that bash resolve a command.

Bash isn't doing the resolution so much as libc (read 'man execvp')
(well, technically, bash may be manually repeating some of the same
resolution code as in execvp for other reasons, but the same principles
apply).

If you want to search fewer directories, then nothing is stopping you from:

mkdir ~/mycache
for every executable you want cached:
  ln -s executable ~/mycache/
PATH=$HOME/mycache:$PATH

so that you now have a single directory with symlinks to all the
executables you want to run; thus, the attempt to stat()/execvp() each
executable will hit in the first directory in $PATH rather than getting
lots of misses as you progress through each directory in $PATH.  But
you'll still have to crawl through every directory (including the new
one you just added) for resolving things such as 'nosuch' not existing
anywhere on $PATH.

But whether this provides a measurable speedup, I don't know.  Benchmark
it yourself if you are interested in trying it.

Meanwhile, per POSIX, bash DOES provide hashing once it learns where an
executable lives, so that future invocations can rely on the hash (the
hash is invalidated when you assign to $PATH).  Read up on 'help hash'.
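For example, at a bash prompt:

```shell
hash -r          # empty the lookup cache (also happens when PATH is assigned)
ls >/dev/null    # first use walks $PATH and records where 'ls' was found
hash             # lists hit counts and resolved paths, e.g. /bin/ls
type -P ls       # shows the cached path bash will use without re-searching
```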

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: invoke tilde expansion on quoted string

2011-11-12 Thread Eric Blake
On 11/12/2011 07:53 AM, Geir Hauge wrote:
> 2011/11/12 Chris F.A. Johnson 
> 
>> On Fri, 11 Nov 2011, Peng Yu wrote:
>>
>>> I'm wondering if I already have a string variable, is there a bash
>>> native to do tilde expansion on it.
>>>
>>> var='~/..'
>>> cd $var#how to change this line?
>>>
>>
>>  eval "cd $var"
>>
> 
> I'd avoid eval as that could potentially do more than just expand the
> tilde, depending on what other characters the var contains. I'd just
> replace the ~ with $HOME using parameter expansion.
> 
> cd "${var/#~\//$HOME/}"

Except that your proposed parameter expansion only works for plain ~.
It doesn't cover more useful tilde expansions, such as ~user/, which
does NOT expand to $HOME, but to "user"'s home directory.
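A sketch of the limitation (everything is double-quoted, so the ~ in the pattern stays literal):

```shell
var='~/docs'
echo "${var/#~\//$HOME/}"    # becomes $HOME/docs
var='~user/docs'
echo "${var/#~\//$HOME/}"    # unchanged: ~user/docs - the pattern only matches '~/'
```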

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: set -e works incorrectly in subshells

2011-11-23 Thread Eric Blake
On 11/23/2011 03:26 AM, Марк Коренберг wrote:
> Repeat-By:
>   mmarkk@mmarkk-work:~$ ( set -e; echo aaa; false; echo bbb )
>   aaa
>   mmarkk@mmarkk-work:~$ ( set -e; echo aaa; false; echo bbb ) || true
>   aaa
>   bbb
>   mmarkk@mmarkk-work:~$

ksh has the same behavior, and POSIX requires it (basically, running the
subshell on the left of || has a higher precedence than the explicit
'set -e' within the subshell).

http://austingroupbugs.net/view.php?id=52

Expected behavior.  And one of the arguments I give why using the crutch
of 'set -e' is almost always the wrong thing in a complex script.
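The precedence is easiest to see by re-running the report's two commands (a sketch):

```shell
( set -e; echo aaa; false; echo bbb )           # prints only aaa; exit status 1
( set -e; echo aaa; false; echo bbb ) || true   # prints aaa AND bbb:
                                                # being left of || disables -e inside
```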

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




