Re: bash tries to parse comsub in quoted PE pattern

Greg Wooledge Wed, 18 Oct 2023 06:45:04 -0700

On Wed, Oct 18, 2023 at 08:19:35AM -0400, Zachary Santer wrote:
> On Tue, Oct 17, 2023 at 5:56 PM Emanuele Torre <torreemanue...@gmail.com>
> wrote:
> 
> >     bash-5.1$ letters=( {a..z} ); echo "${letters["{10..15}"]}"
> >     k l m n o p
> >
> 
> Then there's the question "Was that even supposed to work like that?"


This is another one of those cases where some end user did something
wacky and it produced a result that they liked, so they kept doing it.
At some point, they convinced themselves that this was an intended
feature, and they'd probably be upset if it stopped "working".

It's a pretty straightforward application of brace expansion, but not
one I would have thought to use.

> If
> so, you'd think it would generalize to being able to pass a series of
> whitespace-delimited indices to an array expansion.

That's not what's happening, at all.

unicorn:~$ set -x
unicorn:~$ echo "a b"{c,d}"e f"
+ echo 'a bce f' 'a bde f'
a bce f a bde f

Brace expansion is extremely low-level text macro expansion.  In my
example above, the thing after the echo consists of a single parser-word
which contains two quoted sections, and a brace expansion.  The brace
expansion fires first, and causes two words to be generated: "a b"c"e f"
is the first, and "a b"d"e f" is the second.

After those words are generated, quote removal occurs, and the result
is what you see in the -x output.
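
To make the word count visible without reading -x output, here is a
minimal sketch that captures the generated words in an array.  The
names "words" and "count" are only illustrative and are not part of
the example above.

```bash
#!/usr/bin/env bash
# Brace expansion runs before quote removal, so this single parser-word
# becomes two words, each of which keeps its quoted sections intact.
words=( "a b"{c,d}"e f" )
count=${#words[@]}

printf '%d words\n' "$count"      # 2 words
printf '<%s>\n' "${words[@]}"     # <a bce f> then <a bde f>
```

The angle brackets show that the spaces inside the quoted sections did
not cause any further splitting.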

> In Bash 5.2:
> $ array=( zero one two three four five six )
> $ printf '%s\n' "${array["{2..6..2}"]}"
> two
> four
> six

This one "works" because the final parser-word is a brace expansion with
quoted sections before and after it.  The brace expansion causes three
words to be generated: "${array["2"]}" is the first, and so on.

You're basically typing

    printf '%s\n' "${array["2"]}" "${array["4"]}" "${array["6"]}"

or rather, you're letting the brace expansion type it for you.

It just so happens that "${array["2"]}" is accepted as an indexed array
element expansion, despite the extra quotes inside the square brackets.
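
A rough way to check that equivalence is to build both word lists and
compare them.  This is only a sketch: it assumes a bash that behaves
like the 5.1 and 5.2 shells shown in this thread, and the names
"via_braces" and "typed_out" are made up for the illustration.

```bash
#!/usr/bin/env bash
array=( zero one two three four five six )

# Let brace expansion generate the three quoted words ...
via_braces=( "${array["{2..6..2}"]}" )
# ... and type the same three words by hand.
typed_out=( "${array["2"]}" "${array["4"]}" "${array["6"]}" )

# Compare the two lists joined with spaces.
if [[ "${via_braces[*]}" == "${typed_out[*]}" ]]; then
    same=yes
else
    same=no
fi
printf 'same word list: %s\n' "$same"
```

Since this behavior is undocumented, a different bash version could
give a different answer here.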

> $ printf '%s\n' "${array[{2..6..2}]}"
> -bash: {2..6..2}: syntax error: operand expected (error token is
> "{2..6..2}")

Here, what would have been a valid brace expansion is inside quotes, so
it's not expanded.  "{2..6..2}" is not a valid indexed array index value.

> $ printf '%s\n' "${array["2 4 6"]}"
> -bash: 2 4 6: syntax error in expression (error token is "4 6")

"2 4 6" is not a valid indexed array index value either.

> $ printf '%s\n' "${array[2 4 6]}"
> -bash: 2 4 6: syntax error in expression (error token is "4 6")

Same here, just without the extra quotes.

> $ printf '%s\n' "${array[2,4,6]}"
> six

In this case, "2,4,6" *is* a valid indexed array index.  It's an
arithmetic expression, whose value is that of the thing after the
last comma.

A comma series in an arithmetic expression is intended to be used in
cases like:

unicorn:~$ let 'a=1,b=a+1'; declare -p a b
declare -- a="1"
declare -- b="2"

That a comma series evaluates to its last element is mostly an
afterthought.  "What else would it evaluate to?"  We're more interested
in the side effects than in the final value.

This comes from C, where the most common use is something like:

    for (i=1,j=2; i<10; i++,j+=2) {...}

The comma series lets you perform two assignments/alterations in a
place where the syntax only asks for one.
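
The same thing can be seen inside an array subscript.  This is a small
sketch with made-up variable names ("picked", "elem", "i"), showing both
the value of a comma series and a side effect that survives it.

```bash
#!/usr/bin/env bash
array=( zero one two three four five six )

# The subscript is the arithmetic expression 2,4,6, whose value is 6.
picked=${array[2,4,6]}

# The first element of the series assigns i=1 as a side effect;
# the second element, i+2, supplies the index, which is 3.
i=0
elem=${array[i=1,i+2]}

printf '%s %s %s\n' "$picked" "$elem" "$i"    # six three 1
```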

> $ indices=(2 4 6)
> $ printf '%s\n' "${array[${indices[@]}]}"
> -bash: 2 4 6: syntax error in expression (error token is "4 6")
> $ printf '%s\n' "${array[${indices[*]}]}"
> -bash: 2 4 6: syntax error in expression (error token is "4 6")

In these cases, the inner expression is evaluated, yielding an
invalid indexed array index value.  "2 4 6" isn't allowed as an index,
no matter where you pull it from.
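
If the goal is to pull several elements out by index, one conventional
approach is to loop over the indices and expand one element at a time.
This is only a sketch of that approach, not something proposed in this
thread; "selected" is a name invented for the example.

```bash
#!/usr/bin/env bash
array=( zero one two three four five six )
indices=( 2 4 6 )

# Each subscript must be a single arithmetic expression, so expand one
# element per iteration and collect the results.
selected=()
for i in "${indices[@]}"; do
    selected+=( "${array[i]}" )
done

printf '%s\n' "${selected[@]}"    # two, four, six on separate lines
```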

> Considering I don't think this is documented anywhere, and what's in
> between the square brackets gets an arithmetic expansion applied to it, I'm
> going to guess "no."
> 
> So how important is it to maintain undocumented behavior?

There is one analogous case where the behavior *did* change.  Consider
a command like this:

unicorn:~$ echo <(printf '%s\n' {a..c})
/dev/fd/63

In bash 5.2, we get the result shown above.  The brace expansion happens
inside the process substitution.  It's as if we had typed

    echo <(printf '%s\n' a b c)

In bash 3.2, however:

unicorn:~$ bash-3.2
unicorn:~$ echo <(printf '%s\n' {a..c})
/dev/fd/63 /dev/fd/62 /dev/fd/61

Here, the brace expansion happened *first*, and we got three separate
process substitution words out of it.  It's as if we had typed

    echo <(printf '%s\n' a) <(printf '%s\n' b) <(printf '%s\n' c)
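
One way to tell the two behaviors apart in a script is to count how
many words the process substitution produces.  This is a sketch only:
"nwords" is an illustrative name, and which number you get depends on
the bash version running it.

```bash
#!/usr/bin/env bash
# Replace the positional parameters with whatever words the shell
# generates, then count them.  With the 5.2 behavior described above
# there is one word; with the 3.2 behavior there would be three.
set -- <(printf '%s\n' {a..c})
nwords=$#

printf 'process substitution produced %d word(s)\n' "$nwords"
```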

Changing this was significant: it broke some scripts and confused some
people.  But it left *more* people less confused, because the current
behavior looks a lot more reasonable to more people than the previous
behavior did.

So, there's precedent for changing something that seems wrong, when it
comes to brace expansions.  The first question is whether the current
behavior is considered wrong.  The second is whether it's wrong *enough*
to justify a change.

I personally don't see a need to change anything in this case.  If users
want to abuse brace expansion to generate "${a[x]}" words for them,
and save some typing, that's their call.  I won't be using it in my scripts,
but that's *my* call.  I don't, at the moment, see any cases where the
current behavior causes a surprising breakage, or significant confusion.
