printf '\uFEFF' outputs invalid UTF-8 on Windows

2018-11-05 Thread Kalle Olavi Niemitalo
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: msys
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash.exe' -DCONF_HOSTTYPE='x86_64' 
-DCONF_OSTYPE='msys' -DCONF_MACHTYPE='x86_64-pc-msys' -DCONF_VENDOR='pc' 
-DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H 
-DRECYCLES_PIDS   -I.  -I. -I./include -I./lib  -DWORDEXP_OPTION 
-Wno-discarded-qualifiers -march=x86-64 -mtune=generic -O2 -pipe 
-Wno-parentheses -Wno-format-security -D_STATIC_BUILD -g
uname output: MINGW64_NT-6.1 fjkallen 2.10.0(0.325/5/3) 2018-07-25 13:06 
x86_64 Msys
Machine Type: x86_64-pc-msys

Bash Version: 4.4
Patch Level: 19
Release Status: release

Description:
The builtin printf '\uFEFF' outputs ED 9F BF ED BB BF in a
UTF-8 locale on Microsoft Windows, where sizeof(wchar_t) == 2.
It should output EF BB BF, like printf (GNU coreutils) 8.30
does.

The incorrect output ED 9F BF ED BB BF is a UTF-8-like encoding
of U+D7FF U+DEFF, which looks somewhat like a UTF-16 surrogate
pair but the U+D7FF character is not in the surrogate range.
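
The byte sequences above can be checked by hand with a little shell
arithmetic. This is only an illustration of the three-byte UTF-8 layout;
utf8_3byte is a hypothetical helper name, not anything from bash or
coreutils, and it deliberately does not reject the surrogate range.

```shell
# Illustration only: encode a code point in U+0800..U+FFFF as three
# UTF-8-style bytes, with no check for the surrogate range.
# utf8_3byte is a hypothetical name, not a bash or coreutils function.
utf8_3byte() {
    local cp=$(( $1 ))
    printf '%02X %02X %02X\n' \
        $(( 0xE0 | (cp >> 12) )) \
        $(( 0x80 | ((cp >> 6) & 0x3F) )) \
        $(( 0x80 | (cp & 0x3F) ))
}

utf8_3byte 0xFEFF   # EF BB BF
utf8_3byte 0xD7FF   # ED 9F BF
utf8_3byte 0xDEFF   # ED BB BF
```

The last two lines together give ED 9F BF ED BB BF, the output reported
above.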

Repeat-By:
Install Git for Windows 2.19.1, on Windows 7 SP1.
Start "Git Bash" from the Start menu.
Run the command:
  env --ignore-environment LANG=en_US.UTF-8 \
  /usr/bin/bash --noprofile -c 'builtin printf "\ufeff"' \
  | od -t x1

Fix:
In lib/sh/unicode.c, change u32toutf16 to treat characters in the
U+E000...U+ range just like the U+...U+D7FF range, i.e.
copy them unchanged to the output and not make a surrogate pair.
I did not test that change but the function clearly has a bug and
it matches the symptoms perfectly.
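
A rough sketch of the arithmetic, in shell rather than C. The function
names are hypothetical and this is not the code in lib/sh/unicode.c; it
only assumes that the surrogate-pair formula gets applied to a code point
below U+10000, with unsigned 32-bit arithmetic and 16-bit output units.
Under that assumption U+FEFF comes out as D7FF DEFF, which matches the
symptoms.

```shell
# Sketch only; the names are hypothetical and this is not lib/sh/unicode.c.

# Pair formula applied to every code point, emulating unsigned 32-bit
# arithmetic with each result truncated to a 16-bit unit.
utf16_always_pair() {
    local v=$(( ($1 - 0x10000) & 0xFFFFFFFF ))
    printf '%04X %04X\n' \
        $(( (0xD800 + (v >> 10)) & 0xFFFF )) \
        $(( (0xDC00 + (v & 0x3FF)) & 0xFFFF ))
}

# Pair formula only above the BMP; everything else is copied unchanged.
utf16_units() {
    local cp=$(( $1 ))
    if (( cp < 0x10000 )); then
        printf '%04X\n' "$cp"
    else
        utf16_always_pair "$cp"
    fi
}

utf16_always_pair 0xFEFF   # D7FF DEFF
utf16_units 0xFEFF         # FEFF
utf16_units 0x1F600        # D83D DE00
```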



Re: printf '\uFEFF' outputs invalid UTF-8 on Windows

2018-11-05 Thread Chet Ramey
On 11/5/18 12:09 PM, Kalle Olavi Niemitalo wrote:

> Bash Version: 4.4
> Patch Level: 19
> Release Status: release
> 
> Description:
> The builtin printf '\uFEFF' outputs ED 9F BF ED BB BF in a
> UTF-8 locale on Microsoft Windows, where sizeof(wchar_t) == 2.
> It should output EF BB BF, like printf (GNU coreutils) 8.30
> does.

Thanks for the report. This has been fixed for almost exactly two years
in the devel branch, the result of

http://lists.gnu.org/archive/html/bug-bash/2016-11/msg00039.html

and is fixed in the bash-5.x alpha and beta versions.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Indices of array variables are sometimes considered unset (or just display an error).

2018-11-05 Thread Great Big Dot
uname output: Linux ArchBox0 4.18.16-arch1-1-ARCH #1 SMP PREEMPT Sat Oct 20 
22:06:45 UTC 2018 x86_64 GNU/Linux
Machine Type: x86_64-unknown-linux-gnu
Bash Version: 4.4
Patch Level: 23
Release Status: release
--text follows this line--
Description:
The parameter expansion "${!var[@]}" expands to the indices of an array
(whether linear or associative). The expansion "${var-string}"
returns "${var}" iff var is set and 'string' otherwise. These two
features do not play well together:

$ declare -a -- array=([0]=hello [1]=world)
$ printf -- '%s\n\n' "${!array[@]-Warning: unset}"
bash: hello world: bad substitution

$ declare -a -- array=([0]='helloworld')
$ printf -- '%s\n\n' "${!array[@]-Warning: unset}"
Warning: unset

$ declare -a -- array=([0]='hello world')
$ printf -- '%s\n\n' "${!array[@]-Warning: unset}"
bash: hello world: bad substitution

$ declare -a -- array=()
$ printf -- '%s\n\n' "${!array[@]-Warning: unset}"
Warning: unset

As you can see, accessing the index list of multiple-element arrays
fails when you append the unset expansion. With single-element
arrays, it fails iff the element in question contains any special
characters or whitespace, and thinks the array is unset otherwise.
(Further testing shows that a value of the empty string also throws
an error.) Finally, empty arrays are also considered unset. (This is
the one thing that is consistent with the rest of bash, since empty
arrays themselves are also considered unset by this expansion; that
is, "${array[@]-unset}" yields 'unset' when array is empty.)

This pattern of behavior is apparently unaffected by changes to IFS,
using a normal variable as a one-element array, using an unset
variable as a zero-element array, or using an associative instead of
linear array. That last one has an interesting wrinkle, however:

   $ declare -A -- assoc=(['k e y']='element')
   $ printf -- '%s\n\n' "${!assoc[@]-Warning: unset}"
   Warning: unset

   $ declare -A -- assoc=(['key']='e l e m e n t')
   $ printf -- '%s\n\n' "${!assoc[@]-Warning: unset}"
   bash: e l e m e n t: bad substitution

Strangely, whether a single-element array errors (as opposed to
giving the wrong result) depends only on the characters in the
*element*, not the *key*, even though only the key itself is being
requested!


Repeat-By:
$ declare -a arr_2_=(zero one);  printf '%s\n' "${!arr_2_[@]-unset}"
bash: zero one: bad substitution
$ declare -a arr_1a=('z e r o'); printf '%s\n' "${!arr_1a[@]-unset}"
bash: z e r o: bad substitution
$ declare -a arr_1b=('zero');    printf '%s\n' "${!arr_1b[@]-unset}"
unset

Fix:
To avoid this problem, you just need to spend another line or two
writing out the relevant conditional explicitly; for example:

#  ... "${!array[@]-}"
if [ -v 'array[@]' ]; then
 ... "${!array[@]}" ...
else
 ...  ...
fi

Note that `test -v 'array[@]'` has the same "feature" that
"${array[@]-default}" does: it treats empty arrays as unset.
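
For what it's worth, here is a minimal runnable version of that
conditional. It tests the element count with ${#array[@]} rather than
`[ -v 'array[@]' ]`; that substitution is an assumption of this sketch,
not part of the report, and like the -v test it treats an empty array the
same as an unset one. The variable names are made up for the example.

```shell
# Sketch: fall back to a default when the array has no elements.
# ${#array[@]} is used here instead of `[ -v 'array[@]' ]`; that swap is
# an assumption of this example, not something from the report.
array=(hello world)
if (( ${#array[@]} > 0 )); then
    indices=("${!array[@]}")
else
    indices=('Warning: unset')
fi
printf '%s\n' "${indices[@]}"

empty=()
if (( ${#empty[@]} > 0 )); then
    empty_indices=("${!empty[@]}")
else
    empty_indices=('Warning: unset')
fi
printf '%s\n' "${empty_indices[@]}"
```

The first printf prints the indices 0 and 1 on separate lines; the second
prints the fallback string.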



Re: Indices of array variables are sometimes considered unset (or just display an error).

2018-11-05 Thread Great Big Dot
On Mon, Nov 5, 2018 at 4:56 PM Great Big Dot wrote:
> The parameter expansion "${!var[@]}" expands to the indices of an array
> (whether linear or associative).

Hold up... when I view this email on the public archives, all of my
"${array[@]}"'s (that is, "${array[]}"'s) got turned to
"address@hidden"'s. Was I supposed to use some escape sequence or
something? Is everyone who's subscribed to the mailing list able to see the
actual text? Or should I resend this bug report with all \@-signs escaped
somehow?

Testing...
test...@example.com
testing@example.com
testing﹫example.com
testing\@example.com
testing @ example.com


Re: Indices of array variables are sometimes considered unset (or just display an error).

2018-11-05 Thread Great Big Dot
> On Mon, Nov 5, 2018 at 4:56 PM Great Big Dot wrote:
> [... A]ccessing the index list of multiple-element arrays
> fails when you append the unset expansion. With single-element
> arrays, it fails iff the element in question contains any special
> characters or whitespace, and thinks the array is unset otherwise.
> (Further testing shows that a value of the empty string also throws
> an error.) Finally, empty arrays are also considered unset[...]

Oops, just realized what's causing this. I guess it isn't necessarily a
bug? Debatable, I guess.

What's actually happening here is that the *indirection* expansion
"${!foo}", and not the *indices* expansion "${!foo[@]}", is what is being
performed on something like "${!array[@]-}". Both expansions, while
unrelated, happen to use the same syntax, with the exception that
indirections apply to normal variables and index expansions apply to array
variables. For some reason, adding on the "${foo-default}" expansion causes
the former to be used instead of the latter. This can be seen here:

$ array=(foo)
$ printf -- '%s\n' "${!array[@]-unset}"
unset
$ foo='hello world'
$ printf -- '%s\n' "${!array[@]-unset}"
hello world

So first the array is expanded, and then it's treated as an indirection, and
then the unset part kicks in if the array's value isn't an extant variable
name. This explains all the observations I made.
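
Spelled out as two explicit steps, the sequence described above would look
roughly like this. It is only a sketch of the equivalent behavior under
that reading, not a description of how bash implements it; `name`,
`before` and `after` are variables invented for the example.

```shell
array=(foo)
unset -v foo
name="${array[*]}"         # step 1: the array expands to the string "foo"
before="${!name-unset}"    # step 2: indirection; foo is not set, so the default applies
foo='hello world'
after="${!name-unset}"     # step 2 again: foo is set now, so its value is used
printf '%s\n' "$before" "$after"
```

This prints "unset" and then "hello world".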

I still think it makes more sense if the "!" in "${!array[@]}" triggered
index expansion instead. At the very least, surely it should be one of
those expansion combinations that just isn't allowed, like
"${#foo[@]-default}" (actually, why is that disallowed?). Anyways, I don't
really see the point of the current behavior.

> This pattern of behavior is apparently unaffected by changes to IFS[...]

Upon further examination, and in light of the above realization, this
actually isn't true. In particular, iff the first character of IFS is
alphanumeric or an underscore (or if IFS is the empty string), and if you
use the "${array[*]}" form instead, then the expansion doesn't throw an
error when the array contains more than one element. E.g.:

$ array=(foo bar)
$ printf -- '%s\n' "${!array[*]-Warning: unset}"
bash: foo bar: bad substitution
$ IFS='_'
$ printf -- '%s\n' "${!array[*]-Warning: unset}"
Warning: unset
$ foo_bar='Beto2018'
$ printf -- '%s\n' "${!array[*]-Warning: unset}"
Beto2018
$ IFS=''
$ printf -- '%s\n' "${!array[*]-Warning: unset}"
Warning: unset
$ foobar='Hello, world'
$ printf -- '%s\n' "${!array[*]-Warning: unset}"
Hello, world

Though I understand it now, the above behavior doesn't seem especially
motivated to me. I mean, the variables that end up getting expanded don't
actually have their names stored anywhere, yet the indirection points to
them.
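
The IFS dependence comes from how the quoted "${array[*]}" form joins its
elements with the first character of IFS (or with nothing if IFS is
empty). A small sketch, where join_with is a hypothetical helper written
just for this illustration:

```shell
array=(foo bar)
# join_with is a hypothetical helper: print "${array[*]}" under a given IFS.
join_with() {
    local IFS="$1"
    printf '%s\n' "${array[*]}"
}
join_with ' '   # foo bar  (not a valid variable name)
join_with '_'   # foo_bar  (a valid name)
join_with ''    # foobar   (a valid name)
```

Only the last two results can name a variable, which is why only those
cases get past the indirection step without an error.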

Is there a good reason for treating "${!array[@]-}" and "${!array[*]-}"
like indirections instead of index expansions (or just throwing an error)?


Re: Indices of array variables are sometimes considered unset (or just display an error).

2018-11-05 Thread Eduardo Bustamante
On Mon, Nov 5, 2018 at 6:01 PM Great Big Dot wrote:
(...)
> > [... A]ccessing the index list of multiple-element arrays
> > fails when you append the unset expansion. With single-element
> > arrays, it fails iff the element in question contains any special
> > characters or whitespace, and thinks the array is unset otherwise.
> > (Further testing shows that a value of the empty string also throws
> > an error.) Finally, empty arrays are also considered unset[...]
>
> Oops, just realized what's causing this. I guess it isn't necessarily a
> bug? Debatable, I guess.
>
> What's actually happening here is that the *indirection* expansion
> "${!foo}", and not the *indices* expansion "${!foo[@]}", is what is being
> performed on something like "${!array[@]-}". Both expansions, while
> unrelated, happen to use the same syntax, with the exception that
> indirections apply to normal variables and index expansions apply to array
> variables. For some reason, adding on the "${foo-default}" expansion causes
> the former to be used instead of the latter. This can be seen here:

Sorry, I'm having a hard time following this email thread.

What is your ultimate goal or the actual problem you're trying to solve?

(BTW, I would recommend against trying to do three expansions in one.
It might be more terse, but it's hard to read and as you found out,
leads to weird behavior)



Re: Indices of array variables are sometimes considered unset (or just display an error).

2018-11-05 Thread Grisha Levit
On Mon, Nov 5, 2018 at 10:38 PM Eduardo Bustamante wrote:
> Sorry, I'm having a hard time following this email thread.

I *think* the point is that OP expected that:

(a) ${!var[@]-foo} expands to the indexes of var if ${var[@]} is set, else
to `foo'

whereas the behavior they observed is:

(b) ${!var[@]-foo} expands to the value of the variable whose name is
stored in ${var[@]} or to `foo' if that variable is unset

Their expectation seems reasonable since "the variable whose name is stored
in ${var[@]}" is kind of a weird thing.