printf '\uFEFF' outputs invalid UTF-8 on Windows
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: msys
Compiler: gcc
Compilation CFLAGS: -DPROGRAM='bash.exe' -DCONF_HOSTTYPE='x86_64' -DCONF_OSTYPE='msys' -DCONF_MACHTYPE='x86_64-pc-msys' -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -DRECYCLES_PIDS -I. -I. -I./include -I./lib -DWORDEXP_OPTION -Wno-discarded-qualifiers -march=x86-64 -mtune=generic -O2 -pipe -Wno-parentheses -Wno-format-security -D_STATIC_BUILD -g
uname output: MINGW64_NT-6.1 fjkallen 2.10.0(0.325/5/3) 2018-07-25 13:06 x86_64 Msys
Machine Type: x86_64-pc-msys

Bash Version: 4.4
Patch Level: 19
Release Status: release

Description:
        The builtin printf '\uFEFF' outputs ED 9F BF ED BB BF in a
        UTF-8 locale on Microsoft Windows, where sizeof(wchar_t) == 2.
        It should output EF BB BF, like printf (GNU coreutils) 8.30
        does.

        The incorrect output ED 9F BF ED BB BF is a UTF-8-like
        encoding of U+D7FF U+DEFF, which looks somewhat like a UTF-16
        surrogate pair but the U+D7FF character is not in the
        surrogate range.

Repeat-By:
        Install Git for Windows 2.19.1, on Windows 7 SP1.
        Start "Git Bash" from the Start menu.
        Run the command:

        env --ignore-environment LANG=en_US.UTF-8 \
            /usr/bin/bash --noprofile -c 'builtin printf "\ufeff"' \
            | od -t x1

Fix:
        In lib/sh/unicode.c, change u32toutf16 to treat characters in
        the U+E000...U+ range just like the U+...U+D7FF range,
        i.e. copy them unchanged to the output and not make a
        surrogate pair. I did not test that change but the function
        clearly has a bug and it matches the symptoms perfectly.
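
For illustration only, the proposed rule can be sketched in shell arithmetic. This is not the code in lib/sh/unicode.c; the helper name u32_to_utf16 is hypothetical. The range bound in the report is truncated, so the U+FFFF boundary used below is an assumption taken from the UTF-16 definition, not from the report.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not bash's u32toutf16): print the UTF-16 code
# units, in hex, for the code point given as $1 (e.g. 0xFEFF).
# Assumption: everything up to U+FFFF is copied unchanged and only
# code points from U+10000 upward become a surrogate pair. Inputs
# inside the surrogate range itself are not rejected in this sketch.
u32_to_utf16() {
    local c=$(( $1 ))
    if (( c < 0x10000 )); then
        # One code unit, copied unchanged.
        printf '%04X\n' "$c"
    else
        # Subtract 0x10000, then split the remaining 20 bits into a
        # high and a low surrogate.
        c=$(( c - 0x10000 ))
        printf '%04X %04X\n' "$(( 0xD800 + (c >> 10) ))" "$(( 0xDC00 + (c & 0x3FF) ))"
    fi
}

u32_to_utf16 0xFEFF     # prints FEFF
u32_to_utf16 0x1F600    # prints D83D DE00
```

With this rule U+FEFF stays a single code unit, which is what leads to the expected EF BB BF in UTF-8.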
Re: printf '\uFEFF' outputs invalid UTF-8 on Windows
On 11/5/18 12:09 PM, Kalle Olavi Niemitalo wrote:

> Bash Version: 4.4
> Patch Level: 19
> Release Status: release
>
> Description:
> The builtin printf '\uFEFF' outputs ED 9F BF ED BB BF in a
> UTF-8 locale on Microsoft Windows, where sizeof(wchar_t) == 2.
> It should output EF BB BF, like printf (GNU coreutils) 8.30
> does.

Thanks for the report. This has been fixed for almost exactly two years in
the devel branch, the result of

http://lists.gnu.org/archive/html/bug-bash/2016-11/msg00039.html

and is fixed in the bash-5.x alpha and beta versions.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Indices of array variables are sometimes considered unset (or just display an error).
uname output: Linux ArchBox0 4.18.16-arch1-1-ARCH #1 SMP PREEMPT Sat Oct 20 22:06:45 UTC 2018 x86_64 GNU/Linux
Machine Type: x86_64-unknown-linux-gnu

Bash Version: 4.4
Patch Level: 23
Release Status: release

Description:
    The parameter expansion "${!var[@]}" expands to the indices of an array (whether linear or associative). The expansion "${var-string}" returns "${var}" iff var is set and 'string' otherwise. These two features do not play well together:

        $ declare -a -- array=([0]=hello [1]=world)
        $ printf -- '%s\n\n' "${!array[@]-Warning: unset}"
        bash: hello world: bad substitution

        $ declare -a -- array=([0]='helloworld')
        $ printf -- '%s\n\n' "${!array[@]-Warning: unset}"
        Warning: unset

        $ declare -a -- array=([0]='hello world')
        $ printf -- '%s\n\n' "${!array[@]-Warning: unset}"
        bash: hello world: bad substitution

        $ declare -a -- array=()
        $ printf -- '%s\n\n' "${!array[@]-Warning: unset}"
        Warning: unset

    As you can see, accessing the index list of multiple-element arrays fails when you append the unset expansion. With single-element arrays, it fails iff the element in question contains any special characters or whitespace, and thinks the array is unset otherwise. (Further testing shows that a value of the empty string also throws an error.) Finally, empty arrays are also considered unset. (This is the one thing that is consistent with the rest of bash, since empty arrays themselves are also considered unset by this expansion; that is, "${array[@]-unset}" yields 'unset' when array is empty.)

    This pattern of behavior is apparently unaffected by changes to IFS, using a normal variable as a one-element array, using an unset variable as a zero-element array, or using an associative instead of linear array. That last one has an interesting wrinkle, however:

        $ declare -A -- assoc=(['k e y']='element')
        $ printf -- '%s\n\n' "${!assoc[@]-Warning: unset}"
        Warning: unset

        $ declare -A -- assoc=(['key']='e l e m e n t')
        $ printf -- '%s\n\n' "${!assoc[@]-Warning: unset}"
        bash: e l e m e n t: bad substitution

    Strangely, whether a single-element array errors (as opposed to giving the wrong result) depends only on the characters in the *element*, not the *key*, despite the fact that only the key's value is being requested!

Repeat-By:
    $ declare -a arr_2_=(zero one); printf '%s\n' "${!arr_2_[@]-unset}"
    bash: zero one: bad substitution

    $ declare -a arr_1a=('z e r o'); printf '%s\n' "${!arr_1a[@]-unset}"
    bash: z e r o: bad substitution

    $ declare -a arr_1b=('zero'); printf '%s\n' "${!arr_1b[@]-unset}"
    unset

Fix:
    To avoid this problem, you just need to spend another line or two writing out the relevant conditional explicitly; for example:

        # ... "${!array[@]-}"
        if [ -v 'array[@]' ]; then
            ... "${!array[@]}" ...
        else
            ... ...
        fi

    Note that `test -v 'array[@]'` has the same "feature" that "${array[@]-default}" does: it treats empty arrays as unset.
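
A minimal runnable sketch of that workaround follows. It swaps the `-v 'array[@]'` test for an element-count test, "${#array[@]}", which likewise treats an empty array as unset. The variable name `indices` is introduced here for illustration only.

```bash
#!/usr/bin/env bash
array=(hello world)

# Decide first whether the array has any elements, then use the plain
# index expansion on its own, without a "-default" suffix attached.
if (( ${#array[@]} > 0 )); then
    indices=("${!array[@]}")
else
    indices=()
    printf '%s\n' 'Warning: unset' >&2
fi

printf '%s\n' "${indices[@]}"   # prints 0 and 1, one per line
```

Keeping the two expansions on separate lines means "${!array[@]}" is always parsed as the index expansion.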
Re: Indices of array variables are sometimes considered unset (or just display an error).
On Mon, Nov 5, 2018 at 4:56 PM Great Big Dot wrote:

> The parameter expansion "${!var[@]}" expands to the indices of an array (whether linear or associative).

Hold up... when I view this email on the public archives, all of my "${array[@]}"'s (that is, "${array[]}"'s) got turned to "address@hidden"'s. Was I supposed to use some escape sequence or something? Is everyone who's subscribed to the mailing list able to see the actual text? Or should I resend this bug report with all \@-signs escaped somehow?

Testing...

test...@example.com
testing@example.com
testing﹫example.com
testing\@example.com
testing @ example.com
Re: Indices of array variables are sometimes considered unset (or just display an error).
> On Mon, Nov 5, 2018 at 4:56 PM Great Big Dot wrote:
> [... A]ccessing the index list of multiple-element arrays
> fails when you append the unset expansion. With single-element
> arrays, it fails iff the element in question contains any special
> characters or whitespace, and thinks the array is unset otherwise.
> (Further testing shows that a value of the empty string also throws
> an error.) Finally, empty arrays are also considered unset[...]

Oops, just realized what's causing this. I guess it isn't necessarily a bug? Debatable, I guess.

What's actually happening here is that the *indirection* expansion "${!foo}", and not the *indices* expansion "${!foo[@]}", is what is being performed on something like "${!array[@]-}". Both expansions, while unrelated, happen to use the same syntax, with the exception that indirections apply to normal variables and index expansions apply to array variables. For some reason, adding on the "${foo-default}" expansion causes the former to be used instead of the latter. This can be seen here:

    $ array=(foo)
    $ printf -- '%s\n' "${!array[@]-unset}"
    unset
    $ foo='hello world'
    $ printf -- '%s\n' "${!array[@]-unset}"
    hello world

So first the array is expanded, then the result is treated as an indirection, and then the unset part kicks in if the array's value isn't an extant variable name. This explains all the observations I made.

I still think it makes more sense if the "!" in "${!array[@]}" triggered index expansion instead. At the very least, surely it should be one of those expansion combinations that just isn't allowed, like "${#foo[@]-default}" (actually, why is that disallowed?). Anyways, I don't really see the point of the current behavior.

> This pattern of behavior is apparently unaffected by changes to IFS[...]

Upon further examination, and in light of the above realization, this actually isn't true. In particular, iff the first character of IFS is alphanumeric or an underscore (or if IFS is the empty string), and if you use the "${array[*]}" form instead, then the expansion doesn't throw an error when the array contains more than one element. E.g.:

    $ array=(foo bar)
    $ printf -- '%s\n' "${!array[*]-Warning: unset}"
    bash: foo bar: bad substitution
    $ IFS='_'
    $ printf -- '%s\n' "${!array[*]-Warning: unset}"
    Warning: unset
    $ foo_bar='Beto2018'
    $ printf -- '%s\n' "${!array[*]-Warning: unset}"
    Beto2018
    $ IFS=''
    $ printf -- '%s\n' "${!array[*]-Warning: unset}"
    Warning: unset
    $ foobar='Hello, world'
    $ printf -- '%s\n' "${!array[*]-Warning: unset}"
    Hello, world

Though I understand it now, the above behavior doesn't seem especially motivated to me. I mean, the variables that end up getting expanded don't actually have their names stored anywhere, yet the indirection points to them. Is there a good reason for treating "${!array[@]-}" and "${!array[*]-}" like indirections instead of index expansions (or just throwing an error)?
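
The same effect can be spelled out in two explicit steps, which may make the mechanism easier to see. This is only a sketch of the behavior as described in this message, not of how bash implements it internally; the function name join_and_deref is hypothetical.

```bash
#!/usr/bin/env bash
array=(foo bar)
foo_bar='Beto2018'

# Hypothetical helper: join the elements of "array" using $1 as IFS,
# then treat the joined string as a variable name.
join_and_deref() {
    local IFS=$1               # "${array[*]}" joins on the first character of IFS
    local name="${array[*]}"   # step 1: "foo_bar" for IFS='_', "foobar" for IFS=''
    # step 2: ordinary indirection on that name, with a default if the
    # named variable is unset
    printf '%s\n' "${!name-Warning: unset}"
}

join_and_deref '_'   # prints Beto2018, because foo_bar is set
join_and_deref ''    # prints Warning: unset, because foobar is not set
```

The helper is deliberately not called with a space as separator: the joined string "foo bar" is not a valid variable name, which is the "bad substitution" case shown above.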
Re: Indices of array variables are sometimes considered unset (or just display an error).
On Mon, Nov 5, 2018 at 6:01 PM Great Big Dot wrote:
(...)
> > [... A]ccessing the index list of multiple-element arrays
> > fails when you append the unset expansion. With single-element
> > arrays, it fails iff the element in question contains any special
> > characters or whitespace, and thinks the array is unset otherwise.
> > (Further testing shows that a value of the empty string also throws
> > an error.) Finally, empty arrays are also considered unset[...]
>
> Oops, just realized what's causing this. I guess it isn't necessarily a
> bug? Debatable, I guess.
>
> What's actually happening here is that the *indirection* expansion
> "${!foo}", and not the *indices* expansion "${!foo[@]}", is what is being
> performed on something like "${!array[@]-}". Both expansions, while
> unrelated, happen to use the same syntax, with the exception that
> indirections apply to normal variables and index expansions apply to array
> variables. For some reason, adding on the "${foo-default}" expansion causes
> the former to be used instead of the latter. This can be seen here:

Sorry, I'm having a hard time following this email thread. What is your ultimate goal or the actual problem you're trying to solve?

(BTW, I would recommend against trying to do three expansions in one. It might be more terse, but it's hard to read and, as you found out, leads to weird behavior.)
Re: Indices of array variables are sometimes considered unset (or just display an error).
On Mon, Nov 5, 2018 at 10:38 PM Eduardo Bustamante wrote:

> Sorry, I'm having a hard time following this email thread.

I *think* the point is that OP expected that:

(a) ${!var[@]-foo} expands to the indexes of var if ${var[@]} is set, else to `foo'

whereas the behavior they observed is:

(b) ${!var[@]-foo} expands to the value of the variable whose name is stored in ${var[@]}, or to `foo' if that variable is unset

Their expectation seems reasonable since "the variable whose name is stored in ${var[@]}" is kind of a weird thing.