printf, binary data, leading single quote and non ASCII codesets
Hi,

the bash manpage for printf contains the following statement:

> if the leading character is a single or double quote, the value is
> the ASCII value of the following character.

POSIX uses a different wording:

> If the leading character is a single-quote or double-quote, the value
> shall be the numeric value in the underlying codeset of the character
> following the single-quote or double-quote.

Bash says that ASCII will always be used, POSIX says that the conversion is codeset-dependent.

Bash code seems to agree with POSIX and contradict its manpage:

$ printf -v b "\xc3\xbf\xbe"

$ printf "$b" | { while LC_ALL=C read -N1 c; do \
    LC_ALL=C printf "%d %q\n" "'$c" "$c"; done; }
195 $'\303'
191 $'\277'
190 $'\276'

$ printf "$b" | { while LC_ALL=C read -N1 c; do \
    LC_ALL=C.UTF-8 printf "%d %q\n" "'$c" "$c"; done; }
195 $'\303'
255 $'\277'
190 $'\276'

Should the manpage be changed? Or the code modified to always use ASCII as the reference codeset?

Regards,

--
Gioele Barabucci
Re: printf, binary data, leading single quote and non ASCII codesets
On Mon, Jun 24, 2024 at 2:37 PM Gioele Barabucci wrote:
> $ printf -v b "\xc3\xbf\xbe"
>
> $ printf "$b" | { while LC_ALL=C read -N1 c; do \
>     LC_ALL=C.UTF-8 printf "%d %q\n" "'$c" "$c"; done; }
> 195 $'\303'
> 255 $'\277'
> 190 $'\276'

Can't reproduce this on Ubuntu 22.04 with Bash 5.3-alpha. It says 191 here, not 255.
Re: printf, binary data, leading single quote and non ASCII codesets
termux :

printf "$b" | { while LC_ALL=C read -N1 c; do LC_ALL=C.UTF-8 printf "%d %q\n" "'$c" "$c"; done; }
255 ÿ
190 $'\276'

On Mon, Jun 24, 2024, 1:37 PM Gioele Barabucci wrote:
> Hi,
>
> the bash manpage for printf contains the following statement:
>
> > if the leading character is a single or double quote, the value is
> > the ASCII value of the following character.
>
> POSIX uses a different wording:
>
> > If the leading character is a single-quote or double-quote, the value
> > shall be the numeric value in the underlying codeset of the character
> > following the single-quote or double-quote.
>
> Bash says that ASCII will always be used, POSIX says that the conversion
> is codeset-dependent.
>
> Bash code seems to agree with POSIX and contradict its manpage:
>
> $ printf -v b "\xc3\xbf\xbe"
>
> $ printf "$b" | { while LC_ALL=C read -N1 c; do \
>     LC_ALL=C printf "%d %q\n" "'$c" "$c"; done; }
> 195 $'\303'
> 191 $'\277'
> 190 $'\276'
>
> $ printf "$b" | { while LC_ALL=C read -N1 c; do \
>     LC_ALL=C.UTF-8 printf "%d %q\n" "'$c" "$c"; done; }
> 195 $'\303'
> 255 $'\277'
> 190 $'\276'
>
> Should the manpage be changed? Or the code modified to always use ASCII
> as the reference codeset?
>
> Regards,
>
> --
> Gioele Barabucci
Proposal for a New Bash Option: failfast for Immediate Pipeline Failure
Dear Bash Maintainers,

I have encountered a challenge with the current implementation of pipelines in Bash, specifically regarding subshells and the pipefail option. As documented, each command in a pipeline is executed in its own subshell, and Bash waits for all commands in the pipeline to terminate before returning a value.

To address these issues, I propose the introduction of a new option, failfast, which would immediately terminate the pipeline if any command in the pipeline fails. This would streamline error handling and provide more predictable script execution, aligning with user expectations in many common use cases.

Here is an example illustrating the proposed behavior:

#!/bin/bash
set -o failfast
nonexisting-command | sleep 10  # Pipeline exits immediately without sleeping for 10 seconds

This would provide a more intuitive and robust error handling mechanism for pipeline commands, enhancing the usability and reliability of Bash scripting.

I look forward to your feedback.

Best regards,
Mateusz Kurowski
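For comparison, here is a minimal sketch of the behaviour the proposal wants to change, using only the existing pipefail option (failfast itself is a proposed option and does not exist in bash). The use of `false`, the 2-second sleep, and the variable names are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Sketch of the *current* behaviour; no failfast option is used or assumed to exist.
set -o pipefail

start=$SECONDS
false | sleep 2        # the left side fails at once, the right side keeps running
status=$?              # non-zero because of pipefail
elapsed=$(( SECONDS - start ))

# The failure is reported, but only after the whole pipeline has finished.
printf 'status=%s elapsed=%ss\n' "$status" "$elapsed"
```

With pipefail the failure is visible in the exit status, yet the shell still waits for `sleep` to finish before the status becomes available.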
readarray leaves a NULL char embedded in each element
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -Wall
uname output: Linux maggid 6.5.0-35-generic #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May 7 09:00:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.1
Patch Level: 16
Release Status: release

Description:
When using space or newline as a delimiter with readarray -d, elements in the array have the delimiter replaced with NULL, which is left embedded in each element of the array. This causes incorrect behavior when using array elements as arguments to sub-processes. Using other delimiters, like comma, slash, has no problem. Also using -t to remove the trailing delimiter is ok. I have reproduced this problem with the latest bash version 5.3 alpha.

Repeat-By:
I first noticed the problem when trying to use an array element as part of an argument to sed:

readarray -d ' ' x << "A B"
sed -e s/X/${x[0]}/

This caused sed to complain "unterminated `s' command". Using "read -a" instead of readarray produces correct results. It does not appear to matter where the input for readarray comes from, be it <, <<, or <<<. The actual delimiter also does not appear relevant.

With a simple C program to print out the characters in argv[1], one can see that a NULL character is left in the argument.

Program:
#include
#include
void main(int argc, char *argv[])
{
int i, n;
if (argc > 1) {
n = strlen(argv[1]);
for (i=0; i
Re: readarray leaves a NULL char embedded in each element
On Mon, 24 Jun 2024 10:50:15 -0600, Rob Gardner wrote:

[reformatted]

> $ readarray -d ' ' X <<< "A B C"

This does not remove the separator, so X[0] ends up containing "A ", X[1] contains "B ", and X[2] contains "C\n" (as there was no trailing space to terminate the string). See:

$ declare -p X
declare -a X=([0]="A " [1]="B " [2]=$'C\n')

> $ ./printarg ${X[0]}A
> 65 0 65 $

You are not quoting the expansion. This means that your C code is receiving TWO arguments, namely "A" and "A". Remember that in C, strings are NUL-terminated. While you only print argv[1], you're printing too many characters (you go up until n+2), so your code ends up printing 65 (the first "A"), 0 (the string terminator), and 65 again (the second "A", which presumably is contiguous in memory).

So your code is printing 0 simply because in C, strings are NUL-terminated and you're reading that NUL and one character past it. Nothing to do with bash.

For completeness, if you quote the expansion, you get (at least on my system):

$ ./printarg "${X[0]}A"
65 32 65 0 83

That is, "A", a space, and "A" again (which is the result of the quoted expansion), 0 for the string terminator, and a random 83 which is whatever follows in memory (strangely, it seems to be 83 consistently though). Again, nothing to do with bash.

> $ read -d ' ' -a Y <<< "A B C"

This only reads "A" into Y. Without quotes, the result is the same as in the first example, except that Y[0] contains "A", not "A ", so when the expansion takes place, the two "A"'s are concatenated and the C code sees "AA".

$ declare -p Y
declare -a Y=([0]="A")

$ ./printarg ${Y[0]}A    # or "${Y[0]}A", here there's no difference
65 65 0 83

> $ readarray -td ' ' Z <<< "A B C"

This produces

$ declare -p Z
declare -a Z=([0]="A" [1]="B" [2]=$'C\n')

> $ ./printarg ${Z[0]}A
> 65 65 0 83 $

This is again correct given all the above.

The only issue I see is that your C code is reading too many characters from the string it's given.

--
D.
Re: readarray leaves a NULL char embedded in each element
On Mon, Jun 24, 2024 at 10:50:15 -0600, Rob Gardner wrote:
> Description:
> When using space or newline as a delimiter with readarray -d,
> elements in the array have the delimiter replaced with NULL,
> which is left embedded in each element of the array.

This isn't possible. Bash doesn't allow the storing of NUL bytes in variables, and further, Unix/Linux doesn't permit passing NUL bytes as command-line arguments to programs.

> This causes incorrect behavior when using array elements as arguments to
> sub-processes.

(Bash cannot pass a NUL byte as an argument.)

> I first noticed the problem when trying to use an array element as
> part of an argument to sed:
> readarray -d ' ' x << "A B"
> sed -e s/X/${x[0]}/

First point, your readarray command is using the wrong redirection operator. I'm fairly sure you meant to write <<< instead of <<.

Using the here-string operator <<<, we can see that the first array element retains the space delimiter (because -t was not used), and the second retains the newline character, which is added by <<<.

hobbit:~$ readarray -d ' ' x <<< "A B"
hobbit:~$ declare -p x
declare -a x=([0]="A " [1]=$'B\n')

Second point, your sed command is not using quotes.

> This caused sed to complain "unterminated `s' command".

The space at the end of x[0] causes word splitting to occur, due to the lack of quotes. The s/X/A part becomes one argument, and the / part becomes a second argument.

> Using "read -a" instead of readarray produces correct results.

That one uses IFS to separate and trim the input fields. The default IFS contains a space, so none of the array elements contains a space. Therefore, your lack of quoting probably doesn't cause any additional word splitting.

> With a simple C program to print out the characters in argv[1], one
> can see that a NULL character is left in the argument.
> Program:
> #include
> #include
> void main(int argc, char *argv[])
> {
> int i, n;
> if (argc > 1) {
> n = strlen(argv[1]);
> for (i=0; i }
> }

I'm not at all clear on what this C program is doing. You're putting a single character/byte on the stack for printf to process using the %d operator, which... expects an integer? And therefore reads more than one byte from the stack? Sorry, it's been ages since I did C.

> $ readarray -d ' ' X <<< "A B C"
> $ read -d ' ' -a Y <<< "A B C"
> $ readarray -td ' ' Z <<< "A B C"

> $ ./printarg ${X[0]}A
> 65 0 65 $

In this command, ${X[0]} is a capital A plus a space character. You're not using quotes, so ${X[0]}A becomes the two argument words "A" and "A".

hobbit:~$ readarray -d ' ' X <<< "A B C"
hobbit:~$ declare -p X
declare -a X=([0]="A " [1]="B " [2]=$'C\n')
hobbit:~$ printf '<%s> ' ${X[0]}A ; echo

Your C program appears to look only at the first argument word, "A", and ignores the second word. It takes strlen("A"), which is 1, and adds 2 to it, getting 3. Thus, it loops 3 times, and thus, we see the three numbers it writes to stdout.

The argument words are stored internally as NUL-terminated strings, so it's no surprise that the second loop iteration prints a 0. The third loop iteration is printing random garbage from beyond the end of the argument string, unless I'm misreading the situation.

> $ ./printarg ${Y[0]}A
> 65 65 0 83 $

Here, Y[0] contains "A", so you're passing "AA" as your sole argument. The argument's string length is 2, so you're looping 4 times. The numbers 65 65 0 are from the internal storage of the argument words, and the 83 is garbage from beyond the end of the string.

> $ ./printarg ${Z[0]}A
> 65 65 0 83 $

Here, Z[0] is "A" instead of "A ", because you used -t to trim the space. So you're passing "AA" as your argument, just like the previous call.

So, in a nutshell, this is what I believe you need to see:

1) readarray without -t retains the delimiter, even if it's a space or newline. It does not convert the delimiter to a NUL byte.

2) Unquoted ${X[0]} when X[0] ends with a space causes word splitting to occur, so anything after the ${X[0]} will become a new word (assuming IFS hasn't been modified).

3) Arguments passed to a program via the Unix kernel are NUL-terminated strings. Therefore, the NUL byte can't be part of the argument itself. It's a signpost that the argument string has ended.
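The points above can be checked without any C program. The following is a minimal sketch; the helper name count_args and the result variables are hypothetical and chosen only for illustration, and the default IFS is assumed.

```bash
#!/usr/bin/env bash
# count_args is a hypothetical helper: it reports how many argument words it received.
count_args () { printf '%s\n' "$#"; }

# Without -t, readarray keeps the delimiter at the end of each element.
readarray -d ' ' X <<< "A B C"
declare -p X

# With -t, the delimiter is trimmed (the newline added by <<< remains in the last element).
readarray -td ' ' Z <<< "A B C"
declare -p Z

# Unquoted: the trailing space in X[0] triggers word splitting, giving two words.
unquoted_words=$(count_args ${X[0]}A)
# Quoted: a single word, "A A".
quoted_words=$(count_args "${X[0]}A")

printf 'unquoted=%s quoted=%s\n' "$unquoted_words" "$quoted_words"
```

Quoting the expansion, or using -t when the delimiter is not wanted, avoids the extra argument word.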
Re: readarray leaves a NULL char embedded in each element
On Mon, 24 Jun 2024 20:01:45 +0200, Davide Brini wrote:

> On Mon, 24 Jun 2024 10:50:15 -0600, Rob Gardner wrote:
>
> and a random 83 which is whatever follows in memory (strangely, it seems
> to be 83 consistently though).

Some more investigation shows that the "S" is the beginning of "SHELL=/bin/bash..." so I assume that environment variables follow the argv arguments.

--
D.
Re: readarray leaves a NULL char embedded in each element
On Mon, Jun 24, 2024 at 20:01:45 +0200, Davide Brini wrote:
> $ ./printarg "${X[0]}A"
> 65 32 65 0 83
>
> That is, "A", a space, and "A" again (which is the result of the quoted
> expansion), 0 for the string terminator, and a random 83 which is
> whatever follows in memory (strangely, it seems to be 83 consistently
> though).

As a guess, it might be coming from envp[], the environment variables. On my system at least, the first environment variable happens to be SHELL. And of course 83 is 'S'.

Obviously this isn't a safe thing to do. The compiler would be within its rights to arrange for a segfault when reading beyond the end of an individual argv[] string. It's just bad luck that it printed garbage output instead of crashing.
function names starting with %
You can do these

$ %f(){ :;}
$ declare -f %f
%f ()
{
    :
}
$ unset -f %f
$ declare -f %f
$ echo $?
1

but not call them

$ %f
bash: fg: %f: no such job
$ '%f'
bash: fg: %f: no such job
$ \%f
bash: fg: %f: no such job

Why is that? Would it be a bad idea to let such functions take precedence over jobspecs?

Oğuz
use-after-free in set -o vi mode
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -Wall
uname output: Linux blob 6.5.0-35-generic #35-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 26 11:23:57 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.2
Patch Level: 15
Release Status: release

Description:
When I run bash under valgrind, and run set -o vi, and then type ESC d 1 C valgrind says

Invalid read of size 4
   at 0x1D536A: _rl_vi_domove_motion_cleanup (vi_mode.c:1193)
   by 0x1D5AA7: rl_vi_domove (vi_mode.c:1355)
   by 0x1D5AA7: rl_vi_delete_to (vi_mode.c:1417)
   by 0x1D168D: _rl_dispatch_subseq (readline.c:916)
   by 0x1D1E37: _rl_dispatch (readline.c:860)
   by 0x1D1E37: readline_internal_char (readline.c:675)
   by 0x1D275C: readline_internal_charloop (readline.c:721)
   by 0x1D275C: readline_internal (readline.c:733)
   by 0x1D275C: readline (readline.c:387)
   by 0x13C9A9: yy_readline_get (parse.y:1543)
   by 0x13F432: yy_getc (parse.y:1477)
   by 0x13F432: shell_getc (parse.y:2408)
   by 0x141B1A: read_token.constprop.0 (parse.y:3418)
   by 0x145E78: yylex (parse.y:2905)
   by 0x145E78: yyparse (y.tab.c:1854)
   by 0x13BF79: parse_command (eval.c:348)
   by 0x13C107: read_command (eval.c:392)
   by 0x13C2BD: reader_loop (eval.c:139)
 Address 0x54c42c8 is 24 bytes inside a block of size 36 free'd
   at 0x484810F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x1D5C70: _rl_mvcxt_dispose (vi_mode.c:1150)
   by 0x1D5C70: rl_vi_change_to (vi_mode.c:1523)
   by 0x1D168D: _rl_dispatch_subseq (readline.c:916)
   by 0x1D567A: rl_domove_motion_callback (vi_mode.c:1167)
   by 0x1D5AA7: rl_vi_domove (vi_mode.c:1355)
   by 0x1D5AA7: rl_vi_delete_to (vi_mode.c:1417)
   by 0x1D168D: _rl_dispatch_subseq (readline.c:916)
   by 0x1D1E37: _rl_dispatch (readline.c:860)
   by 0x1D1E37: readline_internal_char (readline.c:675)
   by 0x1D275C: readline_internal_charloop (readline.c:721)
   by 0x1D275C: readline_internal (readline.c:733)
   by 0x1D275C: readline (readline.c:387)
   by 0x13C9A9: yy_readline_get (parse.y:1543)
   by 0x13F432: yy_getc (parse.y:1477)
   by 0x13F432: shell_getc (parse.y:2408)
   by 0x141B1A: read_token.constprop.0 (parse.y:3418)
   by 0x145E78: yylex (parse.y:2905)
   by 0x145E78: yyparse (y.tab.c:1854)
 Block was alloc'd at
   at 0x4845828: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x1A6051: xmalloc (xmalloc.c:114)
   by 0x1D5AD4: _rl_mvcxt_alloc (vi_mode.c:1142)
   by 0x1D5AD4: rl_vi_delete_to (vi_mode.c:1386)
   by 0x1D168D: _rl_dispatch_subseq (readline.c:916)
   by 0x1D1E37: _rl_dispatch (readline.c:860)
   by 0x1D1E37: readline_internal_char (readline.c:675)
   by 0x1D275C: readline_internal_charloop (readline.c:721)
   by 0x1D275C: readline_internal (readline.c:733)
   by 0x1D275C: readline (readline.c:387)
   by 0x13C9A9: yy_readline_get (parse.y:1543)
   by 0x13F432: yy_getc (parse.y:1477)
   by 0x13F432: shell_getc (parse.y:2408)
   by 0x141B1A: read_token.constprop.0 (parse.y:3418)
   by 0x145E78: yylex (parse.y:2905)
   by 0x145E78: yyparse (y.tab.c:1854)
   by 0x13BF79: parse_command (eval.c:348)
   by 0x13C107: read_command (eval.c:392)

Repeat-By:
See above.
Re: 'declare -i var=var' for var initially declared in calling scope
On 6/17/24 10:38 PM, Zachary Santer wrote:

> Bash Version: 5.2
> Patch Level: 26
> Release Status: release
>
> Description:
>
> It just occurred to me that I could take advantage of arithmetic evaluation being performed when a variable with the -i integer attribute is assigned a value to create a visual distinction between working with regular variables and variables representing integers, i.e.:
>
> local this_var="words words words"
> local var[index]="${this_var}"
> local -r -i BIT_FLAG=2#0100
> local -i value=BIT_FLAG
>
> After doing this, var[index] will be set to "words words words" and value will be set to 4.
>
> Going a step further and passing integer arguments to functions as the name of the variable, rather than passing its value with a parameter expansion of that variable, we run into a problem. I would expect func1 () and func2 () below to be equivalent. However, we see that when setting a local variable with the integer attribute, using the name of a variable at calling scope with the same name, the local variable is always assigned the value 0, if I'm not using an arithmetic expansion.
>
> I wondered if the call to 'local' was already referencing the local variable that it was in the process of declaring, which I imagine would've been unset at the time. 'set -o nounset' didn't cause it to error out, though.
>
> Of course 'declare -i other=other' is a no-op, but it doesn't break.
>
> Repeat-By:
>
> $ cat integer-scope
> #!/usr/bin/env bash
>
> set -o nounset
>
> main () {
>   printf 'foo :\n'
>   declare -i foo=1
>   func1 foo
>   func2 foo
>   printf 'bar :\n'
>   declare -i bar=2
>   func1 bar
>   func2 bar
>   printf '"${baz}" :\n'
>   declare -i baz=3
>   func1 "${baz}"
>   func2 "${baz}"
>   printf 'other :\n'
>   declare -i other=4
>   declare -i other=other
>   declare -p other
> }
>
> func1 () {
>   declare -i bar="${1}"
>   declare -p bar
> }
>
> func2 () {
>   declare -i bar="$(( ${1} ))"
>   declare -p bar
> }
>
> main
>
> $ ./integer-scope
> foo :
> declare -i bar="1"
> declare -i bar="1"

`declare' is a builtin, so its arguments are expanded before it is called. In the first example, you get what is essentially

declare -i bar; bar=foo

(to make it clear that the variable is created with the specified attributes before it's assigned to). Since the integer attribute is set, foo is treated as an expression, the variable with that name in the nearest scope gets expanded, and you have bar=1.

In the second, you have the same thing (foo is not a variable local to func2's scope), but with a different syntax for the arithmetic evaluation that doesn't rely on the assignment to do it implicitly, so it's

declare -i bar; bar=1

> bar :
> declare -i bar="0"
> declare -i bar="2"

Same thing. You have

declare -i bar; bar=bar

(the variable is created before being assigned to). Since bar is a local variable at the current scope and it's unset, when the assignment is performed, it has value 0, and that gets assigned.

Again, in the second example, since the arithmetic expansion is performed first, before the local variable is created, you use the variable in the nearest scope and get essentially

declare -i bar; bar=2

> "${baz}" :
> declare -i bar="3"
> declare -i bar="3"

This is the same as the first example.

> other :
> declare -i other="4"

In this example, main is creating a local variable and then using its value in the next assignment.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
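The "bar" case from the explanation above can be reduced to a short sketch. The function names implicit and explicit and the result variables are hypothetical; the expected values are the ones reported in this thread for bash 5.2.

```bash
#!/usr/bin/env bash
# Hypothetical names; this only restates the "bar" case from the report.

implicit () {
  # Behaves like: declare -i bar; bar=bar
  # The new, still unset local bar is what the arithmetic evaluation sees.
  declare -i bar="${1}"
  printf '%s\n' "${bar}"
}

explicit () {
  # $(( ... )) is expanded before declare runs, so it reads the caller's bar.
  declare -i bar="$(( ${1} ))"
  printf '%s\n' "${bar}"
}

main () {
  declare -i bar=2
  implicit_result=$(implicit bar)
  explicit_result=$(explicit bar)
  printf 'implicit=%s explicit=%s\n' "${implicit_result}" "${explicit_result}"
}

main
```

When a function takes a variable name and its local has the same name, expanding the value explicitly (or using a differently named local) sidesteps the self-reference.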
Re: printf, binary data, leading single quote and non ASCII codesets
On 6/24/24 7:36 AM, Gioele Barabucci wrote:

> Hi,
>
> the bash manpage for printf contains the following statement:
>
> > if the leading character is a single or double quote, the value is
> > the ASCII value of the following character.

Thanks for the report. The man page should have been updated when I changed that code to use multibyte characters many years ago.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
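A small sketch of the codeset dependence discussed in this thread. The variable names are illustrative, and the availability of a C.UTF-8 locale is an assumption that the script checks before using it; no particular output is claimed for the non-ASCII cases, since the thread itself shows results differing between systems.

```bash
#!/usr/bin/env bash
# ASCII characters have the same numeric value in every codeset mentioned here.
ascii_value=$(LC_ALL=C printf '%d' "'A")
printf 'A in the C locale: %s\n' "$ascii_value"

# U+00FF encoded as UTF-8, i.e. the two bytes 0xC3 0xBF.
char=$'\xc3\xbf'

# In the C locale only the first byte is looked at.
printf 'first byte in the C locale: %s\n' "$(LC_ALL=C printf '%d' "'$char")"

# Assumption: C.UTF-8 may not be installed, so only try it when locale -a lists it.
if locale -a 2>/dev/null | grep -qiE '^C\.utf-?8$'; then
  printf 'character in C.UTF-8: %s\n' "$(LC_ALL=C.UTF-8 printf '%d' "'$char")"
fi
```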
--rcfile and non existing files
Hi,

the manpage states:

> When an interactive shell that is not a login shell is started, bash reads and executes commands from /etc/bash.bashrc and ~/.bashrc, if these files exist. […] The --rcfile file option will force bash to read and execute commands from file instead of /etc/bash.bashrc and ~/.bashrc.

This wording suggests (or may suggest) that `bash --rcfile foo` will execute file `foo`, and complain if it contains an error or if it does not exist.

Instead, `bash --rcfile foo` will happily carry on if `foo` does not exist, probably applying the same logic used for `~/.bashrc`.

https://bugs.debian.org/1042394 reports that this happens for `--init-file` as well.

Should the manpage be reworded to avoid stating "will force bash to read..."? Or should passing a non-existing file to `--rcfile` be turned into an error?

Regards,

--
Gioele Barabucci
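Until either the wording or the behaviour changes, a caller that needs a hard failure can check the file before starting the shell. This is only a sketch: start_with_rcfile is a hypothetical wrapper, not something bash provides.

```bash
#!/usr/bin/env bash
# start_with_rcfile is a hypothetical wrapper, not part of bash.
# It refuses to start the interactive shell when the rcfile cannot be read.
start_with_rcfile () {
  local rcfile=$1
  if [[ ! -r $rcfile ]]; then
    printf 'rcfile not readable: %s\n' "$rcfile" >&2
    return 1
  fi
  bash --rcfile "$rcfile" -i
}
```

For example, `start_with_rcfile ./project.bashrc` would start an interactive shell only if ./project.bashrc (an illustrative file name) is readable.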