printf, binary data, leading single quote and non ASCII codesets

2024-06-24 Thread Gioele Barabucci

Hi,

The bash manpage entry for printf contains the following statement:

> if the leading character is a single or double quote, the value is
> the ASCII value of the following character.

POSIX uses a different wording:

> If the leading character is a single-quote or double-quote, the value
> shall be the numeric value in the underlying codeset of the character
> following the single-quote or double-quote.

Bash says that ASCII will always be used, POSIX says that the conversion
is codeset-dependent.


Bash code seems to agree with POSIX and contradict its manpage:

$ printf -v b "\xc3\xbf\xbe"

$ printf "$b" | { while LC_ALL=C read -N1 c; do \
LC_ALL=C printf "%d %q\n" "'$c" "$c"; done; }
195 $'\303'
191 $'\277'
190 $'\276'

$ printf "$b" | { while LC_ALL=C read -N1 c; do \
LC_ALL=C.UTF-8 printf "%d %q\n" "'$c" "$c"; done; }
195 $'\303'
255 $'\277'
190 $'\276'

Should the manpage be changed? Or the code modified to always use ASCII 
as the reference codeset?


Regards,

--
Gioele Barabucci



Re: printf, binary data, leading single quote and non ASCII codesets

2024-06-24 Thread Oğuz
On Mon, Jun 24, 2024 at 2:37 PM Gioele Barabucci wrote:
> $ printf -v b "\xc3\xbf\xbe"
>
> $ printf "$b" | { while LC_ALL=C read -N1 c; do \
>  LC_ALL=C.UTF-8 printf "%d %q\n" "'$c" "$c"; done; }
> 195 $'\303'
> 255 $'\277'
> 190 $'\276'

Can't reproduce this on Ubuntu 22.04 with Bash 5.3-alpha. It says 191
here, not 255.



Re: printf, binary data, leading single quote and non ASCII codesets

2024-06-24 Thread alex xmb sw ratchev
termux :

printf "$b" | { while LC_ALL=C read -N1 c; do  LC_ALL=C.UTF-8 printf
"%d %q\n" "'$c" "$c"; done; }
255 ÿ
190 $'\276'



Proposal for a New Bash Option: failfast for Immediate Pipeline Failure

2024-06-24 Thread ama bamo
Dear Bash Maintainers,

I have encountered a challenge with the current implementation of pipelines
in Bash, specifically regarding subshells and the pipefail option. As
documented, each command in a pipeline is executed in its own subshell, and
Bash waits for all commands in the pipeline to terminate before returning a
value.

To address these issues, I propose the introduction of a new option,
failfast, which would immediately terminate the pipeline if any command in
the pipeline fails. This would streamline error handling and provide more
predictable script execution, aligning with user expectations in many
common use cases.

Here is an example illustrating the proposed behavior:


#!/bin/bash
set -o failfast
nonexisting-command | sleep 10  # Pipeline exits immediately without sleeping for 10 seconds


This would provide a more intuitive and robust error handling mechanism for
pipeline commands, enhancing the usability and reliability of Bash
scripting.

I look forward to your feedback.

Best regards,

Mateusz Kurowski
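
For comparison, here is a minimal sketch of what bash does today with
pipefail: the pipeline's status reflects the failing command, but only
after every command in the pipeline has finished. The helper name
pipeline_status is hypothetical, and failfast itself exists only as the
proposal above, not as a bash option.

```shell
#!/usr/bin/env bash
# pipeline_status (hypothetical helper) runs in a subshell so that
# enabling pipefail does not leak into the calling shell.
pipeline_status() (
    set -o pipefail
    status=0
    # 'false' fails at once, yet bash still waits for 'sleep 1' to end
    # before the pipeline returns.
    false | sleep 1 || status=$?
    echo "$status"
)

pipeline_status
```

The one-second delay before the status appears is the waiting behaviour
the proposal wants to avoid.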


readarray leaves a NULL char embedded in each element

2024-06-24 Thread Rob Gardner
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -flto=auto -ffat-lto-objects -flto=auto
-ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security
-Wall
uname output: Linux maggid 6.5.0-35-generic #35~22.04.1-Ubuntu SMP
PREEMPT_DYNAMIC Tue May  7 09:00:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.1
Patch Level: 16
Release Status: release


Description:
When using space or newline as a delimiter with readarray -d,
elements in the array have the delimiter replaced with NULL,
which is left embedded in each element of the array. This
causes incorrect behavior when using array elements as arguments to
sub-processes.
Using other delimiters, like comma or slash, has no problem.
Also using -t to remove the trailing delimiter is ok.
I have reproduced this problem with the latest bash version 5.3
alpha.

Repeat-By:
I first noticed the problem when trying to use an array element as
part of an argument to sed:
readarray -d ' ' x << "A B"
sed -e s/X/${x[0]}/
This caused sed to complain "unterminated `s' command".
Using "read -a" instead of readarray produces correct results.
It does not appear to matter where the input for readarray comes from,
be it <, <<, or <<<. The actual delimiter also does not appear relevant.
With a simple C program to print out the characters in argv[1], one
can see that a NULL character is left in the argument. Program:
#include <stdio.h>
#include <string.h>
void main(int argc, char *argv[])
{
int i, n;
if (argc > 1) {
n = strlen(argv[1]);
for (i=0; i

Re: readarray leaves a NULL char embedded in each element

2024-06-24 Thread Davide Brini
On Mon, 24 Jun 2024 10:50:15 -0600, Rob Gardner wrote:

[reformatted]

> $ readarray -d ' ' X <<< "A B C"

This does not remove the separator, so X[0] ends up containing "A ", X[1]
contains "B ", and X[2] contains "C\n" (as there was no trailing
space to terminate the string).
See:

$ declare -p X
declare -a X=([0]="A " [1]="B " [2]=$'C\n')

> $ ./printarg ${X[0]}A
> 65 0 65 $

You are not quoting the expansion. This means that your C code is receiving
TWO arguments, namely "A" and "A". Remember that in C, strings are
NUL-terminated. While you only print argv[1], you're printing too many
characters (you go up until n+2), so your code ends up printing 65 (the
first "A"), 0 (the string terminator), and 65 again (the second "A", which
presumably is contiguous in memory).
So your code is printing 0 simply because in C, strings are NUL-terminated
and you're reading that NUL and one character past it. Nothing to do with
bash.

For completeness, if you quote the expansion, you get (at least on my
system):

$ ./printarg "${X[0]}A"
65 32 65 0 83

That is, "A", a space, and "A" again (which is the result of the quoted
expansion), 0 for the string terminator, and a random 83 which is
whatever follows in memory (strangely, it seems to be 83 consistently
though). Again, nothing to do with bash.

> $ read -d ' ' -a   Y <<< "A B C"

This only reads "A" into Y. Without quotes, the result is the same as in
the first example, except that Y[0] contains "A", not "A ", so when the
expansion takes place, the two "A"'s are concatenated and the C code sees
"AA".

$ declare -p Y
declare -a Y=([0]="A")

$ ./printarg ${Y[0]}A    # or "${Y[0]}A", here there's no difference
65 65 0 83


> $ readarray -td ' ' Z <<< "A B C"

This produces

$ declare -p Z
declare -a Z=([0]="A" [1]="B" [2]=$'C\n')

> $ ./printarg ${Z[0]}A
> 65 65 0 83 $

This is again correct given all the above. The only issue I see is that
your C code is reading too many characters from the string it's given.

--
D.
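
The four ways of reading the input discussed above can be compared side
by side. This is a small sketch; the array names X, Z, Y and W are just
labels chosen for illustration, and readarray -d assumes bash 4.4 or
later.

```shell
#!/usr/bin/env bash
readarray -d ' ' X <<< "A B C"     # keeps the delimiter on each element
readarray -td ' ' Z <<< "A B C"    # -t strips the trailing delimiter
read -d ' ' -a Y <<< "A B C"       # stops reading at the first space
read -a W <<< "A B C"              # reads the whole line, splits on IFS

# Show the exact contents, including trailing spaces and newlines.
declare -p X Z Y W
```

declare -p makes the trailing space in X[0] and the newline appended by
<<< visible, which plain echo would hide.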



Re: readarray leaves a NULL char embedded in each element

2024-06-24 Thread Greg Wooledge
On Mon, Jun 24, 2024 at 10:50:15 -0600, Rob Gardner wrote:
> Description:
> When using space or newline as a delimiter with readarray -d,
> 
> elements in the array have the delimiter replaced with NULL,
> 
> which is left embedded in each element of the array.

This isn't possible.  Bash doesn't allow the storing of NUL bytes in
variables, and further, Unix/Linux doesn't permit passing NUL bytes as
command-line arguments to programs.
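
One way to see this is to capture output that contains a NUL byte. The
sketch below assumes nothing beyond bash itself; the variable name
value is arbitrary. Newer bash versions also print a warning on stderr
when they drop the byte, which is silenced here.

```shell
#!/usr/bin/env bash
# printf emits three bytes: 'A', NUL, 'B'. The command substitution
# cannot store the NUL, so bash discards it.
{ value=$(printf 'A\0B'); } 2>/dev/null

printf 'length=%d value=%s\n' "${#value}" "$value"
```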

> This
> causes incorrect behavior when using array elements as arguments to
> sub-processes.

(Bash cannot pass a NUL byte as an argument.)

> I first noticed the problem when trying to use an array element as
> part of an
> argument to sed:
> readarray -d ' ' x << "A B"
> sed -e s/X/${x[0]}/

First point, your readarray command is using the wrong redirection
operator.  I'm fairly sure you meant to write <<< instead of <<.  Using
the here-string operator <<<, we can see that the first array element
retains the space delimiter (because -t was not used), and the second
retains the newline character, which is added by <<<.

hobbit:~$ readarray -d ' ' x <<< "A B"
hobbit:~$ declare -p x
declare -a x=([0]="A " [1]=$'B\n')

Second point, your sed command is not using quotes.

> This caused sed to complain "unterminated `s' command".

The space at the end of x[0] causes word splitting to occur, due to the
lack of quotes. The s/X/A part becomes one argument, and the / part
becomes a second argument.

> Using "read -a" instead of readarray produces correct results.

That one uses IFS to separate and trim the input fields.  The default
IFS contains a space, so none of the array elements contains a space.
Therefore, your lack of quoting probably doesn't cause any additional
word splitting.

> With a simple C program to print out the characters in argv[1], one
> can see that a NULL character is left in the argument. Program:
> #include <stdio.h>
> #include <string.h>
> void main(int argc, char *argv[])
> {
> int i, n;
> if (argc > 1) {
> n = strlen(argv[1]);
> for (i=0; i }
> }

I'm not at all clear on what this C program is doing.  You're putting a
single character/byte on the stack for printf to process using the %d
operator, which... expects an integer?  And therefore reads more than
one byte from the stack?

Sorry, it's been ages since I did C.

> $ readarray -d ' ' X <<< "A B C"
> $ read -d ' ' -a   Y <<< "A B C"
> $ readarray -td ' ' Z <<< "A B C"
> $ ./printarg ${X[0]}A
> 65 0 65 $

In this command, ${X[0]} is a capital A plus a space character.  You're
not using quotes, so ${X[0]}A becomes the two argument words "A" and "A".

hobbit:~$ readarray -d ' ' X <<< "A B C"
hobbit:~$ declare -p X
declare -a X=([0]="A " [1]="B " [2]=$'C\n')
hobbit:~$ printf '<%s> ' ${X[0]}A ; echo
<A> <A>

Your C program appears to look only at the first argument word, "A",
and ignores the second word.  It takes strlen("A"), which is 1, and
adds 2 to it, getting 3.  Thus, it loops 3 times, and thus, we see
the three numbers it writes to stdout.

The argument words are stored internally as NUL-terminated strings, so
it's no surprise that the second loop iteration prints a 0.  The
third loop iteration is printing random garbage from beyond the end
of the argument string, unless I'm misreading the situation.

> $ ./printarg ${Y[0]}A
> 65 65 0 83 $

Here, Y[0] contains "A", so you're passing "AA" as your sole argument.
The argument's string length is 2, so you're looping 4 times.  The
numbers 65 65 0 are from the internal storage of the argument words, and
the 83 is garbage from beyond the end of the string.

> $ ./printarg ${Z[0]}A
> 65 65 0 83 $

Here, Z[0] is "A" instead of "A ", because you used -t to trim the space.
So you're passing "AA" as your argument, just like the previous call.

So, in a nutshell, this is what I believe you need to see:

 1) readarray without -t retains the delimiter, even if it's a space
or newline.  It does not convert the delimiter to a NUL byte.

 2) Unquoted ${X[0]} when X[0] ends with a space causes word splitting
to occur, so anything after the ${X[0]} will become a new word
(assuming IFS hasn't been modified).

 3) Arguments passed to a program via the Unix kernel are NUL-terminated
strings.  Therefore, the NUL byte can't be part of the argument
itself.  It's a signpost that the argument string has ended.
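
Points 1 and 2 can be checked without a C program. The sketch below
uses a hypothetical helper, count_args, that reports how many argument
words it received, and assumes the default IFS.

```shell
#!/usr/bin/env bash
# count_args (hypothetical helper) prints the number of arguments.
count_args() { echo "$#"; }

readarray -d ' ' X <<< "A B"   # X[0] is "A " (delimiter retained)

count_args ${X[0]}A            # unquoted: word splitting gives 2 words
count_args "${X[0]}A"          # quoted: a single word, "A A"

# With quotes, sed receives the whole expression as one argument.
echo X | sed -e "s/X/${X[0]}/"
```

Quoting the expansion is what keeps the trailing space inside the
argument instead of letting it end the word.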



Re: readarray leaves a NULL char embedded in each element

2024-06-24 Thread Davide Brini
On Mon, 24 Jun 2024 20:01:45 +0200, Davide Brini wrote:

> On Mon, 24 Jun 2024 10:50:15 -0600, Rob Gardner wrote:
>
> and a random 83 which is whatever follows in memory (strangely, it seems
> to be 83 consistently though).

Some more investigation shows that the "S" is the beginning of
"SHELL=/bin/bash..." so I assume that environment variables follow the
argv arguments.

--
D.



Re: readarray leaves a NULL char embedded in each element

2024-06-24 Thread Greg Wooledge
On Mon, Jun 24, 2024 at 20:01:45 +0200, Davide Brini wrote:
> $ ./printarg "${X[0]}A"
> 65 32 65 0 83
> 
> That is, "A", a space, and "A" again (which is the result of the quoted
> expansion), 0 for the string terminator, and a random 83 which is
> whatever follows in memory (strangely, it seems to be 83 consistently
> though).

As a guess, it might be coming from envp[], the environment variables.
On my system at least, the first environment variable happens to be
SHELL.  And of course 83 is 'S'.

Obviously this isn't a safe thing to do.  The compiler would be within
its rights to arrange for a segfault when reading beyond the end of an
individual argv[] string.  It's just bad luck that it printed garbage
output instead of crashing.



function names starting with %

2024-06-24 Thread Oğuz
You can do these

$ %f(){ :;}
$ declare -f %f
%f ()
{
:
}
$ unset -f %f
$ declare -f %f
$ echo $?
1

but not call them

$ %f
bash: fg: %f: no such job
$ '%f'
bash: fg: %f: no such job
$ \%f
bash: fg: %f: no such job

Why is that? Would it be a bad idea to let such functions take
precedence over jobspecs?

Oğuz
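
The same behaviour can be sketched in a script. The function body here
is made up for illustration. The assumption is that a non-interactive
bash built with job control also hands a command word starting with %
to fg, which then fails, though with a different message than the one
shown above.

```shell
#!/usr/bin/env bash
# Defining the function works, and declare -F confirms that it exists.
%f() { echo "called"; }
declare -F %f

# Calling it by name is routed to fg instead of the function, so the
# fallback branch runs.
%f 2>/dev/null || echo "function was not called"
```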



use-after-free in set -o vi mode

2024-06-24 Thread rtm
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -flto=auto -ffat-lto-objects -flto=auto 
-ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security 
-Wall
uname output: Linux blob 6.5.0-35-generic #35-Ubuntu SMP PREEMPT_DYNAMIC Fri 
Apr 26 11:23:57 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.2
Patch Level: 15
Release Status: release

Description:

When I run bash under valgrind, and run set -o vi, and then type

ESC d 1 C

valgrind says

  Invalid read of size 4
 at 0x1D536A: _rl_vi_domove_motion_cleanup (vi_mode.c:1193)
 by 0x1D5AA7: rl_vi_domove (vi_mode.c:1355)
 by 0x1D5AA7: rl_vi_delete_to (vi_mode.c:1417)
 by 0x1D168D: _rl_dispatch_subseq (readline.c:916)
 by 0x1D1E37: _rl_dispatch (readline.c:860)
 by 0x1D1E37: readline_internal_char (readline.c:675)
 by 0x1D275C: readline_internal_charloop (readline.c:721)
 by 0x1D275C: readline_internal (readline.c:733)
 by 0x1D275C: readline (readline.c:387)
 by 0x13C9A9: yy_readline_get (parse.y:1543)
 by 0x13F432: yy_getc (parse.y:1477)
 by 0x13F432: shell_getc (parse.y:2408)
 by 0x141B1A: read_token.constprop.0 (parse.y:3418)
 by 0x145E78: yylex (parse.y:2905)
 by 0x145E78: yyparse (y.tab.c:1854)
 by 0x13BF79: parse_command (eval.c:348)
 by 0x13C107: read_command (eval.c:392)
 by 0x13C2BD: reader_loop (eval.c:139)
   Address 0x54c42c8 is 24 bytes inside a block of size 36 free'd
 at 0x484810F: free (in 
/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
 by 0x1D5C70: _rl_mvcxt_dispose (vi_mode.c:1150)
 by 0x1D5C70: rl_vi_change_to (vi_mode.c:1523)
 by 0x1D168D: _rl_dispatch_subseq (readline.c:916)
 by 0x1D567A: rl_domove_motion_callback (vi_mode.c:1167)
 by 0x1D5AA7: rl_vi_domove (vi_mode.c:1355)
 by 0x1D5AA7: rl_vi_delete_to (vi_mode.c:1417)
 by 0x1D168D: _rl_dispatch_subseq (readline.c:916)
 by 0x1D1E37: _rl_dispatch (readline.c:860)
 by 0x1D1E37: readline_internal_char (readline.c:675)
 by 0x1D275C: readline_internal_charloop (readline.c:721)
 by 0x1D275C: readline_internal (readline.c:733)
 by 0x1D275C: readline (readline.c:387)
 by 0x13C9A9: yy_readline_get (parse.y:1543)
 by 0x13F432: yy_getc (parse.y:1477)
 by 0x13F432: shell_getc (parse.y:2408)
 by 0x141B1A: read_token.constprop.0 (parse.y:3418)
 by 0x145E78: yylex (parse.y:2905)
 by 0x145E78: yyparse (y.tab.c:1854)
   Block was alloc'd at
 at 0x4845828: malloc (in 
/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
 by 0x1A6051: xmalloc (xmalloc.c:114)
 by 0x1D5AD4: _rl_mvcxt_alloc (vi_mode.c:1142)
 by 0x1D5AD4: rl_vi_delete_to (vi_mode.c:1386)
 by 0x1D168D: _rl_dispatch_subseq (readline.c:916)
 by 0x1D1E37: _rl_dispatch (readline.c:860)
 by 0x1D1E37: readline_internal_char (readline.c:675)
 by 0x1D275C: readline_internal_charloop (readline.c:721)
 by 0x1D275C: readline_internal (readline.c:733)
 by 0x1D275C: readline (readline.c:387)
 by 0x13C9A9: yy_readline_get (parse.y:1543)
 by 0x13F432: yy_getc (parse.y:1477)
 by 0x13F432: shell_getc (parse.y:2408)
 by 0x141B1A: read_token.constprop.0 (parse.y:3418)
 by 0x145E78: yylex (parse.y:2905)
 by 0x145E78: yyparse (y.tab.c:1854)
 by 0x13BF79: parse_command (eval.c:348)
 by 0x13C107: read_command (eval.c:392)

Repeat-By:

See above.




Re: 'declare -i var=var' for var initially declared in calling scope

2024-06-24 Thread Chet Ramey

On 6/17/24 10:38 PM, Zachary Santer wrote:


> Bash Version: 5.2
> Patch Level: 26
> Release Status: release
>
> Description:
>
> It just occurred to me that I could take advantage of arithmetic
> evaluation being performed when a variable with the -i integer
> attribute is assigned a value to create a visual distinction between
> working with regular variables and variables representing integers,
> i.e.:
>
> local this_var="words words words"
> local var[index]="${this_var}"
>
> local -r -i BIT_FLAG=2#0100
> local -i value=BIT_FLAG
>
> After doing this, var[index] will be set to "words words words" and
> value will be set to 4.
>
> Going a step further and passing integer arguments to functions as the
> name of the variable, rather than passing its value with a parameter
> expansion of that variable, we run into a problem. I would expect
> func1 () and func2 () below to be equivalent. However, we see that
> when setting a local variable with the integer attribute, using the
> name of a variable at calling scope with the same name, the local
> variable is always assigned the value 0, if I'm not using an
> arithmetic expansion.
>
> I wondered if the call to 'local' was already referencing the local
> variable that it was in the process of declaring, which I imagine
> would've been unset at the time. 'set -o nounset' didn't cause it to
> error out, though.
>
> Of course 'declare -i other=other' is a no-op, but it doesn't break.
>
> Repeat-By:
>
> $ cat integer-scope
> #!/usr/bin/env bash
>
> set -o nounset
>
> main () {
>    printf 'foo :\n'
>    declare -i foo=1
>    func1 foo
>    func2 foo
>    printf 'bar :\n'
>    declare -i bar=2
>    func1 bar
>    func2 bar
>    printf '"${baz}" :\n'
>    declare -i baz=3
>    func1 "${baz}"
>    func2 "${baz}"
>    printf 'other :\n'
>    declare -i other=4
>    declare -i other=other
>    declare -p other
> }
>
> func1 () {
>    declare -i bar="${1}"
>    declare -p bar
> }
>
> func2 () {
>    declare -i bar="$(( ${1} ))"
>    declare -p bar
> }
>
> main
>
> $ ./integer-scope
> foo :
> declare -i bar="1"
> declare -i bar="1"


`declare' is a builtin, so its arguments are expanded before it is called.
In the first example, you get what is essentially

declare -i bar; bar=foo

(to make it clear that the variable is created with the specified
attributes before it's assigned to).

Since the integer attribute is set, foo is treated as an expression, the
variable with that name in the nearest scope gets expanded, and you have
bar=1.

In the second, you have the same thing (foo is not a variable local to
func2's scope), but with a different syntax for the arithmetic evaluation
that doesn't rely on the assignment to do it implicitly, so it's

declare -i bar; bar=1



> bar :
> declare -i bar="0"
> declare -i bar="2"


Same thing. You have

declare -i bar; bar=bar

(the variable is created before being assigned to). Since bar is a local
variable at the current scope and it's unset, when the assignment is
performed, it has value 0, and that gets assigned.

Again, in the second example, since the arithmetic expansion is performed
first, before the local variable is created, you use the variable in the
nearest scope and get essentially

declare -i bar; bar=2



> "${baz}" :
> declare -i bar="3"
> declare -i bar="3"


This is the same as the first example.


> other :
> declare -i other="4"


In this example, main is creating a local variable and then using its
value in the next assignment.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
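
The "bar" case above can be condensed into a few lines. This is only a
sketch: the function names outer, implicit and explicit are made up for
illustration and do not come from the original script.

```shell
#!/usr/bin/env bash
# implicit: the local bar is created first, then "bar" is evaluated as
# an arithmetic expression against that still-unset local, giving 0.
implicit() { declare -i bar="$1"; echo "$bar"; }

# explicit: $(( ... )) is expanded before declare runs, so it sees the
# caller's bar and the local is assigned 2.
explicit() { declare -i bar="$(( $1 ))"; echo "$bar"; }

outer() {
    declare -i bar=2
    implicit bar
    explicit bar
}

outer
```

Passing a name that differs from the local variable's name, as in the
"foo" case, avoids the self-reference and both forms agree.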





Re: printf, binary data, leading single quote and non ASCII codesets

2024-06-24 Thread Chet Ramey

On 6/24/24 7:36 AM, Gioele Barabucci wrote:

> Hi,
>
> bash manpage says for printf contains the following statement:
>
> > if the leading character is a single or double quote, the value is
> > the ASCII value of the following character.


Thanks for the report. The man page should have been updated when I changed
that code to use multibyte characters many years ago.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
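
A short sketch of the codeset-dependent behaviour follows. The helper
name char_value is hypothetical, and the C.UTF-8 locale is assumed to
be installed; where it is missing, bash warns on stderr and the second
result may differ.

```shell
#!/usr/bin/env bash
# char_value (hypothetical helper) prints the number that printf's
# leading-quote syntax yields for one character under a given locale.
char_value() {
    local locale=$1 char=$2
    LC_ALL=$locale printf '%d\n' "'$char"
}

# An ASCII letter has the same value in every codeset.
char_value C A

# In a UTF-8 locale the bytes c3 bf form the single character U+00FF,
# so printf is given one multibyte character rather than two bytes.
char_value C.UTF-8 $'\xc3\xbf'
```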





--rcfile and non existing files

2024-06-24 Thread Gioele Barabucci

Hi,

the manpage states:

> When an interactive shell that is not a login shell is started, bash
> reads and executes commands from /etc/bash.bashrc and ~/.bashrc, if
> these files exist.  […] The --rcfile file option will force bash to
> read and execute commands from file instead of /etc/bash.bashrc and
> ~/.bashrc.

This wording suggests (or may suggest) that `bash --rcfile foo` will
execute file `foo`, and complain if it contains an error or if it does
not exist.

Instead, `bash --rcfile foo` will happily carry on if `foo` does not
exist, probably applying the same logic used for `~/.bashrc`.
https://bugs.debian.org/1042394 reports that this happens for
`--init-file` as well.

Should the manpage be reworded to avoid stating "will force bash to
read..."? Or should passing a nonexistent file to `--rcfile` be turned
into an error?


Regards,

--
Gioele Barabucci