On Mon, Jun 24, 2024 at 10:50:15 -0600, Rob Gardner wrote:
> Description:
>         When using space or newline as a delimiter with readarray -d,
>         elements in the array have the delimiter replaced with NULL,
>         which is left embedded in each element of the array.

This isn't possible.  Bash doesn't allow the storing of NUL bytes in
variables, and further, Unix/Linux doesn't permit passing NUL bytes as
command-line arguments to programs.

> This
> causes incorrect behavior when using array elements as arguments to
> sub-processes.

(Bash cannot pass a NUL byte as an argument.)

> I first noticed the problem when trying to use an array element as
> part of an
> argument to sed:
> readarray -d ' ' x << "A B"
> sed -e s/X/${x[0]}/

First point, your readarray command is using the wrong redirection
operator.  I'm fairly sure you meant to write <<< instead of <<.

Using the here-string operator <<<, we can see that the first array
element retains the space delimiter (because -t was not used), and the
second retains the newline character, which is added by <<<.

hobbit:~$ readarray -d ' ' x <<< "A B"
hobbit:~$ declare -p x
declare -a x=([0]="A " [1]=$'B\n')

Second point, your sed command is not using quotes.

> This caused sed to complain "unterminated `s' command".

The space at the end of x[0] causes word splitting to occur, due to the
lack of quotes.  The s/X/A part becomes one argument, and the / part
becomes a second argument.

> Using "read -a" instead of readarray produces correct results.

That one uses IFS to separate and trim the input fields.  The default
IFS contains a space, so none of the array elements contains a space.
Therefore, your lack of quoting probably doesn't cause any additional
word splitting.

> With a simple C program to print out the characters in argv[1], one
> can see that a NULL character is left in the argument. Program:
> #include <stdio.h>
> #include <string.h>
> void main(int argc, char *argv[])
> {
>   int i, n;
>   if (argc > 1) {
>     n = strlen(argv[1]);
>     for (i=0; i<n+2; i++) printf("%d ", argv[1][i]);
>   }
> }

I'm not at all clear on what this C program is doing.  You're putting a
single character/byte on the stack for printf to process using the %d
operator, which... expects an integer?  And therefore reads more than one
byte from the stack?  Sorry, it's been ages since I did C.

> $ readarray -d ' ' X <<< "A B C"
> $ read -d ' ' -a Y <<< "A B C"
> $ readarray -td ' ' Z <<< "A B C"
> $ ./printarg ${X[0]}A
> 65 0 65 $

In this command, ${X[0]} is a capital A plus a space character.  You're
not using quotes, so ${X[0]}A becomes the two argument words "A" and "A".

hobbit:~$ readarray -d ' ' X <<< "A B C"
hobbit:~$ declare -p X
declare -a X=([0]="A " [1]="B " [2]=$'C\n')
hobbit:~$ printf '<%s> ' ${X[0]}A ; echo
<A> <A>

Your C program appears to look only at the first argument word, "A", and
ignores the second word.  It takes strlen("A"), which is 1, and adds 2
to it, getting 3.  Thus, it loops 3 times, and thus, we see the three
numbers it writes to stdout.

The argument words are stored internally as NUL-terminated strings, so
it's no surprise that the second loop iteration prints a 0.  The third
loop iteration is printing random garbage from beyond the end of the
argument string, unless I'm misreading the situation.

> $ ./printarg ${Y[0]}A
> 65 65 0 83 $

Here, Y[0] contains "A", so you're passing "AA" as your sole argument.
The argument's string length is 2, so you're looping 4 times.  The
numbers 65 65 0 are from the internal storage of the argument words, and
the 83 is garbage from beyond the end of the string.

> $ ./printarg ${Z[0]}A
> 65 65 0 83 $

Here, Z[0] is "A" instead of "A ", because you used -t to trim the space.
So you're passing "AA" as your argument, just like the previous call.

So, in a nutshell, this is what I believe you need to see:

1) readarray without -t retains the delimiter, even if it's a space or
   newline.  It does not convert the delimiter to a NUL byte.

2) Unquoted ${X[0]} when X[0] ends with a space causes word splitting to
   occur, so anything after the ${X[0]} will become a new word (assuming
   IFS hasn't been modified).

3) Arguments passed to a program via the Unix kernel are NUL-terminated
   strings.  Therefore, the NUL byte can't be part of the argument itself.
   It's a signpost that the argument string has ended.
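
If you want to check points 1 and 2 without a C program, something like
the following sketch should do it.  It assumes bash 4.4 or later (for
readarray -d) and a default IFS.  count_args is a made-up helper for
this example, not anything from your report; it just reports how many
argument words it received.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash >= 4.4 (readarray -d) and the default IFS.

# Hypothetical helper: print the number of argument words received.
count_args() { echo "$#"; }

readarray -d ' ' X <<< "A B C"     # no -t: each element keeps its delimiter
readarray -td ' ' Z <<< "A B C"    # -t: the space delimiter is trimmed

echo "length of X[0]: ${#X[0]}"    # 2, i.e. "A" plus a real space, not a NUL
echo "length of Z[0]: ${#Z[0]}"    # 1, just "A"

unquoted=$(count_args ${X[0]}A)    # "A " + "A" is split into two words
quoted=$(count_args "${X[0]}A")    # quotes keep "A A" together as one word
trimmed=$(count_args ${Z[0]}A)     # "A" + "A" is the single word "AA"
echo "unquoted=$unquoted quoted=$quoted trimmed=$trimmed"

# The sed command from the report, with the whole script quoted so that
# it reaches sed as one argument.
fixed=$(sed -e "s/X/${Z[0]}/" <<< "X1")
echo "sed result: $fixed"
```

On my reading this prints unquoted=2 quoted=1 trimmed=1, which matches
the printarg results above: either quote the expansion, or use -t so
that there is no trailing space left to split on.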