On Nov 23, 2011, at 7:09 PM, Chet Ramey wrote:

> On 11/23/11 6:54 PM, Matthew Story wrote:
>> On Nov 23, 2011, at 4:47 PM, Chet Ramey wrote:
>>
>>> On 11/23/11 9:03 AM, Matthew Story wrote:
>>>> [... snip]
>
> Yes, sorry.  That's what the "bash treats the line read as a C string"
> was intended to imply.  Since the line read is a C string, the NUL
> terminates it and what remains is assigned to the named variables.  I
> should have used `line' in my explanation instead of `foo'.

I understand that the underlying implementation of the bash builtins is `C', and I understand that `C' strings are NUL terminated. It seems unreasonable to me to expect understanding of this implementation detail when using bash to read streams into variables via the `read' builtin. Furthermore, neither the man-page nor the gnu website documents this behavior of bash:

    read
        read [-ers] [-a aname] [-d delim] [-i text] [-n nchars]
             [-N nchars] [-p prompt] [-t timeout] [-u fd] [name ...]

        One line is read from the standard input, or from the file
        descriptor fd supplied as an argument to the -u option, and the
        first word is assigned to the first name, the second word to the
        second name, and so on, with leftover words and their intervening
        separators assigned to the last name. If there are fewer words
        read from the input stream than names, the remaining names are
        assigned empty values. The characters in the value of the IFS
        variable are used to split the line into words. The backslash
        character ‘\’ may be used to remove any special meaning for the
        next character read and for line continuation. If no names are
        supplied, the line read is assigned to the variable REPLY. The
        return code is zero, unless end-of-file is encountered, read
        times out (in which case the return code is greater than 128),
        or an invalid file descriptor is supplied as the argument to -u.

I personally do not read "One line" as meaning "One string of characters terminated either by a null byte or a new-line"; I read it as "One string of characters terminated by a new-line". But "One string of characters terminated either by a null byte or a new-line" is not the actual functionality either. The actual functionality is:

    "One line is read from the standard input, or from the file
    descriptor fd supplied as an argument to the -u option, then read
    byte-wise up to the first contained NUL, or end of string, ..."

Furthermore, I do not see the use-case for this behavior ...
I simply cannot fathom a case of I/O redirection in shell where I would choose to inject a NUL byte to coerce this sort of behavior from the read builtin, and I can't imagine that anyone is currently relying on this `C string' feature of read in bash, especially considering that it is not consistent with NUL handling in other assignments in bash:

    [matt@matt0 ~]$ foo=`printf 'foo\0bar'`; echo "$foo" | od -a
    0000000    f   o   o   b   a   r  nl
    0000007
    [bash ~]$ foo=$(printf 'foo\0bar'); echo "$foo" | od -a
    0000000    f   o   o   b   a   r  nl
    0000007

both of which strip the NUL.

I see one of three possible resolutions here:

1. NUL bytes do not terminate variable assignment from `read'; the behavior of echo/variable assignments persists as is.
2. NUL bytes are stripped by read on assignment, and this functionality is documented as expected.
3. The existing functionality of the system is documented in the man-page and on gnu.org as expected.

I would prefer the first, and would be happy to attempt to provide a patch, if that's useful.

cheers,

-matt

>
> Chet
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
>                  ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/

Additional Notes:

The only occurrence of the pattern `NUL' in the FreeBSD man-page for bash is:

    Pattern Matching
        Any character that appears in a pattern, other than the special
        pattern characters described below, matches itself. The NUL
        character may not occur in a pattern. A backslash escapes the
        following character; the escaping backslash is discarded when
        matching. The special pattern characters must be quoted if they
        are to be matched literally.

All other references in the man-page are to the null string (empty string), not to an explicit NUL byte (i.e. ascii 0). The same is true of the gnu.org documentation.
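
For anyone who wants to check the inconsistency on their own system, here is a minimal sketch that puts the two cases side by side. The names `show` and `tmp` are mine and purely illustrative, and I am assuming `mktemp` is available. What `read` assigns may well depend on the bash version, so I make no claim about the first line of output.

```shell
#!/usr/bin/env bash
# Sketch: how does an embedded NUL survive `read` versus command substitution?
# `show` and `tmp` are hypothetical names used only for this illustration.

show() {
    # Print a label, then the value in brackets so truncation is visible.
    printf '%s: [%s]\n' "$1" "$2"
}

# Put the test input in a temporary file so `read` runs in the current shell
# (reading from a pipe would set the variable in a subshell).
tmp=$(mktemp) || exit 1
printf 'foo\0bar\n' > "$tmp"

# Case 1: the `read` builtin. IFS= and -r rule out word splitting and
# backslash processing, so any change to the line comes from NUL handling.
IFS= read -r line < "$tmp"
rm -f "$tmp"
show 'read' "$line"

# Case 2: command substitution. The NUL is dropped; stderr is silenced
# in case the shell warns about the ignored NUL byte.
{ subst=$(printf 'foo\0bar'); } 2>/dev/null
show 'command substitution' "$subst"
```

The second line should show `[foobar]`, matching the od output earlier in this message. If the first line shows `[foo]`, that is the NUL-terminated behavior described in this thread.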