On Sun, 2018-05-20 at 04:56 +0200, Garreau, Alexandre wrote:
> On 2015-11-13 at 07:17, Greg Wooledge wrote:
> Actually in the most general case, where those output streams may
> contain NUL bytes, it requires two temp files, because you can't store
> arbitrary data streams in bash variables at all.
>
> Why do bash variables use 0-terminated arrays instead of arrays structure
> with a length attribute?
>

This is a question that interests me for various reasons: I don't really accept the idea that a shell isn't a "real" programming language and needn't be held to that kind of standard, though it is often difficult to reconcile that view with backward compatibility and POSIX compatibility.

Apart from the reasons already given: shells tend to assume some level of equivalence between the facilities the shell language provides and similar facilities the OS provides. For instance, shell variables are generally assumed to work the same way as OS environment variables. These days there are cases where the two diverge: shell variables support arrays and such, while environment variables do not, so you can't "export" an array variable.

Encoding shell variables as length-prefixed arrays would create another such disparity. The underlying OS mechanisms for environment variables generally assume that a NUL terminates an environment variable (see, for instance, execve() or "man 7 environ"). Even if the environment could be (mis-?)used to carry data with a NUL in it, the program receiving that data would have to follow the same convention for reading it, or the data would effectively be lost.

Support for NULs could be provided in shell variables (similar to how shell variables can hold arrays, etc. but can't "export" them), but then there's an additional problem: what you can do with them. You can't pass a NUL as part of a command-line argument to an external command, because, like the environment, each argv[] string is by convention assumed to be NUL-terminated, and the OS itself may enforce that assumption in some cases. So you'd be pretty much limited to internal commands and shell functions, which creates yet another disparity.

"Disparities" aren't just theoretical problems or aesthetic blemishes; they turn into user frustration and bug reports.
(As in "I put a NUL in a variable and it didn't work right".)

Personally I do think some method of handling arbitrary binary data in the shell would be a welcome addition (and I think zsh provides that - I don't remember if ksh does). It's just hard to reconcile with some of the other underlying assumptions of the shell.
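
For what it's worth, the two disparities discussed above (arrays that can't be exported, and NULs that don't survive in a variable) can be seen with a small sketch like the one below. It assumes a bash binary is on the PATH; the variable names (arr, v, and the three result variables) are only illustrative, and each bash probe runs in a child bash so the result reflects bash rather than whatever /bin/sh happens to be.

```shell
#!/bin/sh
# Sketch only, assuming bash is installed; all names here are illustrative.

# 1. An array can be marked for export, but a child process never sees it,
#    because the environment has no representation for arrays.
array_seen_by_child=$(bash -c 'arr=(one two); export arr; bash -c "echo \${arr+set}"')

# 2. Three bytes go in (a, NUL, b), but bash drops the NUL when the output
#    is captured into a variable. Newer bash versions also print a warning
#    on stderr, which is silenced here.
len_after_capture=$(bash -c 'v=$(printf "a\000b"); echo "${#v}"' 2>/dev/null)

# 3. The same bytes survive intact as long as they stay in a pipe (or a
#    file) and never pass through a variable.
len_in_pipe=$(printf 'a\000b' | wc -c | tr -d ' ')

echo "array visible in child: ${array_seen_by_child:-no}"
echo "bytes captured in variable: $len_after_capture"
echo "bytes through a pipe: $len_in_pipe"
```

With bash I would expect this to report that the array is not visible in the child, that only 2 of the 3 bytes end up in the variable, and that all 3 bytes make it through the pipe, which is the reason temp files (or pipes) are the usual answer for binary data.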