incorrect associative array key parsing if keys contain a closing square bracket
Associative array key parsing seems to be incorrect if a key contains a closing square bracket inside the array=([key]=value) construct. The following testcase:

  $ declare -A key_full; key_full=(["version[agent]"]=agent.version); echo "${!key_full[@]}"

gives these results per version:

  4.0.33(1)-release (shbot in #bash):
    Richlv: version\[agent\]
  4.1.7(1)-release:
    -bash: declare: key_full: cannot convert indexed to associative array
    0
  4.1.10(1)-release:
    bash: declare: key_full: cannot convert indexed to associative array
    0
  4.1.10(2)-release:
    bash: [version[agent]]=agent.version: bad array subscript

According to people in #bash: "i get bad array subscript with 4.2" and "it fails in 4.2.8".

-- 
Rich
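To make the expected behavior concrete, here is a minimal C sketch of scanning a compound-assignment subscript so that a ']' inside quotes, as in ["version[agent]"], does not end the key. The helper find_subscript_end is hypothetical and is not bash's actual subscript parser; skipping quoted text and letting unquoted '[' ... ']' pairs nest are assumptions of this sketch.

```c
#include <stddef.h>

/* Hypothetical helper: s points at the '[' that opens a subscript such as
   ["version[agent]"]=value. Returns the index of the matching ']', or -1
   if the subscript is unterminated. Quoted text is skipped entirely, and
   unquoted '[' ... ']' pairs nest (both are assumptions of this sketch). */
static ptrdiff_t find_subscript_end(const char *s)
{
    int depth = 0;
    char quote = 0;                     /* active quote character, or 0 */

    for (ptrdiff_t i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        if (quote) {
            if (c == '\\' && quote == '"' && s[i + 1] != '\0')
                i++;                    /* skip escaped char inside "..." */
            else if (c == quote)
                quote = 0;              /* closing quote */
        } else if (c == '"' || c == '\'') {
            quote = c;                  /* opening quote */
        } else if (c == '\\' && s[i + 1] != '\0') {
            i++;                        /* skip unquoted escaped char */
        } else if (c == '[') {
            depth++;
        } else if (c == ']') {
            if (--depth == 0)
                return i;               /* matching close bracket */
        }
    }
    return -1;
}
```

Under this rule, find_subscript_end("[\"version[agent]\"]=agent.version") returns 17, the index of the last ']', so the key is the quoted string version[agent] rather than a truncated one.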
builtin printf behaves incorrectly with "c and 'c character-value arguments
  $ printf %d\\n \'À
  -61

(expected 192)

This should be 192 regardless of locale on any system where wchar_t values are ISO-10646/Unicode. Bash is incorrectly reading the first byte of the UTF-8 encoding, which happens to be -61 when interpreted as signed char; in a Latin-1 based locale it will probably give -64 instead.

Both POSIX and common sense are clear that the numeric value resulting from 'c should be the wchar_t value of c and not the value of the first byte of the multibyte character; from the SUSv3 printf(1) documentation:

  Note that in a locale with multi-byte characters, the value of a character is intended to be the value of the equivalent of the wchar_t representation of the character as described in the System Interfaces volume of IEEE Std 1003.1-2001.

Language lawyers could argue that in 'single-byte' locales perhaps the byte value should be used; however, strictly speaking a single-byte locale is simply a special case of a multi-byte one, and sanity should win in any case.

Fixing the issue should be easy; asciicode() in builtins/printf.def simply needs to be changed to decode the character with mbrtowc rather than reading the byte (and perhaps should also be renamed...).

Rich
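A minimal C sketch of the change described above, assuming the caller has already selected the locale with setlocale(LC_CTYPE, "") as a shell does at startup. The name char_value is hypothetical (it stands in for asciicode()), and falling back to the unsigned byte value for invalid or incomplete sequences is an illustrative choice, not bash's code.

```c
#include <locale.h>   /* callers select the locale with setlocale() */
#include <string.h>
#include <wchar.h>

/* Hypothetical replacement for asciicode(): return the wchar_t value of the
   first character of s (the text after ' or " in a printf argument),
   decoded according to the current LC_CTYPE locale. */
static long char_value(const char *s)
{
    mbstate_t state;
    wchar_t wc;
    size_t n;

    memset(&state, 0, sizeof state);          /* initial shift state */
    n = mbrtowc(&wc, s, strlen(s), &state);
    if (n == (size_t)-1 || n == (size_t)-2)
        /* invalid or incomplete (including empty s): fall back to the
           first byte, read as unsigned so it is never negative */
        return (unsigned char)s[0];
    return (long)wc;
}
```

In a UTF-8 locale on a system where wchar_t holds Unicode values, char_value("À") yields 192 instead of -61; for ASCII input such as "A" it yields 65 in the C locale.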
Re: builtin printf behaves incorrectly with "c and 'c character-value arguments
On Mon, Nov 05, 2007 at 09:10:29AM -0500, Chet Ramey wrote:
> Rich Felker wrote:
> > $ printf %d\\n \'À
> > -61
> > (expected 192)
> >
> > This should be 192 regardless of locale on any system where wchar_t
> > values are ISO-10646/Unicode. Bash is incorrectly reading the first
> > byte of the UTF-8 which happens to be -61 when interpreted as signed
> > char; on a Latin-1 based locale it will probably give -64 instead.
> >
> > Both POSIX and common sense are clear that the numeric values
> > resulting from 'c should be the wchar_t value of c and not the value
> > of the first byte of the multibyte character; from the SUSv3 printf(1)
> > documentation:
> >
> >   Note that in a locale with multi-byte characters, the value of a
> >   character is intended to be the value of the equivalent of the
> >   wchar_t representation of the character as described in the
> >   System Interfaces volume of IEEE Std 1003.1-2001.
> >
> > Language lawyers could argue that on 'single-byte' locales perhaps the
> > byte value should be used; however, strictly speaking a single-byte
> > locale is simply a special case of a multi-byte one, and sanity should
> > win in any case.
>
> You're correct that the bash printf should understand multibyte characters
> in a multibyte locale, but not that returning a multibyte character when
> a user hasn't asked for one by setting the locale is more "sane."

I'm not sure what you mean. For a Latin-1 locale there is no difference, but if the locale is a different legacy locale, the wchar_t value (Unicode scalar value on systems with __STDC_ISO_10646__ defined) needs to be returned. If you're doubtful about the intent of the standard, why not file a request for interpretation?

Rich
Re: builtin printf behaves incorrectly with "c and 'c character-value arguments
On Mon, Nov 05, 2007 at 10:23:43PM -0500, Chet Ramey wrote:
> Rich Felker wrote:
> >
> > I'm not sure what you mean. For a Latin-1 locale there is no
> > difference, but if the locale is a different legacy locale, the
> > wchar_t value (Unicode scalar value on systems with __STDC_ISO_10646__
> > defined) needs to be returned. If you're doubtful about the intent of
> > the standard, why not file a request for interpretation?
>
> I'm not doubtful about the standard's intent. When the user has not
> chosen to use a locale that contains multibyte characters, not only
> should bash not second-guess the user by returning a multibyte
> character, functions such as mbrtowc or mblen/mbrlen will not return
> "multibyte" values (e.g., mbrlen will return `1' and mbrtowc will return
> `-61' -- converted to 195, since it's unsigned -- as its wchar value
> while converting 1 character in your example).

This 195 _is_ its value as a multibyte character in a locale with ISO-8859-1 as its codeset. In such a locale, it's also the value of the byte (interpreted as unsigned). So here it doesn't matter which you use; either is equally correct.

Where something different happens is if your locale has a different codeset. For instance, in KOI8-R, there is a character "²" which is placed on a different byte (9B) than in ISO-8859 encodings (B2). But regardless of your locale,

  $ printf %d\\n \'²

should print 178, provided that your system implementation uses the same values for wchar_t regardless of locale.

These semantics are useful because they actually tell you something about the identity of the character. But most importantly, it's just illogical for the function to behave differently based on whether MB_CUR_MAX is 1 or something greater than 1, rather than being based on the actual locale encoding. "²" is a "²" in a KOI8-R locale just as much as it is a "²" in a UTF-8 locale. Bash's printf should not treat the KOI8-R locale badly just because all characters happen to fit into one byte.
The mbrtowc function will give the correct result for all locales, whether or not they have characters that take multiple bytes to represent; special-casing locales that don't just gives illogical (and nonconformant!) behavior.

Rich

P.S. For my own usage I'd be plenty happy as long as the bug is fixed in UTF-8 based locales, since that's all I ever intend to use. But I maintain that the current behavior is incorrect and nonconformant in other locales as well. If you want a compromise, why not make the correct behavior depend on strict POSIX mode?
bash's fallback implementation of vsnprintf(3) is nonconformant and breaks printf(1)
bash-3.2.0(1)-release built on a system without native asprintf:

  $ printf %x\\n 0x8000
  -8000

%x is an unsigned format specifier; printing it as signed is bogus. I tracked the problem down to the implementation of vsnprintf_internal, which bash uses to implement asprintf if it's not present. This function is buggy and apparently not well-tested. A much saner, less bloated, bug-free implementation of vasprintf is as follows:

    #include <stdarg.h>
    #include <stdio.h>
    #include <stdlib.h>

    int vasprintf(char **stringp, const char *format, va_list args)
    {
        va_list ap;
        int l;

        /* A va_list may only be traversed once: measure using a copy. */
        va_copy(ap, args);
        l = vsnprintf(NULL, 0, format, ap);
        va_end(ap);
        if (l < 0) return l;
        *stringp = malloc((size_t)l + 1);   /* size_t math: no int overflow */
        if (!*stringp) return -1;
        /* Second pass formats into a buffer of exactly the right size. */
        if ((l = vsnprintf(*stringp, (size_t)l + 1, format, args)) < 0) {
            free(*stringp);
            *stringp = 0;
            return l;
        }
        return l;
    }

This works (and is free of integer overflows) as long as the host implementation of vsnprintf is conformant; if vsnprintf is broken you'll need to replace snprintf anyway, but hopefully with a correct implementation instead of the one that's currently included with bash.

Rich
nonconformant behavior for printf(1) (you cannot interpret - as an option char)
  $ printf ---%s---\\n test
  bash: printf: --: invalid option
  printf: usage: printf [-v var] format [arguments]

expected: ---test---

This seems to be the third bug I've found in bash's internal printf(1) that breaks conformance to POSIX. Could you either fix this, or else disable the printf (and possibly other) builtins entirely when bash is running in POSIX/sh mode? It's a source of breakage for real, valid scripts! Disabling the builtins manually is not an option for sh scripts, since the mechanism to disable them is bash-specific.

Rich
echo(1) non-conformant (processing -e and -E)
When running in POSIX/sh mode, bash should either disable the echo builtin or stop giving special treatment to -e and -E. In particular, POSIX provides well-defined behavior for:

  echo -e
    bash gives: blank line
    POSIX gives: line containing only "-e"
  echo -E
    bash gives: blank line
    POSIX gives: line containing only "-E"
  echo -e -n
    bash gives: no output
    POSIX gives: line containing "-e -n"

POSIX leaves behavior unspecified when -n is the first argument, and also when any argument contains backslashes. However, if conformance to the XSI part of SUSv3 is also desired, the -e behavior (interpreting backslash escapes) must be the default. I tend to think this is stupid, which you probably agree with, so I have no opinion on changing it to be XSI-conformant, but I'm mentioning it anyway for completeness.

Basically, my point is that bash, in POSIX/sh mode, should not provide nonconformant builtins for POSIX commands, since they prevent potentially conformant ones in the host system's path from being used. Either the bash versions should be conformant or they should get out of the way when in standards-conformant mode.

Rich
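The echo behavior requested above can be sketched in C as a hypothetical helper, posix_echo, that formats into a buffer instead of writing to stdout so the result is easy to check. It treats every argument as an operand (no -e, -E, or "--" handling) and copies a leading -n or any backslashes verbatim, because POSIX leaves those cases unspecified; that choice is an assumption of this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: build echo's output for argv[1..argc-1] in out.
   Every argument is an operand, operands are separated by single spaces,
   and a newline ends the line. Output is truncated (but still
   NUL-terminated) if it does not fit in size bytes. */
static void posix_echo(char *out, size_t size, int argc, char *const argv[])
{
    size_t used = 0;

    if (size == 0)
        return;
    out[0] = '\0';
    for (int i = 1; i < argc; i++) {
        /* a space separates each operand from the previous one */
        int n = snprintf(out + used, size - used, "%s%s",
                         i > 1 ? " " : "", argv[i]);
        if (n < 0 || (size_t)n >= size - used)
            return;                     /* error or truncated */
        used += (size_t)n;
    }
    snprintf(out + used, size - used, "\n");
}
```

For example, with argv = {"echo", "-e", "-n"} the buffer holds "-e -n\n", the line POSIX specifies for the third case above.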
Re: nonconformant behavior for printf(1) (you cannot interpret - as an option char)
On Mon, Nov 26, 2007 at 10:09:11PM -0700, Eric Blake wrote:
> According to Rich Felker on 11/26/2007 10:02 PM:
> > On Mon, Nov 26, 2007 at 09:54:52PM -0700, Eric Blake wrote:
> >> Please keep replies on the list, so that others may chime in.
> > ^^^ Sorry, will do from now on.
> > Printf does not claim conformance to those guidelines; read the
> > specific documentation on printf. In fact many utilities do not. You
> > have to read the specific documentation on each one.
>
> You should feel free to take this up with the Austin group, then. This is
> not bash's problem, unless you can prove that POSIX intends for printf(1)
> to reject the extension of options.
>
> POSIX is quite clear that echo(1) rejects options with the statement "The
> echo utility shall not recognize the "--" argument in the manner specified
> by Guideline 10 of XBD Section 12.2; "--" shall be recognized as a string
> operand."
>
> For any utility that does not have this explicit rejection, then the
> extension of providing options is valid implicitly. Just because a
> portable application cannot use those options does not mean that an

Every other utility that uses the guidelines explicitly mentions them. Moreover, since the guidelines explicitly say that they apply to any utility claiming conformance to them, I think it's clear that they don't apply to a utility whose documentation makes no mention of them.

> implementation can provide options; therefore, a portable application MUST
> use -- to separate the end of theoretical options from the leading argument.

But a portable application cannot do this, since it's perfectly valid for an implementation not to support --. Given the mess we have, the only reliable way I see to use a format string beginning with a - is to use \055.
And one thing we can probably agree upon is that, due to the prevalence of implementations that treat - specially (whether this is correct or incorrect behavior), changing them now would do little to help the portability of scripts whose authors will want them to work on outdated versions of the shell as well, so this argument is mostly for the sake of completeness and correctness.

> And FWIW, coreutils interprets POSIX in the same manner as bash.

GNU coreutils is hardly a model of conformance...

> > Again, go read POSIX and if you're still unclear file a RFI. But it's
> > very clear and bash is incorrect in this respect.
>
> I'm on the Austin group, and feel quite confident that I understand what
> it permits vs. what it requires.

If everyone on the Austin group thought about things exactly the same way, I suspect you guys would have a MUCH easier time. Of course things don't work that way, so why not run it by some of your peers? Even if the behavior you believe is intended is actually what's intended, the specification should be amended to make it explicit, to prevent this sort of argument in the future.

Rich
Re: nonconformant behavior for printf(1) (you cannot interpret - as an option char)
On Mon, Nov 26, 2007 at 10:24:08PM -0700, Eric Blake wrote:
> According to Eric Blake on 11/26/2007 10:09 PM:
> >> Again, go read POSIX and if you're still unclear file a RFI. But it's
> >> very clear and bash is incorrect in this respect.
> >
> > I'm on the Austin group, and feel quite confident that I understand what
> > it permits vs. what it requires.
>
> Furthermore, read the paragraph about OPTIONS in section 1.11 of:
>
> http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap01.html
>
> Default Behavior: When this section is listed as "None.", it means that
> the implementation need not support any options. Standard utilities that
> do not accept options, but that do accept operands, shall recognize "--"
> as a first argument to be discarded.
>
> The requirement for recognizing "--" is because conforming applications
> need a way to shield their operands from any arbitrary options that the
> implementation may provide as an extension. For example, if the standard
> utility foo is listed as taking no options, and the application needed to
> give it a pathname with a leading hyphen, it could safely do it as:
>
>     foo -- -myfile
>
> Sure enough, the POSIX page for printf(1) lists "None." under OPTIONS, so
> what I'm saying is _required_ by POSIX, despite your bogus claims to the
> contrary.

Okay, thanks for the clarification. I'll drop this complaint and report bugs to any implementations I find where the -- is not accepted. Sorry for wasting your time.

Rich
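The XCU section 1.11 rule quoted above can be sketched in C for a utility whose OPTIONS section is "None.": a first argument of exactly "--" is discarded, and everything from then on is an operand, even if it starts with '-'. The function name first_operand is hypothetical. Bash's printf is not such a utility, since it accepts -v as an extension, which is why a portable script writes printf -- '---%s---\n' test.

```c
#include <string.h>

/* Hypothetical sketch for a utility that takes no options: return the
   argv index of the first operand. A leading "--" is discarded; any other
   first argument, including one beginning with '-', is an operand. */
static int first_operand(int argc, char *const argv[])
{
    if (argc > 1 && strcmp(argv[1], "--") == 0)
        return 2;
    return 1;
}
```

So for {"printf", "--", "---%s---\n", "test"} the format string is argv[2], while without the "--" it would be argv[1].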
here strings fold newlines on MacOS
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: darwin18.7.0
Compiler: clang
Compilation CFLAGS: -DSSH_SOURCE_BASHRC
uname output: Darwin flounder.home.mati.ca 18.7.0 Darwin Kernel Version 18.7.0: Tue Nov 10 00:07:31 PST 2020; root:xnu-4903.278.51~1/RELEASE_X86_64 x86_64
Machine Type: x86_64-apple-darwin18.7.0

Bash Version: 5.1
Patch Level: 0
Release Status: release

Description:
    Here strings ('<<<') fold newlines into spaces on MacOS, but not on Linux,
    leading to incompatibilities in bash code expected to work the same on both
    platforms.

Repeat-By:
    On Linux:

    $ od -bc <<< $(echo -e "foo\nbar")
    000 146 157 157 012 142 141 162 012
          f   o   o  \n   b   a   r  \n
    010

    On MacOS:

    $ od -bc <<< $(echo -e "foo\nbar")
    000 146 157 157 040 142 141 162 012
          f   o   o       b   a   r  \n
    010
Re: here strings fold newlines on MacOS
Ah, yep, sorry for the false positive. I *have* 5.1.0 installed (and bashbug picked that up!), but the context in which I encountered the bug was indeed Apple’s ancient version. I could’ve sworn I tested it in both, but clearly messed that up; verified it’s fixed in 5.1.0.

-Rich

On Jan 31, 2021, 1:35 PM -0500, Chet Ramey wrote:
> On 1/30/21 5:44 PM, Rich Lafferty wrote:
> > Bash Version: 5.1
> > Patch Level: 0
> > Release Status: release
> >
> > Description:
> > Here strings ('<<<') fold newlines into spaces on MacOS, but not on Linux,
> > leading to incompatibilities in bash code expected to work the same on both
> > platforms.
>
> This was fixed quite a while back, but still after Apple froze its version
> of bash at 3.2.57. Maybe you should file a bug report with Apple.
> (Just kidding, they don't care.)
>
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
> ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU c...@case.edu http://tiswww.cwru.edu/~chet/