incorrect associative array key parsing if keys contain a closing square bracket

2011-05-02 Rich
associative array key parsing seems to be incorrect if keys contain a
closing square bracket inside the array=([key]=value) construct.


the following testcase :

$ declare -A key_full; key_full=(["version[agent]"]=agent.version); echo "${!key_full[@]}"

---
results per version.

 4.0.33(1)-release (shbot in #bash) :

 Richlv: version\[agent\]

 4.1.7(1)-release :

-bash: declare: key_full: cannot convert indexed to associative array
0

 4.1.10(1)-release :

bash: declare: key_full: cannot convert indexed to associative array
0

 4.1.10(2)-release :

bash: [version[agent]]=agent.version: bad array subscript


according to people in #bash :

 i get bad array subscript with 4.2
 it fails in 4.2.8
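The symptom suggests the subscript is being closed at the first `]` seen rather than at the `]` that follows the quoted key. A minimal C sketch of a scanner that would accept the test case above, under stated assumptions: `find_subscript_end` is a hypothetical helper, not bash's parser; it treats text inside single or double quotes and backslash-escaped characters as opaque, and it lets unquoted brackets nest, which may or may not match bash's actual rules.

```c
#include <stddef.h>

/* Hypothetical helper (not bash's code): given a compound-assignment word
   such as ["version[agent]"]=agent.version, return the index of the ']'
   that closes the leading subscript, or -1 if there is none.
   Brackets inside '...' or "..." or after a backslash are not counted;
   unquoted brackets are allowed to nest (an assumption). */
long find_subscript_end(const char *word)
{
    int depth = 0;
    char quote = 0;                  /* 0, '\'' or '"' while inside quotes */

    if (word[0] != '[')
        return -1;

    for (size_t i = 0; word[i] != '\0'; i++) {
        char c = word[i];

        if (quote) {
            if (c == '\\' && quote == '"' && word[i + 1] != '\0')
                i++;                 /* skip an escaped char inside "..." */
            else if (c == quote)
                quote = 0;           /* closing quote */
            continue;
        }

        switch (c) {
        case '\\':
            if (word[i + 1] != '\0')
                i++;                 /* skip an escaped char */
            break;
        case '\'':
        case '"':
            quote = c;               /* opening quote */
            break;
        case '[':
            depth++;
            break;
        case ']':
            if (--depth == 0)
                return (long)i;      /* bracket that closes the subscript */
            break;
        }
    }
    return -1;                       /* unterminated subscript */
}
```

For `["version[agent]"]=agent.version` it returns 17, the index of the `]` just before `=`, so the key is the whole quoted string `"version[agent]"`.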
--
 Rich



builtin printf behaves incorrectly with "c and 'c character-value arguments

2007-11-01 Rich Felker
$ printf %d\\n \'À
-61
(expected 192)

This should be 192 regardless of locale on any system where wchar_t
values are ISO-10646/Unicode. Bash is incorrectly reading the first
byte of the UTF-8 which happens to be -61 when interpreted as signed
char; on a Latin-1 based locale it will probably give -64 instead.
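The arithmetic behind those numbers, as a small C sketch: À is U+00C0 (192). Its UTF-8 encoding is the two bytes C3 80 and its Latin-1 encoding is the single byte C0. Reading only the first byte through a signed `char` gives 0xC3 - 256 = -61 for UTF-8 and 0xC0 - 256 = -64 for Latin-1. The helper names are hypothetical, and the signed result assumes a two's-complement platform where converting an out-of-range value to `signed char` wraps modulo 256 (implementation-defined in C, but what GCC and Clang do).

```c
/* Hypothetical helpers illustrating the arithmetic, not bash code. */

/* First byte reinterpreted as signed char, which is what a byte-reading
   helper yields where char is signed (wraps modulo 256 on GCC/Clang). */
int first_byte_signed(const char *s)
{
    return (signed char)s[0];
}

/* The same first byte read as unsigned char: always 0..255. */
int first_byte_unsigned(const char *s)
{
    return (unsigned char)s[0];
}
```

Neither result is the 192 the report expects; that value only comes from decoding the whole character rather than its first byte.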

Both POSIX and common sense are clear that the numeric values
resulting from 'c should be the wchar_t value of c and not the value
of the first byte of the multibyte character; from the SUSv3 printf(1)
documentation:

 Note that in a locale with multi-byte characters, the value of a
 character is intended to be the value of the equivalent of the
 wchar_t representation of the character as described in the
 System Interfaces volume of IEEE Std 1003.1-2001.

Language lawyers could argue that on 'single-byte' locales perhaps the
byte value should be used; however, strictly speaking a single-byte
locale is simply a special case of a multi-byte one, and sanity should
win in any case.

Fixing the issue should be easy; asciicode() in builtins/printf.def
simply needs to be changed to decode the character with mbrtowc rather
than reading the byte (and perhaps also should be renamed...).
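A minimal C sketch of that suggestion, assuming the caller passes the text that follows the leading quote character. `char_value` is a hypothetical name, not the function in builtins/printf.def, and the fallback for invalid input (returning the first byte as unsigned) is an assumption, not something the report or POSIX prescribes. On a system where wchar_t holds Unicode values, running in a UTF-8 locale, `char_value("\xC3\x80")` should yield 192, the value the report expects.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical sketch (not bash's code): return the numeric value of the
   first character of s for printf's 'c / "c arguments. The character is
   decoded with mbrtowc() in the current LC_CTYPE locale, so a multibyte
   character yields its wchar_t value rather than its first byte. */
long char_value(const char *s)
{
    mbstate_t state;
    wchar_t wc;
    size_t n;

    if (*s == '\0')
        return 0;                      /* nothing after the quote */

    memset(&state, 0, sizeof state);   /* start in the initial shift state */
    n = mbrtowc(&wc, s, strlen(s), &state);
    if (n == (size_t)-1 || n == (size_t)-2)
        return (unsigned char)s[0];    /* invalid or truncated: assumed fallback */
    return (long)wc;
}
```

A program only leaves the default "C" locale after calling `setlocale(LC_CTYPE, "")` from <locale.h>, so that call is needed before the multibyte result can be observed.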

Rich




Re: builtin printf behaves incorrectly with "c and 'c character-value arguments

2007-11-05 Rich Felker
On Mon, Nov 05, 2007 at 09:10:29AM -0500, Chet Ramey wrote:
> Rich Felker wrote:
> > $ printf %d\\n \'À
> > -61
> > (expected 192)
> > 
> > This should be 192 regardless of locale on any system where wchar_t
> > values are ISO-10646/Unicode. Bash is incorrectly reading the first
> > byte of the UTF-8 which happens to be -61 when interpreted as signed
> > char; on a Latin-1 based locale it will probably give -64 instead.
> > 
> > Both POSIX and common sense are clear that the numeric values
> > resulting from 'c should be the wchar_t value of c and not the value
> > of the first byte of the multibyte character; from the SUSv3 printf(1)
> > documentation:
> > 
> >  Note that in a locale with multi-byte characters, the value of a
> >  character is intended to be the value of the equivalent of the
> >  wchar_t representation of the character as described in the
> >  System Interfaces volume of IEEE Std 1003.1-2001.
> > 
> > Language lawyers could argue that on 'single-byte' locales perhaps the
> > byte value should be used; however, strictly speaking a single-byte
> > locale is simply a special case of a multi-byte one, and sanity should
> > win in any case.
> 
> You're correct that the bash printf should understand multibyte characters
> in a multibyte locale, but not that returning a multibyte character when
> a user hasn't asked for one by setting the locale is more "sane."

I'm not sure what you mean. For a Latin-1 locale there is no
difference, but if the locale is a different legacy locale, the
wchar_t value (Unicode scalar value on systems with __STDC_ISO_10646__
defined) needs to be returned. If you're doubtful about the intent of
the standard, why not file a request for interpretation?

Rich




Re: builtin printf behaves incorrectly with "c and 'c character-value arguments

2007-11-05 Rich Felker
On Mon, Nov 05, 2007 at 10:23:43PM -0500, Chet Ramey wrote:
> Rich Felker wrote:
> 
> > I'm not sure what you mean. For a Latin-1 locale there is no
> > difference, but if the locale is a different legacy locale, the
> > wchar_t value (Unicode scalar value on systems with __STDC_ISO_10646__
> > defined) needs to be returned. If you're doubtful about the intent of
> > the standard, why not file a request for interpretation?
> 
> I'm not doubtful about the standard's intent.  When the user has not
> chosen to use a locale that contains multibyte characters, not only
> should bash not second-guess the user by returning a multibyte
> character, functions such as mbrtowc or mblen/mbrlen will not return
> "multibyte" values (e.g., mbrlen will return `1' and mbrtowc will return
> `-61' -- converted to 195, since it's unsigned -- as its wchar value
> while converting 1 character in your example).

This 195 _is_ its value as a multibyte character in a locale with
ISO-8859-1 as its codeset. In such a locale, it's also the value of
the byte (interpreted as unsigned). So here it doesn't matter which
you use; either is equally correct.

Where something different happens is if your locale has a different
codeset. For instance, in KOI8-R, there is a character "²" which is
placed on a different byte (9D) than in ISO-8859 encodings (B2). But
regardless of your locale,

$ printf %d\\n \'²

should print 178, provided that your system implementation uses the
same values for wchar_t regardless of locale. These semantics are
useful because they actually tell you something about the identity of
the character. But most importantly, it's just illogical for the
function to behave differently based on whether MB_CUR_MAX is 1 or
something greater than 1, rather than being based on the actual locale
encoding. "²" is a "²" in a KOI8-R locale just as much as it is a "²"
in a UTF-8 locale. Bash's printf should not treat the KOI8-R locale
badly just because all characters happen to fit into one byte. The
mbrtowc function will give the correct result for all locales, whether
or not they have characters that take multiple bytes to represent;
special-casing locales that don't just gives illogical (and
non-conformant!) behavior.
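The argument can be made concrete with a toy decoder. In the KOI8-R table of RFC 1489, ² (U+00B2, decimal 178) is byte 0x9D and ° (U+00B0) is 0x9C, while in ISO-8859-1 every byte maps to the code point with the same number. The C sketch below hard-codes only those entries; the function names are hypothetical and the KOI8-R table is deliberately incomplete, whereas a real implementation would go through mbrtowc() with the locale's own tables. Different bytes, same character, same wchar_t value.

```c
/* Hypothetical toy decoders (not complete charset tables): map one byte
   to its Unicode code point, or return -1 for bytes not covered here. */

/* ISO-8859-1: every byte value is also its Unicode code point. */
long latin1_to_ucs(unsigned char b)
{
    return b;
}

/* KOI8-R: ASCII is unchanged; only two non-ASCII entries are included. */
long koi8r_to_ucs(unsigned char b)
{
    if (b < 0x80)
        return b;
    switch (b) {
    case 0x9C: return 0x00B0;   /* DEGREE SIGN */
    case 0x9D: return 0x00B2;   /* SUPERSCRIPT TWO */
    default:   return -1;       /* not covered by this sketch */
    }
}
```

Both `latin1_to_ucs(0xB2)` and `koi8r_to_ucs(0x9D)` return 178, even though the bytes differ.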

Rich


P.S. For my own usage I'd be plenty happy as long as the bug is fixed
in UTF-8 based locales since that's all I ever intend to use. But I
maintain that the current behavior is incorrect and nonconformant in
other locales as well. If you want a compromise, why not make the
correct behavior be dependent on strict posix mode?




bash's fallback implementation of vsnprintf(3) is nonconformant and breaks printf(1)

2007-11-09 Rich Felker
bash-3.2.0(1)-release built on a system without native asprintf:
$ printf %x\\n 0x8000
-8000

%x is an unsigned format specifier; printing it as signed is bogus. I
tracked the problem down to the implementation of vsnprintf_internal
which bash uses to implement asprintf if it's not present. This
function is buggy and apparently not well-tested. A much saner, less
bloated, bug-free implementation of vasprintf is as follows:

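#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

int
vasprintf(char **stringp, const char *format, va_list args)
{
  va_list ap;
  int l;

  /* Measure with a copy: args must not be reused once consumed. */
  va_copy(ap, args);
  l = vsnprintf(NULL, 0, format, ap);
  va_end(ap);
  if (l < 0) return l;

  *stringp = xmalloc((size_t)l + 1);   /* bash's allocator */
  if (!*stringp) return -1;

  /* Second pass: format into the buffer just allocated. */
  if ((l = vsnprintf(*stringp, (size_t)l + 1, format, args)) < 0) {
    free(*stringp);
    *stringp = 0;
  }
  return l;
}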

This works (and is free of integer overflows) as long as the host
implementation of vsnprintf is conformant; if vsnprintf is broken
you'll be needing to replace snprintf anyway, but hopefully doing it
with a correct implementation instead of the one that's currently
included with bash.
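For a self-contained check of the same two-pass vsnprintf idea, the C sketch below uses malloc instead of bash's xmalloc and takes a variable argument list directly. `format_alloc` is a hypothetical name chosen to avoid clashing with a C library's own asprintf/vasprintf. It calls va_copy because a va_list that one vsnprintf call has consumed cannot portably be passed to a second call. With a conforming vsnprintf, `%x` is printed as unsigned, so 0x8000 comes out as 8000.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate and format a string in two passes.
   Returns the length written, or -1 on error (with *out set to NULL). */
int format_alloc(char **out, const char *fmt, ...)
{
    va_list ap, ap2;
    int len;

    va_start(ap, fmt);
    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap2);        /* pass 1: measure */
    va_end(ap2);

    if (len < 0) {
        *out = NULL;
        va_end(ap);
        return -1;
    }

    *out = malloc((size_t)len + 1);
    if (*out == NULL) {
        va_end(ap);
        return -1;
    }

    len = vsnprintf(*out, (size_t)len + 1, fmt, ap);   /* pass 2: write */
    va_end(ap);

    if (len < 0) {
        free(*out);
        *out = NULL;
    }
    return len;
}
```

Calling `format_alloc(&s, "%x", 0x8000u)` returns 4 and leaves "8000" in `s`, which the caller must free.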

Rich




nonconformant behavior for printf(1) (you cannot interpret - as an option char)

2007-11-26 Rich Felker
$ printf ---%s---\\n test
bash: printf: --: invalid option
printf: usage: printf [-v var] format [arguments]

expected: ---test---

This seems to be the third bug I've found in bash's internal printf(1)
which breaks conformance to POSIX. Could you either fix this, or else
disable the printf (and possibly other) builtins entirely when bash is
running in POSIX/sh mode? It's a source of breakage for real valid
scripts! Disabling the builtins manually is not an option for sh
scripts since the mechanism to disable them is bash-specific.

Rich




echo(1) non-conformant (processing -e and -E)

2007-11-26 Rich Felker
When running in POSIX/sh mode, bash should either disable the echo
builtin or stop giving special treatment to -e and -E. In particular,
POSIX provides well-defined behavior for:

echo -e
bash gives: blank line
posix gives: line containing only "-e"

echo -E
bash gives: blank line
posix gives: line containing only "-E"

echo -e -n
bash gives: no output
posix gives: line containing "-e -n"

POSIX leaves behavior unspecified when -n is the first argument, and
also when any argument contains backslashes. However, if conformance
to the XSI part of SUSv3 is also desired, -e must be default. I tend
to think this is stupid, which you probably agree with, so I have no
opinion on changing it to be XSI-conformant but I'm mentioning it
anyway for completeness.
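A minimal sketch of the conforming behavior described above, in C: every argument, `-e` and `-E` included, is an operand, and the operands are written separated by single spaces and followed by a newline. `posix_echo_format` is a hypothetical helper; it deliberately ignores backslashes and a leading `-n`, since those cases are unspecified, and it does not attempt XSI escape processing.

```c
#include <string.h>

/* Hypothetical sketch: format POSIX-mode echo output into buf, treating
   every argument (including "-e" and "-E") as an operand.
   Returns the number of characters written (excluding the terminating
   NUL), or -1 if buf is too small. */
int posix_echo_format(int argc, char *argv[], char *buf, size_t size)
{
    size_t used = 0;

    for (int i = 1; i < argc; i++) {
        size_t len = strlen(argv[i]);
        size_t need = len + (i > 1 ? 1 : 0);   /* separating space */

        if (used + need + 2 > size)             /* keep room for "\n" and NUL */
            return -1;
        if (i > 1)
            buf[used++] = ' ';
        memcpy(buf + used, argv[i], len);
        used += len;
    }
    if (used + 2 > size)
        return -1;
    buf[used++] = '\n';
    buf[used] = '\0';
    return (int)used;
}
```

For the arguments `-e -n` it produces the line "-e -n", matching the POSIX column of the examples.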

Basically, my point is that bash, in POSIX/sh mode, should not provide
nonconformant builtins for POSIX commands which prevent potentially
conformant ones in the host system's path from being used. Either the
bash versions should be conformant or they should get out of the way
when in standards-conformant mode.

Rich




Re: nonconformant behavior for printf(1) (you cannot interpret - as an option char)

2007-11-26 Rich Felker
On Mon, Nov 26, 2007 at 10:09:11PM -0700, Eric Blake wrote:
> According to Rich Felker on 11/26/2007 10:02 PM:
> > On Mon, Nov 26, 2007 at 09:54:52PM -0700, Eric Blake wrote:
> >> Please keep replies on the list, so that others may chime in.
> 
> ^^^

Sorry, will do from now on.

> > Printf does not claim conformance to those guidelines; read the
> > specific documentation on printf. In fact many utilities do not. You
> > have to read the specific documentation on each one.
> 
> You should feel free to take this up with the Austin group, then.  This is
> not bash's problem, unless you can prove that POSIX intends for printf(1)
> to reject the extension of options.
> 
> POSIX is quite clear that echo(1) rejects options with the statement "The
> echo utility shall not recognize the "--" argument in the manner specified
> by Guideline 10 of XBD Section 12.2; "--" shall be recognized as a string
> operand."
> 
> For any utility that does not have this explicit rejection, then the
> extension of providing options is valid implicitly.  Just because a
> portable application cannot use those options does not mean that an

Every other utility that uses the guidelines explicitly mentions them.
Moreover since the guidelines explicitly say that they apply to any
utility claiming conformance to them, I think it's clear that they
don't apply to a utility whose documentation makes no mention of them.

> implementation can provide options; therefore, a portable application MUST
> use -- to separate the end of theoretical options from the leading argument.

But a portable application cannot do this since it's perfectly valid
for an implementation not to support --. Given the mess we have, the
only reliable way I see to use a format string beginning with a - is
to use \055. And one thing we can probably agree upon is that, due to
the prevalence of implementations that treat - specially (whether this
is correct or incorrect behavior), changing them now would do little
to help the portability of scripts whose authors will want them to
work on outdated versions of the shell as well, so this argument is
mostly for completeness/correctness sake.

> And FWIW, coreutils interprets POSIX in the same manner as bash.

GNU coreutils is hardly a model of conformance...

> > Again, go read POSIX and if you're still unclear file an RFI. But it's
> > very clear and bash is incorrect in this respect.
> 
> I'm on the Austin group, and feel quite confident that I understand what
> it permits vs. what it requires.

If everyone on the Austin group thought about things exactly the same
way, I suspect you guys would have a MUCH easier time. Of course
things don't work that way, so why not run it by some of your peers?
Even if the behavior you believe is intended is actually what's
intended, the specification should be amended to make it explicit to
prevent this sort of argument in the future.

Rich




Re: nonconformant behavior for printf(1) (you cannot interpret - as an option char)

2007-11-26 Rich Felker
On Mon, Nov 26, 2007 at 10:24:08PM -0700, Eric Blake wrote:
> According to Eric Blake on 11/26/2007 10:09 PM:
> >> Again, go read POSIX and if you're still unclear file an RFI. But it's
> >> very clear and bash is incorrect in this respect.
> > 
> > I'm on the Austin group, and feel quite confident that I understand what
> > it permits vs. what it requires.
> 
> Furthermore, read the paragraph about OPTIONS in section 1.11 of:
> 
> http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap01.html
> 
> Default Behavior: When this section is listed as "None.", it means that
> the implementation need not support any options. Standard utilities that
> do not accept options, but that do accept operands, shall recognize "--"
> as a first argument to be discarded.
> 
> The requirement for recognizing "--" is because conforming applications
> need a way to shield their operands from any arbitrary options that the
> implementation may provide as an extension. For example, if the standard
> utility foo is listed as taking no options, and the application needed to
> give it a pathname with a leading hyphen, it could safely do it as:
> 
> foo -- -myfile
> 
> 
> Sure enough, the POSIX page for printf(1) lists "None." under OPTIONS, so
> what I'm saying is _required_ by POSIX, despite your bogus claims to the
> contrary.
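The rule quoted above is small enough to sketch in C. For a utility that offers no extension options, the only special argument is a leading `--`, which is discarded; everything after it, even text beginning with `-`, is an operand. `first_operand` is a hypothetical helper, not code from bash or coreutils, and an implementation that does provide extension options would need a real option parser instead.

```c
#include <string.h>

/* Hypothetical sketch for a utility whose OPTIONS section is "None.":
   a leading "--" is discarded; no other argument is treated as an option.
   Returns the index in argv of the first operand. */
int first_operand(int argc, char *argv[])
{
    if (argc > 1 && strcmp(argv[1], "--") == 0)
        return 2;   /* skip the "--" end-of-options marker */
    return 1;       /* no marker: argv[1], if present, is already an operand */
}
```

For `printf -- ---%s---\\n test`, this puts the first operand at the `---%s---` format string, which is how a portable script can pass a format that begins with `-`.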

Okay, thanks for the clarification. I'll drop this complaint and
report bugs to any implementations I find where the -- is not
accepted. Sorry for wasting your time.

Rich




here strings fold newlines on MacOS

2021-01-30 Rich Lafferty
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: darwin18.7.0
Compiler: clang
Compilation CFLAGS: -DSSH_SOURCE_BASHRC
uname output: Darwin flounder.home.mati.ca 18.7.0 Darwin Kernel Version 18.7.0: 
Tue Nov 10 00:07:31 PST 2020; root:xnu-4903.278.51~1/RELEASE_X86_64 x86_64
Machine Type: x86_64-apple-darwin18.7.0

Bash Version: 5.1
Patch Level: 0
Release Status: release

Description:
Here strings ('<<<') fold newlines into spaces on MacOS, but not on Linux,
leading to incompatibilities in bash code expected to work the same on both
platforms.

Repeat-By:

On Linux:

$ od -bc <<< $(echo -e "foo\nbar")
000    146 157 157 012 142 141 162 012
         f   o   o  \n   b   a   r  \n
010

On MacOS:

$ od -bc <<< $(echo -e "foo\nbar")
000    146 157 157 040 142 141 162 012
         f   o   o       b   a   r  \n
010
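The MacOS output is consistent with the expansion being split into fields and rejoined with single spaces before the here string's trailing newline is added; the reply below attributes this to Apple's bash 3.2.57, with later versions fixed. A C sketch of that old transformation, assuming the default IFS of space, tab and newline; `split_and_join` is a hypothetical helper, not bash's implementation. Applied to "foo\nbar" (command substitution has already stripped echo's trailing newline), it produces the 8 bytes "foo bar\n" that od shows on MacOS.

```c
#include <stddef.h>

/* Hypothetical sketch of the old behavior: split src on default IFS
   whitespace, rejoin the fields with single spaces, then append the
   newline a here string always adds. dst must hold strlen(src) + 2 bytes.
   Returns the number of bytes written, excluding the terminating NUL. */
size_t split_and_join(const char *src, char *dst)
{
    size_t out = 0;
    int in_field = 0, need_space = 0;

    for (; *src != '\0'; src++) {
        char c = *src;

        if (c == ' ' || c == '\t' || c == '\n') {
            if (in_field)
                need_space = 1;        /* a field just ended */
            in_field = 0;
            continue;
        }
        if (need_space) {
            dst[out++] = ' ';          /* single space between fields */
            need_space = 0;
        }
        dst[out++] = c;
        in_field = 1;
    }
    dst[out++] = '\n';                 /* here-string trailing newline */
    dst[out] = '\0';
    return out;
}
```

Leading and trailing whitespace is dropped and runs of separators collapse to one space, so the newline between "foo" and "bar" becomes the 040 byte in the od dump.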


Re: here strings fold newlines on MacOS

2021-01-31 Rich Lafferty
Ah, yep — sorry for the false positive. I *have* 5.1.0 installed (and bashbug 
picked that up!) but the context in which I encountered the bug was indeed 
Apple’s ancient version. I could’ve sworn I tested it in both, but clearly 
messed that up - verified it’s fixed in 5.1.0.

  -Rich

On Jan 31, 2021, 1:35 PM -0500, Chet Ramey wrote:
> On 1/30/21 5:44 PM, Rich Lafferty wrote:
>
> > Bash Version: 5.1
> > Patch Level: 0
> > Release Status: release
> >
> > Description:
> > Here strings ('<<<') fold newlines into spaces on MacOS, but not on Linux,
> > leading to incompatibilities in bash code expected to work the same on both
> > platforms.
>
> This was fixed quite a while back, but still after Apple froze its version
> of bash at 3.2.57. Maybe you should file a bug report with Apple.
> (Just kidding, they don't care.)
>
>
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
> ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU c...@case.edu http://tiswww.cwru.edu/~chet/