GNU Bash 4.4 Test Discrepancy on OpenVMS

2016-10-07 Thread Eric W. Robertson
While building and testing GNU Bash 4.4 on OpenVMS, the GNU Bash test 
script reported the following difference between the output produced by 
Bash on OpenVMS and the reference output for the test sub-script 
tests/exp8.sub (lines 28 - 31):


unset array
declare -A array
array=( [$'x\001y\177z']=$'a\242b\002c' )
echo ${array[@]@A}

Currently, the reference result expected for ALL platform 
implementations for the above sequence of Bash test commands is embodied 
in tests/exp.right (line 236):


declare -A array=([$'x\001y\177z']=$'a\242b\002c' )

On OpenVMS the following output is generated instead:

declare -A array=([$'x\001y\177z']=$'a¢b\002c' )

After studying the applicable sections of the relevant ISO and POSIX 
standards and inspecting Bash's execution within the OpenVMS Debugger, 
I have come to the conclusion that this difference arises from an 
implementation-dependent difference in the locale-dependent 
characteristics of characters in the C/POSIX locale. The relevant ISO 
and POSIX standards explicitly DO NOT specify any particular 
requirements of the C/POSIX locale regarding locale-dependent 
characteristics for character codes outside of the Portable Character 
Set (PCS). Therefore, any programmed behavior relying on the 
locale-dependent characteristics of character codes outside of PCS is 
subject to implementation differences, even in the C/POSIX locale.

Using the OpenVMS Debugger, it became apparent that the expansion of 
the shell variable "array" ultimately results in a call to the function 
ansic_quote() (located within source module lib/sh/strtrans.c). The 
relevant excerpt from this function is:


  for (s = str; c = *s; s++)
    {
      b = l = 1;        /* 1 == add backslash; 0 == no backslash */
      clen = 1;

      switch (c)
        {
        case ESC: c = 'E'; break;
#ifdef __STDC__
        case '\a': c = 'a'; break;
        case '\v': c = 'v'; break;
#else
        case 0x07: c = 'a'; break;
        case 0x0b: c = 'v'; break;
#endif

        case '\b': c = 'b'; break;
        case '\f': c = 'f'; break;
        case '\n': c = 'n'; break;
        case '\r': c = 'r'; break;
        case '\t': c = 't'; break;
        case '\\':
        case '\'':
          break;
        default:
#if defined (HANDLE_MULTIBYTE)
          b = is_basic (c);
          /* XXX - clen comparison to 0 is dicey */
          if ((b == 0 && ((clen = mbrtowc (&wc, s, MB_CUR_MAX, 0)) < 0 || MB_INVALIDCH (clen) || iswprint (wc) == 0)) ||
              (b == 1 && ISPRINT (c) == 0))
#else
          if (ISPRINT (c) == 0)
#endif
            {
              *r++ = '\\';
              *r++ = TOCHAR ((c >> 6) & 07);
              *r++ = TOCHAR ((c >> 3) & 07);
              *r++ = TOCHAR (c & 07);
              continue;
            }
          l = 0;
          break;
        }
      if (b == 0 && clen == 0)
        break;

      if (l)
        *r++ = '\\';

      if (clen == 1)
        *r++ = c;
      else
        {
          for (b = 0; b < (int)clen; b++)
            *r++ = (unsigned char)s[b];
          s += clen - 1;        /* -1 because of the increment above */
        }
    }

In the case of the Bash build for OpenVMS, the macro HANDLE_MULTIBYTE is 
defined by the Bash configure script. That being the case, it is 
apparent from the above code excerpt that the decision to quote or not 
to quote a particular character code in the expanded string is 
determined by the results of the functions is_basic(), mbrtowc(), 
iswprint(), and isprint() (the last indirectly, through expansion of 
the ISPRINT() function macro). The is_basic() function seems to be 
coded in such a way that it will return homogeneous results across 
platform implementations. However, the results of all of the remaining 
functions are locale dependent. Therefore, for character codes outside 
of PCS, the ANSI C quoting of the expanded string is ultimately 
implementation dependent.


Since the octal character code 242 that is used in defining the value 
for the "array" shell variable is clearly outside of PCS, the result of 
expanding the shell variable value in this case cannot be guaranteed to 
be homogeneous across all platform implementations. Yet that is what 
both the test script and the reference results currently assume.


This naturally prompts a couple of questions: Is this in fact a bug? 
Further, if it is a bug, precisely where is the bug? Given what I know 
at the moment, my own answer is that if it is a bug, the bug is in the 
test script and its corresponding reference results. Neither is posed 
to handle the platform implementation differences that the applicable 
standards explicitly permit for character codes outside of PCS in the 
C/POSIX locale. However, I cannot be entirely certain of this 
conclusion, because the exp8.sub script does not contain explicit 
commentary on the motivation behind the above sequence of Bash test 
commands, or on what particular significance (if any) the octal 
character code 242 is supposed to have relative to the goal of this 
particular sequence of Bash test commands.

Re: GNU Bash 4.4 Test Discrepancy on OpenVMS

2016-10-12 Thread Eric W. Robertson

Chet,

OK. No worries then. Thanks for the prompt reply and the clarification 
regarding when isascii() is actually needed.


Regards,

Eric

On 10/10/2016 4:01 PM, Chet Ramey wrote:

On 10/7/16 12:54 PM, Eric W. Robertson wrote:

While building and testing GNU Bash 4.4 on OpenVMS, the GNU Bash test
script issued the following difference between OpenVMS Bash produced output
and reference output for the test sub-script tests/exp8.sub (lines 28 - 31)

unset array
declare -A array
array=( [$'x\001y\177z']=$'a\242b\002c' )
echo ${array[@]@A}

Currently, the reference result expected for ALL platform implementations
for the above sequence of Bash test commands is embodied in tests/exp.right
(line 236):

declare -A array=([$'x\001y\177z']=$'a\242b\002c' )

on OpenVMS the following output is generated instead:

declare -A array=([$'x\001y\177z']=$'a¢b\002c' )

After studying the applicable sections of the relevant ISO and POSIX
standards and inspection of Bash's execution within the OpenVMS Debugger, I
have come to the conclusion that this difference arises out of an
implementation dependent difference with respect to the locale dependent
characteristics of characters in the C/POSIX locale.

You're right.  VMS happens to have a character mapped to that value in
the default locale (or at least your default locale), and no other
system does.  It's not a test failure; it's just an anomaly.  That value
was `chosen' because it is exactly the test script submitted as a bug
report in the past.



While investigating this test discrepancy with Bash 4.4 on OpenVMS I came
across another potential source code bug relating to the expansion of the
ISPRINT() function macro. The expansion of the ISPRINT() function macro is,
in turn, partially dependent on the expansion of the IN_CTYPE_DOMAIN()
function macro. In the source code module include/chartypes.h, the function
macro IN_CTYPE_DOMAIN() does not seem to be correctly defined for platforms
not providing the isascii() function.

If you're running on a platform that doesn't provide isascii(), and the
STDC_HEADERS define doesn't evaluate to something non-zero (see below, or
look at the comment in chartypes.h), all bets are off.  That
function/define is not optional (obsolescent is not optional); Posix
requires it and it's there on virtually every Unix system.

The constant 1 means you make a bet that the rest of the ctype functions
check that their arguments are valid ascii characters if it matters, and
otherwise it doesn't and you don't need to check it.



Given the normative definition of the
isascii() function in "The Open Group Base Specifications Issue 7 (IEEE Std
1003.1-2008) 2016 Edition", the current definition of the IN_CTYPE_DOMAIN()
function macro (as the literal constant expression 1) is unlikely to result
in any close approximation of correct behavior for most platforms not
implementing the isascii() function. Instead, I believe the
IN_CTYPE_DOMAIN() function macro would be better defined as follows:

Not quite.  The STDC_HEADERS define, if set, means that you don't have to
guard references to the ctype macros with checks using isascii().  You'd
be better off, if you wanted to really do it, with something like:

#if !defined (isascii) && !HAVE_ISASCII
#  define isascii(x) ((x) >= 0 && (x) <= 127)  /* basic */
#endif

and leave the rest of the code intact.