Consume only up to 8 bit octal input for backslash-escaped chars (echo, printf)

2010-12-07 Thread Roman Rakus

This one is already reported on coreutils:
http://debbugs.gnu.org/cgi/bugreport.cgi?msg=2;bug=7574

The problem is with values higher than \0377: echo and printf consume
all three digits, but the result is not an 8-bit number. For example:

$ echo -e '\0610'; printf '\610 %b\n' '\610 \0610'
Should output:
10
10 10 10
instead of
�
� � �

So, if the first octal digit is > 3, use up to 2 digits.
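
As a rough illustration of the rule stated above (this is not the patch
itself), the digit limit could be sketched in C as follows. The helper name
octal_escape_capped and its interface are hypothetical. It expects a pointer
to the first digit of the number, that is, just past `\' for printf or just
past `\0' for echo -e.

```c
#include <stddef.h>

/* Hypothetical helper, not bash code: decode the octal digits at S under
   the proposed rule.  Returns the byte value and stores the number of
   digits consumed in *USED. */
static unsigned char
octal_escape_capped (const char *s, size_t *used)
{
  /* A leading digit above 3 would overflow 8 bits with three digits,
     so allow only two digits in that case. */
  size_t max_digits = (s[0] > '3') ? 2 : 3;
  unsigned int value = 0;
  size_t n = 0;

  while (n < max_digits && s[n] >= '0' && s[n] <= '7')
    {
      value = value * 8 + (unsigned int) (s[n] - '0');
      n++;
    }

  *used = n;
  return (unsigned char) value;   /* always <= 0377 by construction */
}
```

With the digits 610 this consumes 61 and returns 061 ('1' in ASCII), leaving
the trailing 0 as literal text, which gives the 10 shown above.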

Patch follows for echo and printf. Is anything else counting octal values?
RR
---
diff -up bash-4.1/builtins/printf.def.octal bash-4.1/builtins/printf.def
--- bash-4.1/builtins/printf.def.octal 2010-12-07 15:40:24.0 +0100
+++ bash-4.1/builtins/printf.def 2010-12-07 16:13:41.0 +0100
@@ -734,11 +734,15 @@ tescape (estart, cp, sawc)

/* The octal escape sequences are `\0' followed by up to three octal
digits (if SAWC), or `\' followed by up to three octal digits (if
- !SAWC). As an extension, we allow the latter form even if SAWC. */
+ !SAWC). As an extension, we allow the latter form even if SAWC.
+ If the octal character begins with number 4 or higher,
+ only 2 octal digits fit to byte */
case '0': case '1': case '2': case '3':
case '4': case '5': case '6': case '7':
evalue = OCTVALUE (c);
- for (temp = 2 + (!evalue && !!sawc); ISOCTAL (*p) && temp--; p++)
+ for (temp = 2 + (!evalue && !!sawc) -
+ (!sawc ? c > '3' : evalue ? evalue > 3 : *p > '3');
+ ISOCTAL (*p) && temp--; p++)
evalue = (evalue * 8) + OCTVALUE (*p);
*cp = evalue & 0xFF;
break;
diff -up bash-4.1/lib/sh/strtrans.c.octal bash-4.1/lib/sh/strtrans.c
--- bash-4.1/lib/sh/strtrans.c.octal 2008-08-12 19:49:12.0 +0200
+++ bash-4.1/lib/sh/strtrans.c 2010-12-07 15:40:24.0 +0100
@@ -96,6 +96,8 @@ ansicstr (string, len, flags, sawc, rlen
POSIX-2001 requirement and accept 0-3 octal digits after
a leading `0'. */
temp = 2 + ((flags & 1) && (c == '0'));
+ if (*s > '3')
+ temp--;
for (c -= '0'; ISOCTAL (*s) && temp--; s++)
c = (c * 8) + OCTVALUE (*s);
c &= 0xFF;




Re: Consume only up to 8 bit octal input for backslash-escaped chars (echo, printf)

2010-12-07 Thread Roman Rakus

Sorry for wrong indents. Patch in attachment.

RR
diff -up bash-4.1/builtins/printf.def.octal bash-4.1/builtins/printf.def
--- bash-4.1/builtins/printf.def.octal  2010-12-07 15:40:24.0 +0100
+++ bash-4.1/builtins/printf.def  2010-12-07 16:13:41.0 +0100
@@ -734,11 +734,15 @@ tescape (estart, cp, sawc)
 
       /* The octal escape sequences are `\0' followed by up to three octal
          digits (if SAWC), or `\' followed by up to three octal digits (if
-         !SAWC).  As an extension, we allow the latter form even if SAWC. */
+         !SAWC).  As an extension, we allow the latter form even if SAWC.
+         If the octal character begins with number 4 or higher,
+         only 2 octal digits fit to byte */
       case '0': case '1': case '2': case '3':
       case '4': case '5': case '6': case '7':
         evalue = OCTVALUE (c);
-        for (temp = 2 + (!evalue && !!sawc); ISOCTAL (*p) && temp--; p++)
+        for (temp = 2 + (!evalue && !!sawc) -
+             (!sawc ? c > '3' : evalue ? evalue > 3 : *p > '3');
+             ISOCTAL (*p) && temp--; p++)
           evalue = (evalue * 8) + OCTVALUE (*p);
         *cp = evalue & 0xFF;
         break;
diff -up bash-4.1/lib/sh/strtrans.c.octal bash-4.1/lib/sh/strtrans.c
--- bash-4.1/lib/sh/strtrans.c.octal  2008-08-12 19:49:12.0 +0200
+++ bash-4.1/lib/sh/strtrans.c  2010-12-07 15:40:24.0 +0100
@@ -96,6 +96,8 @@ ansicstr (string, len, flags, sawc, rlen
                  POSIX-2001 requirement and accept 0-3 octal digits after
                  a leading `0'. */
               temp = 2 + ((flags & 1) && (c == '0'));
+              if (*s > '3')
+                temp--;
               for (c -= '0'; ISOCTAL (*s) && temp--; s++)
                 c = (c * 8) + OCTVALUE (*s);
               c &= 0xFF;


Re: Consume only up to 8 bit octal input for backslash-escaped chars (echo, printf)

2010-12-07 Thread Chet Ramey
On 12/7/10 11:12 AM, Roman Rakus wrote:
> This one is already reported on coreutils:
> http://debbugs.gnu.org/cgi/bugreport.cgi?msg=2;bug=7574
> 
> The problem is with numbers higher than /0377; echo and printf consumes all
> 3 numbers, but it is not 8-bit number. For example:
> $ echo -e '\0610'; printf '\610 %b\n' '\610 \0610'
> Should output:
> 10
> 10 10 10
> instead of
> �
> � � �

No, it shouldn't.  This is a terrible idea.  All other shells I tested
behave as bash does*, bash behaves as Posix specifies, and the bash
behavior is how C character constants work.  Why would I change this?

(*That is, consume up to three octal digits and mask off all but the lower
8 bits of the result.)
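
For comparison, the behavior described in the footnote can be sketched in C
roughly like this. The helper name octal_escape_masked and its interface are
hypothetical and are not taken from the bash sources. It expects a pointer to
the first digit of the number.

```c
#include <stddef.h>

/* Hypothetical helper, not bash code: consume up to three octal digits
   at S and keep only the low 8 bits of the result.  The number of digits
   consumed is stored in *USED. */
static unsigned char
octal_escape_masked (const char *s, size_t *used)
{
  unsigned int value = 0;
  size_t n = 0;

  while (n < 3 && s[n] >= '0' && s[n] <= '7')
    {
      value = value * 8 + (unsigned int) (s[n] - '0');
      n++;
    }

  *used = n;
  return (unsigned char) (value & 0xFF);   /* overflow is discarded */
}
```

For the digits 610 this consumes all three, computes 0610 = 392, and returns
392 & 0xFF = 0x88 (octal 210). That single byte is not valid UTF-8 on its
own, which is why it typically shows up as a replacement character in a
UTF-8 terminal.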

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: Consume only up to 8 bit octal input for backslash-escaped chars (echo, printf)

2010-12-07 Thread Eric Blake
[adding the Austin Group]

On 12/07/2010 06:19 PM, Chet Ramey wrote:
> On 12/7/10 11:12 AM, Roman Rakus wrote:
>> This one is already reported on coreutils:
>> http://debbugs.gnu.org/cgi/bugreport.cgi?msg=2;bug=7574
>>
>> The problem is with numbers higher than /0377; echo and printf consumes all
>> 3 numbers, but it is not 8-bit number. For example:
>> $ echo -e '\0610'; printf '\610 %b\n' '\610 \0610'
>> Should output:
>> 10
>> 10 10 10
>> instead of
>> �
>> � � �
> 
> No, it shouldn't.  This is a terrible idea.  All other shells I tested
> behave as bash does*, bash behaves as Posix specifies, and the bash
> behavior is how C character constants work.  Why would I change this?
> 
> (*That is, consume up to three octal digits and mask off all but the lower
> 8 bits of the result.)

POSIX states for echo:

"\0num Write an 8-bit value that is the zero, one, two, or three-digit
octal number num."

It does not explicitly say what happens if a three-digit octal number is
not an 8-bit value, so it is debatable whether the standard requires at
most an 8-bit value (two characters, \0061 followed by 0) or whether the
overflow is silently ignored (treated as one character \0210), or some
other treatment.
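
The arithmetic behind the two readings can be checked at compile time. This
is a small C11 sketch. The enumerator names are made up for illustration, and
the last check assumes an ASCII execution character set.

```c
/* Compile-time checks (C11) of the numbers discussed for \0610. */
enum
{
  FULL_VALUE  = 0610,               /* 6*64 + 1*8 + 0 = 392 */
  MASKED_BYTE = FULL_VALUE & 0xFF,  /* 392 - 256 = 136 */
  TWO_DIGITS  = 061                 /* 6*8 + 1 = 49 */
};

_Static_assert (FULL_VALUE > 0377, "0610 does not fit in 8 bits");
_Static_assert (MASKED_BYTE == 0210, "ignoring overflow yields \\0210");
_Static_assert (TWO_DIGITS == '1', "two digits yield '1' (ASCII assumed)");
```

So the overflow-ignored reading produces the one byte 0210, while the
two-digit reading produces the character 1 followed by a literal 0.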

The C99 standard states (at least in 6.4.4.4 of the draft N1256 document):

"The value of an integer character constant containing more than one
character (e.g., 'ab'), or containing a character or escape sequence
that does not map to a single-byte execution character, is
implementation-defined."

leaving '\610' as an implementation-defined character constant.

The Java language specifically requires "\610" to parse as "\061"
followed by "0", and this can be a very useful property to rely on in
this day and age where 8-bit bytes are prevalent.

http://austingroupbugs.net/view.php?id=249 is standardizing $'' in the
shell, and also states:

"\XXX yields the byte whose value is the octal value XXX (one to three
octal digits)"

and while it is explicit that $'\xabc' is undefined (as to whether it
maps to $'\xab'c or to $'\u0abc' or to something else), it does not have
any language talking about what happens when an octal escape does not
fit in a byte.

Personally, I would love it if octal escapes were required to stop
parsing after two digits if the first digit is > 3, but given that C99
leaves it implementation defined, I think we need a POSIX interpretation
to resolve the issue.  Also, I think this report means that we need to
tweak the wording of bug 249 (adding $'') to deal with the case of an
octal escape where three octal digits do not fit in 8 bits (either by
explicitly declaring it unspecified, as is the case with \x escapes; or
by requiring implementation-defined behavior, as in C99; or by requiring
explicit end-of-escape after two digits, as in Java).

-- 
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org


