Consume only up to 8 bit octal input for backslash-escaped chars (echo, printf)
This one is already reported on coreutils:
http://debbugs.gnu.org/cgi/bugreport.cgi?msg=2;bug=7574

The problem is with numbers higher than \0377; echo and printf consume all
3 digits, but the result is not an 8-bit number. For example:
$ echo -e '\0610'; printf '\610 %b\n' '\610 \0610'
Should output:
10
10 10 10
instead of
�
� � �

So, if the first octal digit is > 3, use up to 2 digits.
Patch follows for echo and printf. Is anything else counting octal values?

RR
---
diff -up bash-4.1/builtins/printf.def.octal bash-4.1/builtins/printf.def
--- bash-4.1/builtins/printf.def.octal 2010-12-07 15:40:24.0 +0100
+++ bash-4.1/builtins/printf.def 2010-12-07 16:13:41.0 +0100
@@ -734,11 +734,15 @@ tescape (estart, cp, sawc)
       /* The octal escape sequences are `\0' followed by up to three octal
          digits (if SAWC), or `\' followed by up to three octal digits (if
-         !SAWC).  As an extension, we allow the latter form even if SAWC. */
+         !SAWC).  As an extension, we allow the latter form even if SAWC.
+         If the octal character begins with number 4 or higher,
+         only 2 octal digits fit to byte */
       case '0': case '1': case '2': case '3':
       case '4': case '5': case '6': case '7':
         evalue = OCTVALUE (c);
-        for (temp = 2 + (!evalue && !!sawc); ISOCTAL (*p) && temp--; p++)
+        for (temp = 2 + (!evalue && !!sawc) -
+             (!sawc ? c > '3' : evalue ? evalue > 3 : *p > '3');
+             ISOCTAL (*p) && temp--; p++)
           evalue = (evalue * 8) + OCTVALUE (*p);
         *cp = evalue & 0xFF;
         break;
diff -up bash-4.1/lib/sh/strtrans.c.octal bash-4.1/lib/sh/strtrans.c
--- bash-4.1/lib/sh/strtrans.c.octal 2008-08-12 19:49:12.0 +0200
+++ bash-4.1/lib/sh/strtrans.c 2010-12-07 15:40:24.0 +0100
@@ -96,6 +96,8 @@ ansicstr (string, len, flags, sawc, rlen
                  POSIX-2001 requirement and accept 0-3 octal digits after
                  a leading `0'. */
               temp = 2 + ((flags & 1) && (c == '0'));
+              if (*s > '3')
+                temp--;
               for (c -= '0'; ISOCTAL (*s) && temp--; s++)
                 c = (c * 8) + OCTVALUE (*s);
               c &= 0xFF;
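The rule proposed in this message ("if the first octal digit is > 3, use up to 2 digits") can be sketched in isolation as follows. This is only an illustration, not the patched bash code: `decode_octal_limited` is a hypothetical helper, and it covers only the plain `\NNN` form, not the `\0NNN` form used by echo and `%b`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of bash): decode the octal digits of a
   `\NNN' escape, where s points just past the backslash.  The decoded
   byte is stored in *out; the return value is the number of digits
   consumed, so the caller knows where ordinary text resumes. */
size_t
decode_octal_limited (const char *s, unsigned char *out)
{
  size_t n = 0;
  unsigned int value = 0;
  size_t max_digits;

  assert (s != NULL && out != NULL);

  /* Three octal digits fit in 8 bits only if the first digit is 0..3.
     With a first digit of 4..7, stop after two digits. */
  max_digits = (s[0] >= '4' && s[0] <= '7') ? 2 : 3;

  while (n < max_digits && s[n] >= '0' && s[n] <= '7')
    {
      value = value * 8 + (unsigned int) (s[n] - '0');
      n++;
    }

  *out = (unsigned char) value;
  return n;
}
```

With the input `610`, this sketch consumes two digits and yields octal 061 (the character `1`), leaving the trailing `0` as ordinary text, which matches the "10" output requested above.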
Re: Consume only up to 8 bit octal input for backslash-escaped chars (echo, printf)
Sorry for wrong indents. Patch in attachment.

RR

diff -up bash-4.1/builtins/printf.def.octal bash-4.1/builtins/printf.def
--- bash-4.1/builtins/printf.def.octal 2010-12-07 15:40:24.0 +0100
+++ bash-4.1/builtins/printf.def 2010-12-07 16:13:41.0 +0100
@@ -734,11 +734,15 @@ tescape (estart, cp, sawc)
       /* The octal escape sequences are `\0' followed by up to three octal
          digits (if SAWC), or `\' followed by up to three octal digits (if
-         !SAWC).  As an extension, we allow the latter form even if SAWC. */
+         !SAWC).  As an extension, we allow the latter form even if SAWC.
+         If the octal character begins with number 4 or higher,
+         only 2 octal digits fit to byte */
       case '0': case '1': case '2': case '3':
       case '4': case '5': case '6': case '7':
         evalue = OCTVALUE (c);
-        for (temp = 2 + (!evalue && !!sawc); ISOCTAL (*p) && temp--; p++)
+        for (temp = 2 + (!evalue && !!sawc) -
+             (!sawc ? c > '3' : evalue ? evalue > 3 : *p > '3');
+             ISOCTAL (*p) && temp--; p++)
           evalue = (evalue * 8) + OCTVALUE (*p);
         *cp = evalue & 0xFF;
         break;
diff -up bash-4.1/lib/sh/strtrans.c.octal bash-4.1/lib/sh/strtrans.c
--- bash-4.1/lib/sh/strtrans.c.octal 2008-08-12 19:49:12.0 +0200
+++ bash-4.1/lib/sh/strtrans.c 2010-12-07 15:40:24.0 +0100
@@ -96,6 +96,8 @@ ansicstr (string, len, flags, sawc, rlen
                  POSIX-2001 requirement and accept 0-3 octal digits after
                  a leading `0'. */
               temp = 2 + ((flags & 1) && (c == '0'));
+              if (*s > '3')
+                temp--;
               for (c -= '0'; ISOCTAL (*s) && temp--; s++)
                 c = (c * 8) + OCTVALUE (*s);
               c &= 0xFF;
Re: Consume only up to 8 bit octal input for backslash-escaped chars (echo, printf)
On 12/7/10 11:12 AM, Roman Rakus wrote:
> This one is already reported on coreutils:
> http://debbugs.gnu.org/cgi/bugreport.cgi?msg=2;bug=7574
>
> The problem is with numbers higher than \0377; echo and printf consume all
> 3 digits, but the result is not an 8-bit number. For example:
> $ echo -e '\0610'; printf '\610 %b\n' '\610 \0610'
> Should output:
> 10
> 10 10 10
> instead of
> �
> � � �

No, it shouldn't. This is a terrible idea. All other shells I tested
behave as bash does*, bash behaves as Posix specifies, and the bash
behavior is how C character constants work. Why would I change this?

(*That is, consume up to three octal digits and mask off all but the lower
8 bits of the result.)

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
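The behavior described in the footnote above (consume up to three octal digits, then keep only the low 8 bits) can be illustrated with a short sketch. `decode_octal_masked` is a hypothetical name chosen for this example; it is not the bash source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of bash): decode up to three octal
   digits starting at s, then discard everything above the low 8 bits.
   Returns the number of digits consumed. */
size_t
decode_octal_masked (const char *s, unsigned char *out)
{
  size_t n = 0;
  unsigned int value = 0;

  assert (s != NULL && out != NULL);

  /* Always accept up to three digits, whatever the first digit is. */
  while (n < 3 && s[n] >= '0' && s[n] <= '7')
    {
      value = value * 8 + (unsigned int) (s[n] - '0');
      n++;
    }

  /* Octal 610 is 392 decimal (0x188); masking leaves 0x88, octal 210. */
  *out = (unsigned char) (value & 0xFF);
  return n;
}
```

For the input `610`, all three digits are consumed and the stored byte is octal 210 (0x88), which is not valid UTF-8 on its own and so shows up as the replacement character in the output quoted above.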
Re: Consume only up to 8 bit octal input for backslash-escaped chars (echo, printf)
[adding the Austin Group]

On 12/07/2010 06:19 PM, Chet Ramey wrote:
> On 12/7/10 11:12 AM, Roman Rakus wrote:
>> This one is already reported on coreutils:
>> http://debbugs.gnu.org/cgi/bugreport.cgi?msg=2;bug=7574
>>
>> The problem is with numbers higher than \0377; echo and printf consume all
>> 3 digits, but the result is not an 8-bit number. For example:
>> $ echo -e '\0610'; printf '\610 %b\n' '\610 \0610'
>> Should output:
>> 10
>> 10 10 10
>> instead of
>> �
>> � � �
>
> No, it shouldn't. This is a terrible idea. All other shells I tested
> behave as bash does*, bash behaves as Posix specifies, and the bash
> behavior is how C character constants work. Why would I change this?
>
> (*That is, consume up to three octal digits and mask off all but the lower
> 8 bits of the result.)

POSIX states for echo:

"\0num Write an 8-bit value that is the zero, one, two, or three-digit
octal number num."

It does not explicitly say what happens if a three-digit octal number is
not an 8-bit value, so it is debatable whether the standard requires at
most an 8-bit value (two characters, \0061 followed by 0) or whether the
overflow is silently ignored (treated as one character \0210), or some
other treatment.

The C99 standard states (at least in 6.4.4.4 of the draft N1256 document):

"The value of an integer character constant containing more than one
character (e.g., 'ab'), or containing a character or escape sequence that
does not map to a single-byte execution character, is
implementation-defined."

leaving '\610' as an implementation-defined character constant.

The Java language specifically requires "\610" to parse as "\061"
followed by "0", and this can be a very useful property to rely on in
this day and age where 8-bit bytes are prevalent.

http://austingroupbugs.net/view.php?id=249 is standardizing $'' in the
shell, and also states:

"\XXX yields the byte whose value is the octal value XXX (one to three
octal digits)"

and while it is explicit that $'\xabc' is undefined (as to whether it maps
to $'\xab'c or to $'\u0abc' or to something else), it does not have any
language talking about what happens when an octal escape does not fit in a
byte.

Personally, I would love it if octal escapes were required to stop parsing
after two digits if the first digit is > 3, but given that C99 leaves it
implementation-defined, I think we need a POSIX interpretation to resolve
the issue. Also, I think this report means that we need to tweak the
wording of bug 249 (adding $'') to deal with the case of an octal escape
where three octal digits do not fit in 8 bits (either by explicitly
declaring it unspecified, as is the case with \x escapes; or by requiring
implementation-defined behavior, as in C99; or by requiring explicit
end-of-escape after two digits, as in Java).

-- 
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org