[adding the Austin Group]

On 12/07/2010 06:19 PM, Chet Ramey wrote:
> On 12/7/10 11:12 AM, Roman Rakus wrote:
>> This one is already reported on coreutils:
>> http://debbugs.gnu.org/cgi/bugreport.cgi?msg=2;bug=7574
>>
>> The problem is with numbers higher than /0377; echo and printf consumes all
>> 3 numbers, but it is not 8-bit number. For example:
>> $ echo -e '\0610'; printf '\610 %b\n' '\610 \0610'
>> Should output:
>> 10
>> 10 10 10
>> instead of
>> �
>> � � �
>
> No, it shouldn't. This is a terrible idea. All other shells I tested
> behave as bash does*, bash behaves as Posix specifies, and the bash
> behavior is how C character constants work. Why would I change this?
>
> (*That is, consume up to three octal digits and mask off all but the lower
> 8 bits of the result.)

POSIX states for echo:

"\0num Write an 8-bit value that is the zero, one, two, or three-digit
octal number num."

It does not explicitly say what happens if a three-digit octal number is
not an 8-bit value, so it is debatable whether the standard requires at
most an 8-bit value (two characters, \0061 followed by 0) or whether the
overflow is silently ignored (treated as one character \0210), or some
other treatment.

The C99 standard states (at least in 6.4.4.4 of the draft N1256 document):

"The value of an integer character constant containing more than one
character (e.g., 'ab'), or containing a character or escape sequence that
does not map to a single-byte execution character, is
implementation-defined."

leaving '\610' as an implementation-defined character constant.

The Java language specifically requires "\610" to parse as "\061" followed
by "0", and this can be a very useful property to rely on in this day and
age where 8-bit bytes are prevalent.

http://austingroupbugs.net/view.php?id=249 is standardizing $'' in the
shell, and also states:

"\XXX yields the byte whose value is the octal value XXX (one to three
octal digits)"

and while it is explicit that $'\xabc' is undefined (as to whether it maps
to $'\xab'c or to $'\u0abc' or to something else), it does not have any
language talking about what happens when an octal escape does not fit in a
byte.

Personally, I would love it if octal escapes were required to stop parsing
after two digits if the first digit is > 3, but given that C99 leaves it
implementation-defined, I think we need a POSIX interpretation to resolve
the issue.

Also, I think this report means that we need to tweak the wording of bug
249 (adding $'') to deal with the case of an octal escape where three
octal digits do not fit in 8 bits (either by explicitly declaring it
unspecified, as is the case with \x escapes; or by requiring
implementation-defined behavior, as in C99; or by requiring explicit
end-of-escape after two digits, as in Java).
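
To make the two competing readings concrete, here is a minimal Python sketch. It is not taken from bash, coreutils, or any other implementation; the function name decode_octal_escape and its mode argument are hypothetical, chosen only for illustration. The "mask" mode models the behavior Chet describes (consume up to three octal digits, keep the low 8 bits), and the "stop" mode models the Java-style rule (take a further digit only while the value still fits in a byte).

```python
def decode_octal_escape(text: str, mode: str) -> bytes:
    # text is whatever follows the backslash (or the "\0" in echo syntax),
    # e.g. "610". Both the function name and the mode names are
    # hypothetical; this is an illustration, not any shell's real code.
    if mode not in ("mask", "stop"):
        raise ValueError("mode must be 'mask' or 'stop'")

    value = 0
    used = 0
    for ch in text[:3]:              # never look past three digits
        if ch not in "01234567":
            break
        candidate = value * 8 + int(ch)
        if mode == "stop" and candidate > 0o377:
            break                    # digit would overflow a byte: leave it literal
        value = candidate
        used += 1

    # "mask" may have accumulated more than 8 bits; keep only the low byte.
    # Any text not consumed by the escape is passed through unchanged.
    return bytes([value & 0xFF]) + text[used:].encode("ascii")


print(decode_octal_escape("610", "mask"))   # b'\x88'  (0610 & 0377 == 0210)
print(decode_octal_escape("610", "stop"))   # b'10'    (061 is '1', then literal '0')
```

For escapes that fit in a byte, such as 101 or 377, the two modes agree; they differ only when the first of three digits is greater than 3, which is exactly the case the standard leaves unaddressed.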

-- 
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org