Package: sed
Version: 4.1.5-1
Severity: normal

First of all, a note: I see no mention of UTF-8 or i18n/l10n readiness
for this version of sed in the man or info pages. Nevertheless, it looks
like it is meant to be ready.

Commands like:

  sed 's/c/\U&/g'
  sed 's/c/\U&\E/g'
  sed 's/c/\u&/g'
  sed 's/c/\u&\E/g'

occasionally produce broken output when 'c' is a character (or sequence
of characters) from a non-ASCII national alphabet (Russian in my case).

Examples:

# First, I make the simplest possible test file.
# The literal is the Russian analog of English "da d".
# $ echo -n "da d" > sed.in   # english
$ echo -n "да д" > sed.in

# There are seven octets in the dump: a pair for each Russian
# UTF-8 char, and the fifth octet is the ASCII space.
$ hexdump -C sed.in
00000000  d0 b4 d0 b0 20 d0 b4  |.... ..|

# Now sed is run with a command whose English analog is:
#   sed 's/d/\u&/g' < sed.in > sed.out
# $ sed 's/d/\u&/g' < sed.in > sed.out ; echo $?   # english
$ sed 's/д/\u&/g' < sed.in > sed.out ; echo $?
0

# Now in the dump we can see: the first lowercase letter (it matches)
# has been correctly converted to uppercase; the second letter
# doesn't match and stays as is; then the space; then something
# crazy (7 octets) in the place where the third letter (4th
# char in the literal) was...
$ hexdump -C sed.out
00000000  d0 94 d0 b0 20 fc 88 81 99 b5 82 b4  |.... .......|

# And a slightly different result for \U:
# $ sed 's/d/\U&/g' < sed.in > sed.out   # english
$ sed 's/д/\U&/g' < sed.in > sed.out

# Now the last character is uppercased as it should be,
# but something strange (6 octets) is inserted before it.
$ hexdump -C sed.out
00000000  d0 94 d0 b0 20 fc 88 81 99 b5 82 d0 94  |.... ........|

It seems that sed s///g 'dislikes' input that mixes one-octet and
two-octet characters, such as Russian characters with spaces or
newlines, or Russian mixed with ASCII characters. A long line of
Russian characters without spaces seems to be upper/lowercased
correctly, as expected.
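
The steps above could be condensed into a small script for retesting. This is only a sketch: it assumes GNU sed (for \u and \U) and that it is run under a UTF-8 locale such as the ru_RU.UTF-8 used here. The helper names make_input, hex_of and run_case are made up for illustration, and od is used instead of hexdump only to get plain hex octets. The expected octets are the UTF-8 encoding of "Да Д", which is what both commands should produce for this input.

```shell
#!/bin/sh
# Octal escapes keep this script pure ASCII:
#   \320\264 = U+0434 (д)   \320\260 = U+0430 (а)

# Write the 7-octet input "да д" (no trailing newline) to file $1.
make_input() {
    printf '\320\264\320\260 \320\264' > "$1"
}

# Print the octets of file $1 as space-separated hex.
hex_of() {
    # Unquoted on purpose: word splitting collapses od's padding.
    set -- $(od -An -v -tx1 "$1")
    printf '%s\n' "$*"
}

# Apply s/д/<escape>&/g to the test input, where $1 is \u or \U,
# and print the octets of the result. Assumes GNU sed.
run_case() {
    dir=$(mktemp -d) || return 1
    d=$(printf '\320\264')
    make_input "$dir/sed.in"
    sed "s/$d/$1&/g" < "$dir/sed.in" > "$dir/sed.out"
    hex_of "$dir/sed.out"
    rm -rf "$dir"
}

# "Да Д": both matches uppercased, the rest untouched.
want='d0 94 d0 b0 20 d0 94'

for esc in '\u' '\U'; do
    got=$(run_case "$esc")
    if [ "$got" = "$want" ]; then
        printf '%s: ok\n' "$esc"
    else
        printf '%s: unexpected output: %s\n' "$esc" "$got"
    fi
done
```

On the affected system, both cases would be expected to report unexpected output containing the stray octets fc 88 81 99 b5 82 shown in the dumps above.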

-- System Information:
Debian Release: 4.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18.test.001
Locale: LANG=ru_RU.UTF-8, LC_CTYPE=ru_RU.UTF-8 (charmap=UTF-8)

Versions of packages sed depends on:
ii  libc6  2.3.6.ds1-13etch7  GNU C Library: Shared libraries

sed recommends no packages.

-- no debconf information