Configuration Information [Automatically generated, do not change]:
Machine: i386
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='i386' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i386-pc-linux-gnu' -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -I. -I../bash -I../bash/include -I../bash/lib -g -O2
uname output: Linux tazzelwurm 2.6.11hcz1 #2 Fri Mar 11 20:01:21 CET 2005 i686 GNU/Linux
Machine Type: i386-pc-linux-gnu

Bash Version: 3.0
Patch Level: 16
Release Status: release

Description:
	If a string contains an invalid UTF-8 sequence, ${#var} reports
	its size as the number of characters from the start of the string
	up to, but not including, the first invalid byte. This makes it
	possible to construct a string that "test -n" and "test -z" treat
	as non-empty, but whose size ${#var} reports as zero.

Repeat-By:
	x=$'\xff'foobar
	LC_ALL=C echo ${#x}            # reports: 7
	LC_ALL=en_US.utf-8 echo ${#x}  # reports: 0
	[ -n "$x" ] && echo non-empty  # echoes: non-empty
	x=baz$'\xff'foobar
	LC_ALL=en_US.utf-8 echo ${#x}  # reports: 3

Fix:
	I understand that, strictly speaking, this is undefined behavior,
	but I'd suggest that the count should not stop when an invalid
	multibyte sequence is encountered. Instead, the sequence could be
	counted by its number of bytes (or as 1), since the string is
	definitely non-empty.

Thanks,
Heike

_______________________________________________
Bug-bash mailing list
Bug-bash@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-bash
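
Until the counting behavior changes, a script that only needs a length that is never zero for a non-empty string could count bytes instead of characters. The following is a minimal sketch of that workaround, not a fix for the reported behavior. The helper name byte_len is hypothetical, and the sketch assumes that bash applies a new locale as soon as LC_ALL is assigned inside a function and restores the previous one when the function returns.

```shell
#!/usr/bin/env bash

# byte_len (hypothetical helper): print the length of its first argument
# in bytes, independent of the caller's locale.
# Assumption: assigning LC_ALL as a local variable switches the shell to
# the C locale for the rest of the function body, so ${#1} counts bytes.
byte_len() {
    local LC_ALL=C
    printf '%s\n' "${#1}"
}

x=$'\xff'foobar
byte_len "$x"
```

For the string from the report this should print 7, the same figure the LC_ALL=C case gives, and the result does not depend on whether a UTF-8 locale is active in the calling shell.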