Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-pc-linux-gnu' -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -I. -I../. -I.././include -I.././lib -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wall
uname output: Linux host 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt2-1 (2014-12-08) x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 4.3
Patch Level: 30
Release Status: release

(Debian unstable amd64)

$ LC_ALL=tr_TR.UTF-8 bash -c 'typeset -l a; a=İ; echo $a' | hd
00000000  69 b0 0a                                          |i..|
00000003
$ a=İ LC_ALL=tr_TR.UTF-8 bash -c 'echo ${a,,}' | hd
00000000  69 b0 0a                                          |i..|
00000003

In Turkish locales, on a GNU system at least, uppercase i is İ, not I,
and lowercase I is ı, not i.

İ was properly translated to i, but there's a spurious 0xb0 byte, which
probably comes from the original İ:

$ echo İ | hd
00000000  c4 b0 0a                                          |...|
00000003

The reverse problem (the conversion is not done at all):

$ a=i LC_ALL=tr_TR.UTF-8 bash -c 'echo ${a^^}'
i
$ a=I LC_ALL=tr_TR.UTF-8 bash -c 'echo ${a,,}'
I
$ LC_ALL=tr_TR.UTF-8 bash -c 'typeset -u a; a=ia;echo $a' | hd
00000000  69 41 0a                                          |iA.|
00000003

That affects other characters whose lowercase and uppercase counterparts
don't have the same number of bytes in their UTF-8 encoding. Here, in an
en_US.UTF-8 locale:

$ a=$'\u027D' bash -c 'echo $a ${a^^}' | hd
00000000  c9 bd 20 e2 bd a4 03 0a                           |.. .....|
00000008
$ a=$'\u027D' zsh -c 'echo $a ${(U)a}' | hd
00000000  c9 bd 20 e2 b1 a4 0a                              |.. ....|
00000007

(This time the translated character is *larger*. bash's output differs
from zsh's in its second byte, 0xbd instead of 0xb1, and there's still
a spurious byte, 0x03, which this time is not coming from the original
character; possibly from the stack.)

-- 
Stephane
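
P.S. To make the byte-length mismatch concrete, here is a minimal sketch that counts the bytes in the UTF-8 encoding of each character involved. The byte values are the ones shown in the hd dumps in this report; the variable names are made up for illustration, and it assumes nothing beyond POSIX printf and wc. It does not exercise bash's case conversion itself, it only shows that each pair differs in encoded length.

```shell
# Count the bytes of each UTF-8 sequence with wc -c, which works on bytes
# regardless of locale. Octal escapes keep printf portable across shells.
# Variable names are illustrative only.
lower_r=$(( $(printf '\311\275' | wc -c) ))      # U+027D, bytes c9 bd
upper_r=$(( $(printf '\342\261\244' | wc -c) ))  # U+2C64, bytes e2 b1 a4
upper_i=$(( $(printf '\304\260' | wc -c) ))      # U+0130 (İ), bytes c4 b0
lower_i=$(( $(printf 'i' | wc -c) ))             # U+0069 (i), byte 69

echo "U+027D: $lower_r bytes, U+2C64: $upper_r bytes"
echo "U+0130: $upper_i bytes, U+0069: $lower_i bytes"
```

In both pairs the converted character needs a different number of bytes than the original, and those are the cases where the stray bytes show up.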