(reproduced with bash 4.3 or 4.4 on Debian unstable and Ubuntu 16.04).

perl -le "printf q([[ $'\U%X' = $'\U%X' ]] || echo %06X: $'\U%X').\"\n\", \$_,\$_,\$_,\$_ for (1..0xd7FF, 0xE000..0x10FFFF)" |
  LC_ALL=zh_HK.big5hkscs bash |
  LC_ALL=C sed -n l

Where the perl command outputs:

[[ $'\U1' = $'\U1' ]] || echo 000001: $'\U1'
[[ $'\U2' = $'\U2' ]] || echo 000002: $'\U2'
[[ $'\U3' = $'\U3' ]] || echo 000003: $'\U3'
[[ $'\U4' = $'\U4' ]] || echo 000004: $'\U4'
....

for all valid (albeit not necessarily assigned, let alone available
in any charset) Unicode codepoints.

Gives:

0000CA: $
0000CB: \\u00CB$
0000EA: $
0000EB: \\u00EB$
00011A: \210\\$
0003B1: \243\\$
000436: \310\\$
003075: \307\\$
003618: \234\\$
003661: \215\\$
0044C0: \226\\$
004A35: \232\\$
004AA4: \207\\$
004E48: \244\\$
004F62: \312\\$
004FDE: \253\\$
005045: \324\\$
00509C: \330\\$
00515D: \242\\$
00529F: \245\\$
005412: \246\\$
00542D: \247\\$
0056ED: \373\\$
00577C: \251\\$
0057A5: \316\\$
00587F: \341\\$
0058A6: \274\\$
0058F0: \211\\$
005A09: \256\\$
005A16: \321\\$
005A2B: \230\\$
005AF9: \345\\$
005B1E: \351\\$
005B40: \304\\$
005C10: \311\\$
005CA4: \314\\$
005D24: \261\\$
005E4B: \335\\$
005EC4: \264\\$
0060DD: \325\\$
006127: \267\\$
0063CA: \331\\$
0064FA: \302\\$
00669D: \272\\$
0067AF: \254\\$
0067E6: \317\\$
0069D9: \342\\$
006A9D: \375\\$
006B7F: \252\\$
006C7B: \313\\$
006C94: \250\\$
006D82: \322\\$
006DDA: \262\\$
006EDC: \336\\$
006F7F: \346\\$
007019: \362\\$
007035: \364\\$
00712E: \332\\$
0071E1: \355\\$
00727E: \326\\$
0072D6: \315\\$
007366: \352\\$
0073E2: \227\\$
0073EE: \257\\$
007435: \265\\$
00749E: \277\\$
0075B1: \236\\$
007667: \240\\$
007912: \360\\$
007A1E: \270\\$
007A40: \275\\$
007B0B: \216\\$
007BA4: \343\\$
007CED: \231\\$
007D85: \337\\$
007E37: \301\\$
007F61: \323\\$
0080D0: \320\\$
0080EC: \213\\$
00812A: \223\\$
0082D2: \255\\$
00833B: \333\\$
00838D: \327\\$
0084CB: \273\\$
00850C: \347\\$
00855A: \217\\$
00878F: \353\\$
0087B0: \356\\$
008A31: \263\\$
008C79: \260\\$
008D15: \367\\$
008D68: \340\\$
008DDA: \266\\$
008E0A: \344\\$
008E7E: \212\\$
008EA1: \306\\$
009103: \334\\$
009140: \363\\$
009145: \366\\$
009186: \350\\$
00923E: \271\\$
0093AA: \361\\$
0095B1: \276\\$
0097B8: \233\\$
009910: \300\\$
009924: \354\\$
0099F9: \357\\$
009A31: \365\\$
009ACF: \305\\$
009AE2: \221\\$
009AFF: \237\\$
009C4B: \370\\$
009C6D: \371\\$
009EE0: \303\\$
00FE4F: \241\\$
0205EB: \224\\$
020C3A: \376\\$
023600: \372\\$
0265AD: \225\\$
026C21: \222\\$
0270F8: \374\\$
02870F: \214\\$
02913C: \235\\$
02A014: \220\\$

$ LC_ALL=zh_HK.big5hkscs locale charmap
BIG5-HKSCS

Most of the problematic characters are the ones whose encoding ends
in byte 0x5c, which happens to be backslash in ASCII (or in
BIG5-HKSCS when standing alone).

$ LC_ALL=zh_HK.big5hkscs bash -xc "[[ $'\u3b1' = $'\u3b1' ]]" 2>&1 | sed -n l
+ [[ \243\\ = \\\243\\ ]]$

Note that

bash -xc $'[[ \u3b1 = \u3b1 ]]'

also returns false in those locales.

There are similar problems for locales using the BIG5, GB18030 or
GBK charsets.

Same with "case" or

a=$'\u3b1'; [[ $a = $a ]]

or [[ "$a" = "$a" ]] or ${a#"$a"}

[ "$a" = "$a" ] is fine.

The CA and EA ones do look a lot like a bug in glibc's locale
definition or gconv module (and the CB, EB ones are a consequence
of it).

$ LC_ALL=zh_HK.big5hkscs bash -xc "[[ $'\uca' = $'\uca' ]]" 2>&1 | sed -n l
+ [[ '' = \\\210f ]]$

A $'\uanything' following a $'\uca' always yields 0x88 0x66 (which
happens to be the BIG5-HKSCS encoding of U+00CA) in bash, zsh and
ksh93 (though only for anything >= 0x80 in bash).

Those locales are problematic and should be avoided in general. The
problem is that they are often *available*, so all those corner
cases caused by the fact that some multi-byte characters contain
bytes that are also ASCII characters can be exploited (think of sudo
or many sshd deployments letting LC_* variables through, for
instance).

-- 
Stephane
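
As an aside, a minimal sketch of checking the trailing byte of an
encoded character by hand, without needing any of those locales to
be installed. The byte values are copied from the sed -n l output
above (\243\\ for U+03B1, \210f for U+00CA); the helper name
last_byte_hex is made up for illustration.

```shell
#!/bin/sh
# last_byte_hex (hypothetical helper): print the last byte read from
# stdin as two lowercase hex digits.
last_byte_hex() {
    od -An -v -tx1 | tr -s ' ' '\n' | sed '/^$/d' | tail -n 1
}

# U+03B1 in BIG5-HKSCS: bytes 0xA3 0x5C (octal \243 \134),
# as seen in the sed -n l output above.
alpha_last=$(printf '\243\134' | last_byte_hex)

# U+00CA in BIG5-HKSCS: bytes 0x88 0x66 (octal \210 \146),
# as seen in the xtrace output above.
ca_last=$(printf '\210\146' | last_byte_hex)

printf 'U+03B1 last byte: 0x%s\n' "$alpha_last"
printf 'U+00CA last byte: 0x%s\n' "$ca_last"
```

The first one ends in 0x5c (ASCII backslash), which matches the bulk
of the list above; the second does not, consistent with the CA/EA
cases being a separate issue.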