Improper UTF-8 combining character handling

Sean Burke Sun, 10 Jun 2007 11:39:14 -0700

Configuration Information [Automatically generated, do not change]:
Machine: i686
OS: linux-gnu
Compiler: i686-pc-linux-gnu-gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='i686' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i686-pc-linux-gnu' -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H   -I.  -I. -I./include -I./lib   -O2 -march=prescott -fomit-frame-pointer -pipe
uname output: Linux morrigan 2.6.20-gentoo-r8-mactel #4 SMP PREEMPT Sat May 12 10:35:03 MDT 2007 i686 Genuine Intel(R) CPU            1400  @ 1.83GHz GenuineIntel GNU/Linux
Machine Type: i686-pc-linux-gnu


Bash Version: 3.2
Patch Level: 15
Release Status: release

Description:
        When a UTF-8 combining character sequence is entered, there is a
        disparity between what is treated as a character for display and
        what is treated as a character for editing. For editing, the
        entire sequence is treated as a single character, but for display
        each glyph in the sequence is treated separately. As a result,
        some glyphs are not removed when characters are deleted, or the
        cursor is drawn in the wrong place on screen.
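To make the disparity concrete, here is a minimal Python sketch (not bash or readline code) that takes the decomposed form of LATIN CAPITAL LETTER D WITH DOT ABOVE, U+0044 followed by U+0307 COMBINING DOT ABOVE, and counts it three ways: UTF-8 bytes, code points, and displayed glyph clusters. The hypothetical helper `display_clusters` uses an assumed simplification, attaching any code point with a nonzero canonical combining class to the preceding base character; full grapheme segmentation (Unicode UAX #29) has more rules.

```python
import unicodedata


def display_clusters(text):
    """Group code points so each combining mark stays with its base.

    Simplification (assumption): a code point with a nonzero canonical
    combining class joins the previous cluster; anything else starts a
    new one. This is not full UAX #29 grapheme segmentation.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# LATIN CAPITAL LETTER D WITH DOT ABOVE, decomposed (NFD form):
# U+0044 LATIN CAPITAL LETTER D + U+0307 COMBINING DOT ABOVE
seq = "\u0044\u0307"

print(len(seq.encode("utf-8")))    # 3 bytes: 44 CC 87
print(len(seq))                    # 2 code points
print(len(display_clusters(seq)))  # 1 glyph cluster on screen
```

An editor that counts in one of these units for deletion and in another for cursor placement will drift exactly as described.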

Repeat-By:
        The Unicode normalization test data at
        http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt
        contains many sequences of this sort. The first character
        sequence, LATIN CAPITAL LETTER D WITH DOT ABOVE, does produce
        this problem. Paste it into the command line, then backspace
        through it. The problem should reproduce immediately.
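Each data line of NormalizationTest.txt lists a test case as five semicolon-separated columns of hexadecimal code points (source, NFC, NFD, NFKC, NFKD) followed by a comment. The `LINE` constant below is assumed to match the file's first data line and should be checked against the file itself. This Python sketch decodes it, cross-checks the columns with Python's own normalizer, and prints a `printf` command whose octal escapes produce the decomposed (NFD) bytes, which is handy when a terminal or clipboard might silently recompose the character. `parse_columns` and `printf_escape` are hypothetical helpers written for this illustration.

```python
import unicodedata

# Assumed copy of the first data line of NormalizationTest.txt.
# Columns: c1 (source); c2 (NFC); c3 (NFD); c4 (NFKC); c5 (NFKD); comment.
LINE = "1E0A;1E0A;0044 0307;1E0A;0044 0307; # LATIN CAPITAL LETTER D WITH DOT ABOVE"


def parse_columns(line):
    """Return the five code point columns as Python strings."""
    data = line.split("#", 1)[0]
    fields = data.split(";")[:5]
    return ["".join(chr(int(cp, 16)) for cp in field.split()) for field in fields]


def printf_escape(text):
    """Render text as a printf(1) format string using octal byte escapes."""
    out = []
    for b in text.encode("utf-8"):
        c = chr(b)
        if 0x20 <= b < 0x7F and c not in "\\'%":
            out.append(c)            # printable ASCII passes through
        else:
            out.append("\\%03o" % b)  # everything else becomes \NNN
    return "".join(out)


source, nfc, nfd, nfkc, nfkd = parse_columns(LINE)

# Cross-check the parsed columns against Python's normalizer.
assert unicodedata.normalize("NFD", source) == nfd
assert unicodedata.normalize("NFC", nfd) == nfc

print("printf '%s'" % printf_escape(nfd))
```

The script prints `printf 'D\314\207'`. Running that command in a UTF-8 terminal displays D followed by U+0307, which can then be copied onto the bash command line and backspaced through.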

Fix:
        Glyphs and character sequences should be treated consistently.
        With combining character sequences, it would most likely be
        preferable to treat each character in the sequence separately,
        to allow more precise editing, though there may be other issues
        I'm unaware of.
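The consistency asked for here amounts to deriving both the deletion unit and the cursor column from the same sequence of characters. Below is a minimal Python sketch of the per-code-point approach the report prefers. `LineBuffer` is a hypothetical class, not readline's data structure, and `char_width` is an assumed rough stand-in for wcwidth(3): combining marks take zero columns, East Asian wide or fullwidth characters take two, everything else takes one.

```python
import unicodedata


def char_width(ch):
    """Rough wcwidth(3) stand-in (assumption): marks 0, wide 2, else 1."""
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


class LineBuffer:
    """Hypothetical line-editor state; not readline's actual structures."""

    def __init__(self, text=""):
        self.chars = list(text)       # one entry per code point
        self.point = len(self.chars)  # cursor index into self.chars

    def backspace(self):
        # Editing unit: a single code point.
        if self.point > 0:
            del self.chars[self.point - 1]
            self.point -= 1

    def cursor_column(self):
        # Display position: computed from the same code points,
        # so deletion and cursor placement cannot disagree.
        return sum(char_width(ch) for ch in self.chars[: self.point])


buf = LineBuffer("D\u0307")
print(buf.cursor_column())             # 1
buf.backspace()                        # removes U+0307, leaving "D"
print(buf.chars, buf.cursor_column())  # ['D'] 1
buf.backspace()
print(buf.chars, buf.cursor_column())  # [] 0
```

With this choice, the first backspace strips only the dot, so redisplay must redraw the bare base letter in place. Grouping by cluster instead would also be consistent, as long as `backspace` and `cursor_column` share one segmentation.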


_______________________________________________
Bug-bash mailing list
bug-bash@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-bash
