I recently encountered a bug related to UTF-8 in ksh(1).
While inserting the following sequence, part of my prompt gets mangled:
a<backward-char>ö
With PS1='ksh$ ' I expect the following output:
ksh$ öa
... actual output:
kshöaa
Examining the output buffer when the first byte (0xc3) of 'ö' is
inserted shows the following, piped through hexdump:
00000000 c3 61 08 |.a.|
00000003
0xc3 is the first byte of the 'ö' character and the trailing backspace
(0x08) causes the cursor to move past the incomplete UTF-8 sequence. The
backspace is emitted by the following lines in function x_ins:
$ sed -n 460,464p /usr/src/bin/ksh/emacs.c
	if (adj == x_adj_done) {
		/* no */
		for (cp = xlp; cp > xcp; )
			x_bs(*--cp);
	}
A solution would be to only emit a backspace if cp[-1] is a UTF-8
continuation byte and cp[-2] is a UTF-8 start byte. This removes one of
the erroneous backspaces that eat the prompt.
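A rough sketch of that check, written as standalone C rather than as a
patch against emacs.c. The helper names are made up for illustration,
is_u8cont() is only assumed to match the isu8cont() test used in
emacs.c, and nothing longer than a two-byte sequence is handled:

```c
#include <stddef.h>

/* 10xxxxxx: continuation byte (assumed equivalent to isu8cont()). */
static int
is_u8cont(unsigned char c)
{
	return (c & 0xc0) == 0x80;
}

/* 11xxxxxx: first byte of a multibyte sequence. */
static int
is_u8start(unsigned char c)
{
	return (c & 0xc0) == 0xc0;
}

/*
 * Hypothetical helper: return 1 if the bytes ending just before cp
 * form a complete character, i.e. the cursor sits on a character
 * boundary and backing up over the rest of the line is safe.
 * Covers ASCII and two-byte sequences only.
 */
static int
u8_insert_complete(const unsigned char *buf, const unsigned char *cp)
{
	if (cp == buf)
		return 1;	/* nothing before the cursor */
	if (cp[-1] < 0x80)
		return 1;	/* plain ASCII */
	if (cp - buf >= 2 && is_u8cont(cp[-1]) && is_u8start(cp[-2]))
		return 1;	/* start byte + continuation byte */
	return 0;		/* sequence still incomplete */
}
```

In x_ins the backspace loop would then presumably run only when
something like u8_insert_complete(xbuf, xcp) returns 1. With the buffer
from the hexdump above (0xc3 followed by 'a') it returns 0, so no
backspace would be emitted until 0xb6 arrives.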
Examining the output buffer when the last byte (0xb6) of 'ö' is
inserted:
00000000 08 c3 b6 61 08 |...a.|
The leading erroneous backspace is caused by the following lines in
function x_zots, introduced in r1.64:
$ sed -n 687,691p bin/ksh/emacs.c
	if (str > xbuf && isu8cont(*str)) {
		while (str > xbuf && isu8cont(*str))
			str--;
		x_e_putc('\b');
	}
I haven't found a viable way to avoid emitting the backspace when a
character is prepended, as opposed to appended.
Any ideas on how to solve this issue would be much appreciated.