reverse-i-search, multibyte backspace problem

2015-07-18 Thread e est
Hello,

I've noticed a bug when using bash interactively in a terminal.

Steps to reproduce:
1. Press Ctrl-R to enter reverse-i-search mode.
2. Type a character outside the ASCII character set, such as the French é or
the German ä.
3. Press backspace.

What to expect:
The character is removed.

What happens:
Instead of the whole character being removed, a stray character (like � or Ã)
appears.

The most likely theory:
Instead of adhering to the UTF-8 multibyte encoding and removing the whole
byte sequence that encodes the code point (or perhaps the whole sequence
representing the "abstract character"? [1]), bash removes only the last byte.
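
To illustrate the theory above, here is a minimal C sketch. It is not bash's
or readline's actual code: both function names are made up for this example,
and it assumes the search string is well-formed UTF-8. The first function
drops a single byte, the second steps back over UTF-8 continuation bytes
(those of the form 10xxxxxx) before cutting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: drop exactly one byte from the end of s.
   This is the behaviour the theory above suspects. */
size_t chop_last_byte(char *s, size_t len)
{
    assert(s != NULL);
    if (len == 0)
        return 0;
    s[--len] = '\0';
    return len;
}

/* Hypothetical helper: drop the last encoded code point from s.
   Walks back over continuation bytes (10xxxxxx) to the lead byte
   and terminates the string there. Assumes well-formed UTF-8. */
size_t chop_last_utf8_char(char *s, size_t len)
{
    assert(s != NULL);
    if (len == 0)
        return 0;
    size_t i = len - 1;
    while (i > 0 && ((unsigned char)s[i] & 0xC0) == 0x80)
        i--;
    s[i] = '\0';
    return i;
}
```

For the bytes 63 61 66 C3 A9 ("café"), the first function leaves a dangling C3
lead byte behind, which would explain a terminal showing � or Ã. The second
function leaves "caf".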

Note that the bug seems to depend on the terminal. I originally discovered it
in konsole, and other users on the freenode #bash channel have confirmed it in
xterm, st and rxvt, although one user couldn't reproduce it with st.

Affected versions:
I've tested 4.3.30(1)-release (my distro's package) and 4.3.39(2)-release,
the latter built straight from the master branch of the development git
repository with ./configure && make -j 4.
The operating system I use is Kubuntu, but the bug has also been confirmed on
Gentoo and Arch Linux.

Thanks for answers.

Greetings
Est31.

[1]: Quoting the Unicode Standard, version 7, Section 3.4, Characters and
Encoding:
"A single abstract character may also be represented by a sequence of code 
points—for example, "latin capital letter g with acute" may be represented by 
the sequence , rather than being mapped to a single code point."
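
As a small illustration of the distinction in [1], the C sketch below looks
at two UTF-8 encodings of "latin capital letter g with acute": the
precomposed code point U+01F4 (bytes C7 B4) and the letter G followed by
U+0301 combining acute accent (bytes 47 CC 81). The function names are made up
for this example, and the code assumes well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count code points in a UTF-8 string by counting
   every byte that is not a continuation byte (10xxxxxx). */
size_t utf8_code_points(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Hypothetical helper: byte offset at which the last code point of the
   first len bytes of s starts (0 for an empty string). */
size_t utf8_last_start(const char *s, size_t len)
{
    assert(s != NULL);
    if (len == 0)
        return 0;
    size_t i = len - 1;
    while (i > 0 && ((unsigned char)s[i] & 0xC0) == 0x80)
        i--;
    return i;
}
```

The precomposed form is one code point starting at offset 0, so deleting one
code point removes the whole letter. The decomposed form is two code points,
and the last one starts at offset 1, so deleting a single code point removes
only the accent and leaves a plain G. Removing the whole abstract character
would need extra logic on top of code point boundaries.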



Re: reverse-i-search, multibyte backspace problem

2015-07-18 Thread e est
Hello,

Thanks for pointing out the fix. I've tried the "devel" branch, and couldn't 
reproduce the bug there.

Sorry for the disturbance, I should have checked whether the master branch 
really represents the bleeding edge of development.

19.07.2015, 03:53, "Eduardo A. Bustamante López" :
> Hello,
>
> Can you please try the 'devel' branch?
>
> There's a fix for this issue already in it:
>
> | commit 947f04912e4715e7a9df526cd99412bffa729368
> | Author: Chet Ramey 
> | Date: Tue Jan 27 11:10:49 2015 -0500
> |
> | commit bash-20150116 snapshot
>
> Here's the description of the fix:
>
> | lib/readline/isearch.c
> | - _rl_isearch_dispatch: if we are in a multibyte locale, make sure to use
> | _rl_find_prev_mbchar when trying to delete characters from the search
> | string, instead of just chopping off the previous byte. Fixes bug
> | reported by Kyrylo Shpytsya 
>
> This was reported earlier this year:
>
>   http://lists.gnu.org/archive/html/bug-readline/2015-01/msg00017.html
>
> Or use this to patch:
>
> | dualbus@yaqui ...src/gnu/bash % git diff origin/master 947f04912e4715e7a9df526cd99412bffa729368 -- lib/readline/isearch.c
> | diff --git a/lib/readline/isearch.c b/lib/readline/isearch.c
> | index 6f6a7a6..d768560 100644
> | --- a/lib/readline/isearch.c
> | +++ b/lib/readline/isearch.c
> | @@ -553,8 +553,16 @@ add_character:
> | do until we have a real isearch-undo. */
> | if (cxt->search_string_index == 0)
> | rl_ding ();
> | - else
> | + else if (MB_CUR_MAX == 1 || rl_byte_oriented)
> | cxt->search_string[--cxt->search_string_index] = '\0';
> | + else
> | + {
> | + wstart = _rl_find_prev_mbchar (cxt->search_string, cxt->search_string_index, MB_FIND_NONZERO);
> | + if (wstart >= 0)
> | + cxt->search_string[cxt->search_string_index = wstart] = '\0';
> | + else
> | + rl_ding ();
> | + }
> | break;
> |
> | case -4: /* C-G, abort */
>
> Greetings!
>
> --
> Eduardo Bustamante
> https://dualbus.me/