Paul Eggert wrote:
> Come to think of it, isn't there a different problem with 
> mbs_startswith? As I recall, mbiter supports GB18030, which has the 
> unfortunate property that an indivisible sequence of encoding bytes 
> stands for two Unicode characters

No, that unfortunate property (which particularly affects the mbrtoc32
function and was the reason why we introduced the 'mbrtoc32-regular'
module) is a problem with the BIG5-HKSCS encoding, not with the
GB18030 encoding. (Recall that GB18030 is more-or-less a Unicode
transformation format like UTF-8, UTF-16, UTF-32.) See
https://www.gnu.org/software/gnulib/manual/html_node/mbrtoc32.html

> which means that mbiter needs to 
> parse the sequence and remember the second character while delivering 
> the first

mbiter and friends support both the presence and the absence of the
'mbrtoc32-regular' module. Look for the GNULIB_MBRTOC32_REGULAR tests
in the code...
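
For illustration, here is a minimal sketch of what a caller of plain ISO C
mbrtoc32 has to do when the function is not "regular": a return value of
(size_t)-3 means that a further character from the previously consumed byte
sequence is being delivered and that no input bytes were consumed by this
call. The function name count_c32 is hypothetical; this is not code taken
from mbiter. As I understand it, with 'mbrtoc32-regular' in use the
(size_t)-3 branch would never be taken.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper: count the char32_t values produced by decoding
   the NUL-terminated multibyte string S in the current locale.
   Returns (size_t)-1 on an invalid or incomplete sequence.  */
static size_t
count_c32 (const char *s)
{
  mbstate_t state;
  size_t remaining = strlen (s) + 1;   /* include the terminating NUL */
  size_t count = 0;

  memset (&state, 0, sizeof state);
  for (;;)
    {
      char32_t c;
      size_t ret = mbrtoc32 (&c, s, remaining, &state);

      if (ret == (size_t) -1 || ret == (size_t) -2)
        return (size_t) -1;             /* invalid or incomplete input */
      if (ret == (size_t) -3)
        {
          /* A further character from the previous byte sequence;
             no input bytes were consumed by this call.  */
          count++;
          continue;
        }
      if (ret == 0)
        break;                          /* decoded the terminating NUL */
      count++;
      s += ret;
      remaining -= ret;
    }
  return count;
}
```

A caller that forgets the (size_t)-3 case would advance the pointer by a
huge value, which is why a regular mbrtoc32 is so much easier to use.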

Bruno



