Paul Eggert wrote:
> Come to think of it, isn't there a different problem with
> mbs_startswith? As I recall, mbiter supports GB18030, which has the
> unfortunate property that an indivisible sequence of encoding bytes
> stands for two Unicode characters

No, that unfortunate property (which particularly affects the mbrtoc32
function and was the reason why we introduced the 'mbrtoc32-regular'
module) is a problem with the BIG5-HKSCS encoding, not with the GB18030
encoding. (Recall that GB18030 is more or less a Unicode transformation
format, like UTF-8, UTF-16, and UTF-32.)

See https://www.gnu.org/software/gnulib/manual/html_node/mbrtoc32.html

> which means that mbiter needs to
> parse the sequence and remember the second character while delivering
> the first

mbiter and friends support both the presence and the absence of the
'mbrtoc32-regular' module. Look for the GNULIB_MBRTOC32_REGULAR tests
in the code...

Bruno
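
For illustration only (this is not gnulib's mbiter code): a minimal sketch,
assuming a C11 <uchar.h>, of a caller loop that copes with the case discussed
above, in which one indivisible byte sequence yields two char32_t values.
In C11, mbrtoc32 signals this by returning (size_t) -3 for the additional
character, without consuming any input. The helper name count_c32 is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper, not part of gnulib: count the char32_t values that
   mbrtoc32 delivers for the NUL-terminated string S in the current locale.
   Returns (size_t) -1 if S contains an invalid or incomplete sequence.  */
static size_t
count_c32 (const char *s)
{
  assert (s != NULL);

  mbstate_t state;
  memset (&state, 0, sizeof state);   /* start in the initial state */

  const char *p = s;
  size_t n = strlen (s) + 1;          /* include the terminating NUL */
  size_t count = 0;

  for (;;)
    {
      char32_t wc;
      size_t ret = mbrtoc32 (&wc, p, n, &state);

      if (ret == (size_t) -1 || ret == (size_t) -2)
        return (size_t) -1;           /* invalid or incomplete input */
      if (ret == 0)
        return count;                 /* reached the terminating NUL */

      count++;
      if (ret != (size_t) -3)
        {
          /* Normal case: RET bytes were consumed for this character.  */
          p += ret;
          n -= ret;
        }
      /* Otherwise WC is a further character produced from bytes that an
         earlier call already consumed, so P and N stay unchanged.  */
    }
}
```

In the default "C" locale, count_c32 ("abc") yields 3. Code that assumes
mbrtoc32 never returns (size_t) -3 can omit that branch; whether that
assumption holds depends on the platform and the locale's encoding.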