On Dec 19 15:41, Corinna Vinschen wrote:
> On Dec 19 08:51, Bruno Haible wrote:
> > Corinna Vinschen wrote in
> > <https://lists.gnu.org/archive/html/grep-devel/2018-12/msg00039.html>:
> > > it would be
> > > pretty nice if that code could get reverted back in to support
> > > non-BMP charsets even on Cygwin.
> >
> > I agree that support for beyond-BMP characters should be added back to
> > 'grep'.
> >
> > Your earlier fix from 2013-08-16 (and the fact that the test failure is
> > occurring exactly on Windows and AIX platforms) shows that the problem is
> > with wchar_t being only 16-bit wide on these platforms.
> >
> > The type 'char32_t' has been introduced in C11 to overcome this
> > limitation.[1]
> >
> > I propose to
> >
> >   1) introduce in gnulib support for <uchar.h>, char32_t, and mbrtoc32, so
> >      that we can use these instead of <wchar.h>, wchar_t, and mbrtowc
> >      portably,
> >
> >   2) change those gnulib modules that don't behave well with beyond-BMP
> >      characters on Windows and AIX to use char32_t instead of wchar_t.
> >
> > Then the 'grep' code can be changed in a similar way, and this will
> > fix the bug on Cygwin and AIX (though not on native Windows [2]).
> >
> > The advantage of this approach are minimal code changes in 'grep': just
> > change some type and function names here and there, and add code for
> > the additional (size_t)(-3) return value of mbrtoc32.
>
> IIUC this would also drop the requirement for #ifdef CYGWIN'ed code.

... in grep.

> Sounds like a great idea to me!
>
>
> Corinna
>
> >
> > Bruno
> >
> > [1]
> > https://stackoverflow.com/questions/21264035/why-did-c11-introduce-the-char16-t-and-char32-t-types
> > [2] https://lists.gnu.org/archive/html/bug-gnulib/2011-02/msg00175.html
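
For illustration, here is a minimal sketch of what a decoding loop might look like after such a conversion, including the extra (size_t)(-3) case mentioned above. It is not taken from grep or gnulib: the function name count_chars is hypothetical, and the sketch assumes a C11 <uchar.h> is available, which is exactly what the proposed gnulib module would have to guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper, for illustration only: count the characters in
   the first N bytes of the multibyte string S, decoding with mbrtoc32
   in the current locale.  Stops early at a null character.  Returns
   (size_t) -1 on an invalid or incomplete sequence.  */
static size_t
count_chars (const char *s, size_t n)
{
  assert (s != NULL);

  mbstate_t state;
  memset (&state, 0, sizeof state);   /* initial conversion state */

  size_t count = 0;
  while (n > 0)
    {
      char32_t c;
      size_t ret = mbrtoc32 (&c, s, n, &state);

      if (ret == (size_t) -1 || ret == (size_t) -2)
        return (size_t) -1;           /* invalid or incomplete input */

      if (ret == (size_t) -3)
        {
          /* The case that mbrtowc does not have: another char32_t was
             produced from input consumed by an earlier call, so no
             bytes are consumed here.  */
          count++;
          continue;
        }

      if (ret == 0)
        break;                        /* decoded the null character */

      s += ret;
      n -= ret;
      count++;
    }
  return count;
}
```

Compared with an mbrtowc loop, the differences are the type of c, the function name, and the (size_t)(-3) branch, which matches the "minimal code changes" point in the proposal.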