On Sat, Jul 1, 2023 at 7:35 AM Bruno Haible <br...@clisp.org> wrote:
>
> Here is a proposed patch to overcome the wchar_t limitation in the 'dfa'
> module.
>
> Jim: The background is explained in
> <https://www.gnu.org/software/gnulib/manual/html_node/Strings-and-Characters.html>
> The plan was exposed in
> <https://lists.gnu.org/archive/html/bug-gnulib/2018-12/msg00118.html>
> and <https://lists.gnu.org/archive/html/bug-gnulib/2023-06/msg00102.html>
>
> The 'grep' code needs a minimal change, accordingly. (Attached.)
>
> I have verified that this change does not cause test failures in 'grep'
> and in 'sed', on glibc systems, FreeBSD, and Solaris 11.4.
>
> Arnold: I have added '#if GAWK' conditionals, knowing that gawk's build system
> does not use gnulib-tool and you therefore pull from gnulib manually. This
> means the improvements will not land in gawk, since dfa in gawk will continue
> to use wchar_t.
>
> Objections?
>
>
> 2023-07-01  Bruno Haible  <br...@clisp.org>
>
> 	dfa: Overcome wchar_t limitations.
> 	* lib/localeinfo.h: Include <uchar.h>. Add special definitions for
> 	GAWK.
> 	(case_folded_counterparts): Change array element type to char32_t.
> 	* lib/localeinfo.c: Include <uchar.h>. Add special definitions for
> 	GAWK.
> 	(is_using_utf8, init_localeinfo): Use mbrtoc32 instead of mbrtowc.
> 	(lonesome_lower): Change element type to 'unsigned short'.
> 	(case_folded_counterparts): Change array element type to char32_t. Use
> 	c32toupper instead of towupper. Use c32tolower instead of towlower.
> 	* lib/dfa.c: Include <uchar.h>. Add special definitions for GAWK.
> 	(struct mb_char_classes): Change element type of 'chars' to char32_t.
> 	(mbs_to_wchar): Use mbrtoc32 instead of mbrtowc.
> 	(setbit_wc): Change type of first argument to char32_t. Use c32tob
> 	instead of wctob.
> 	(parse_bracket_exp): Update.
> 	(lex): Use c32isprint instead of iswprint. Use c32isspace instead of
> 	iswspace. Use c32rtomb instead of a %lc directive.
> 	(addtok_wc): Use c32rtomb instead of wcrtomb.
> 	(atom): Update.
> 	* modules/dfa (Depends-on): Remove wctype-h. Add uchar, mbrtoc32,
> 	c32rtomb, c32tob, c32tolower, c32toupper, c32isprint, c32isspace.
> 	(Link): Add $(LIBUNISTRING) $(LIBC32CONV).
> 	* modules/dfa-tests (Makefile.am): Link test-dfa-match-aux with
> 	$(LIBUNISTRING) $(LIBC32CONV).
> 	* NEWS: Mention the change.
Hi Bruno,

Thanks for porting those dfa improvements.
They all look fine to me.
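
For readers skimming the thread, here is a rough sketch of the kind of substitution the ChangeLog describes (mbrtoc32 and char32_t where mbrtowc and wchar_t were used). It is not taken from the patch: count_chars is a hypothetical helper invented for illustration, and the error handling (stepping over one byte on an invalid or incomplete sequence) is only an assumption about what a caller might want, not a description of what dfa.c does.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper, not part of dfa.c: count the characters in the
   N bytes starting at S, decoding with mbrtoc32 in the current locale.
   A char32_t can hold any Unicode code point, whereas wchar_t is only
   16 bits wide on some platforms.  */
static size_t
count_chars (const char *s, size_t n)
{
  mbstate_t state;
  size_t count = 0;

  memset (&state, 0, sizeof state);
  while (n > 0)
    {
      char32_t c;
      size_t len = mbrtoc32 (&c, s, n, &state);

      if (len >= (size_t) -3)
        {
          /* (size_t) -1 is an encoding error and (size_t) -2 an
             incomplete sequence; (size_t) -3 is not expected with
             char32_t.  Assumption: treat each case as one byte and
             restart from the initial conversion state.  */
          memset (&state, 0, sizeof state);
          len = 1;
        }
      else if (len == 0)
        len = 1;                /* a null character occupies one byte */

      s += len;
      n -= len;
      count++;
    }
  return count;
}
```

In the default "C" locale, count_chars ("abc", 3) counts three characters; in a UTF-8 locale selected with setlocale, a multibyte sequence would be counted once rather than once per byte.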