On Sat, Jul 1, 2023 at 7:35 AM Bruno Haible <br...@clisp.org> wrote:
>
> Here is a proposed patch to overcome the wchar_t limitation in the 'dfa'
> module.
>
> Jim: The background is explained in
> <https://www.gnu.org/software/gnulib/manual/html_node/Strings-and-Characters.html>
> The plan was exposed in
>     <https://lists.gnu.org/archive/html/bug-gnulib/2018-12/msg00118.html>
> and <https://lists.gnu.org/archive/html/bug-gnulib/2023-06/msg00102.html>
>
> The 'grep' code needs a minimal change, accordingly. (Attached.)
>
> I have verified that this change does not cause test failures in 'grep'
> and in 'sed', on glibc systems, FreeBSD, and Solaris 11.4.
>
> Arnold: I have added '#if GAWK' conditionals, knowing that gawk's build system
> does not use gnulib-tool and you therefore pull from gnulib manually. This
> means the improvements will not land in gawk, since dfa in gawk will continue
> to use wchar_t.
>
> Objections?
>
>
> 2023-07-01  Bruno Haible  <br...@clisp.org>
>
>         dfa: Overcome wchar_t limitations.
>         * lib/localeinfo.h: Include <uchar.h>. Add special definitions for
>         GAWK.
>         (case_folded_counterparts): Change array element type to char32_t.
>         * lib/localeinfo.c: Include <uchar.h>. Add special definitions for
>         GAWK.
>         (is_using_utf8, init_localeinfo): Use mbrtoc32 instead of mbrtowc.
>         (lonesome_lower): Change element type to 'unsigned short'.
>         (case_folded_counterparts): Change array element type to char32_t. Use
>         c32toupper instead of towupper. Use c32tolower instead of towlower.
>         * lib/dfa.c: Include <uchar.h>. Add special definitions for GAWK.
>         (struct mb_char_classes): Change element type of 'chars' to char32_t.
>         (mbs_to_wchar): Use mbrtoc32 instead of mbrtowc.
>         (setbit_wc): Change type of first argument to char32_t. Use c32tob
>         instead of wctob.
>         (parse_bracket_exp): Update.
>         (lex): Use c32isprint instead of iswprint. Use c32isspace instead of
>         iswspace. Use c32rtomb instead of a %lc directive.
>         (addtok_wc): Use c32rtomb instead of wcrtomb.
>         (atom): Update.
>         * modules/dfa (Depends-on): Remove wctype-h. Add uchar, mbrtoc32,
>         c32rtomb, c32tob, c32tolower, c32toupper, c32isprint, c32isspace.
>         (Link): Add $(LIBUNISTRING) $(LIBC32CONV).
>         * modules/dfa-tests (Makefile.am): Link test-dfa-match-aux with
>         $(LIBUNISTRING) $(LIBC32CONV).
>         * NEWS: Mention the change.

Hi Bruno,
Thanks for porting those dfa improvements.
They all look fine to me.
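
For readers following the ChangeLog, here is a minimal sketch of the kind of
mbrtowc -> mbrtoc32 substitution it describes. This is not code from the
patch: count_chars32 is a hypothetical helper, and it assumes a platform
whose <uchar.h> provides mbrtoc32 (or gnulib's replacement for it).

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper, not part of the dfa patch: count the characters
   in the N bytes at S by decoding them with mbrtoc32 in the current
   locale.  Return (size_t) -1 on an invalid or incomplete sequence.  */
static size_t
count_chars32 (const char *s, size_t n)
{
  mbstate_t state;
  size_t count = 0;

  /* Start in the initial conversion state.  */
  memset (&state, 0, sizeof state);

  while (n > 0)
    {
      char32_t c;
      size_t len = mbrtoc32 (&c, s, n, &state);

      /* (size_t) -1: invalid sequence; (size_t) -2: incomplete one.  */
      if (len == (size_t) -1 || len == (size_t) -2)
        return (size_t) -1;

      if (len == (size_t) -3)
        {
          /* A further char32_t was produced from bytes that were
             already consumed; no input is used up this time.  */
          count++;
          continue;
        }

      /* A decoded null character is reported as 0 but occupies
         one byte of input.  */
      if (len == 0)
        len = 1;

      s += len;
      n -= len;
      count++;
    }

  return count;
}
```

Because char32_t holds a full Unicode code point, a loop like this yields one
value per character even on platforms where wchar_t is only 16 bits wide,
which is the limitation the patch title refers to.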
