Re: [PATCH 3/4] dfa: simplify charclass by assuming C99

2019-12-17 Thread Bruno Haible
Paul Eggert wrote: > +typedef uint_fast64_t charclass_word; It's on my TODO list to assume 'long long' everywhere, i.e. get rid of the old HAVE_LONG_LONG stuff, since all compilers (from IRIX 6.5 to MSVC 14) support it. Just haven't gotten to it yet. Bruno

[PATCH 1/4] dfa: tune via xzalloc

2019-12-17 Thread Paul Eggert
* lib/dfa.c (dfaoptimize): Prefer xzalloc to xmalloc + memset. --- ChangeLog | 5 + lib/dfa.c | 10 +++--- 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/ChangeLog b/ChangeLog index 7925915f6..ce3f8b2b7 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,8 @@ +2019-12-17 Pa

[PATCH 3/4] dfa: simplify charclass by assuming C99

2019-12-17 Thread Paul Eggert
* lib/dfa.c (CHARCLASS_WORD_BITS): Now always 64. (charclass_word): Now always uint_fast64_t. (CHARCLASS_PAIR): Remove. (CHARCLASS_INIT): Take 4 arguments instead of 8. All uses changed. --- ChangeLog | 6 ++ lib/dfa.c | 35 +++ 2 files changed, 17 insertions(
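
As a rough illustration of the layout this entry describes: with 64-bit words, a set covering all 256 byte values needs 256 / 64 = 4 words, which would be consistent with CHARCLASS_INIT taking 4 arguments. The sketch below is not the lib/dfa.c code; the names byteset, byteset_word, byteset_set and byteset_has are hypothetical and only show the indexing arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the word type and the word width of 64
   come from the patch summary. */
enum { BYTESET_NCHAR = 256, BYTESET_WORD_BITS = 64,
       BYTESET_NWORDS = BYTESET_NCHAR / BYTESET_WORD_BITS };

typedef uint_fast64_t byteset_word;

struct byteset { byteset_word w[BYTESET_NWORDS]; };

/* Mark byte value B as a member of the set. */
void byteset_set(struct byteset *s, unsigned char b)
{
  s->w[b / BYTESET_WORD_BITS] |= (byteset_word) 1 << (b % BYTESET_WORD_BITS);
}

/* Return true if byte value B is a member of the set. */
bool byteset_has(const struct byteset *s, unsigned char b)
{
  return (s->w[b / BYTESET_WORD_BITS] >> (b % BYTESET_WORD_BITS)) & 1;
}
```

Each byte value maps to word b / 64 and bit b % 64, so the four words cover bytes 0..63, 64..127, 128..191 and 192..255 in that order.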

[PATCH 2/4] fts: tune via calloc

2019-12-17 Thread Paul Eggert
* lib/fts.c (fts_open): Prefer calloc to malloc + memset. --- ChangeLog | 3 +++ lib/fts.c | 4 ++-- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/ChangeLog b/ChangeLog index ce3f8b2b7..f22770294 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,8 @@ 2019-12-17 Paul Eggert +
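
The general pattern behind this change can be sketched as follows. This is not the fts.c code: struct record and both constructor functions are hypothetical stand-ins, used only to contrast malloc followed by memset with a single calloc call.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type standing in for whatever fts_open allocates. */
struct record { int options; void *cursor; size_t count; };

/* Before: allocate, then zero the memory in a second step. */
struct record *record_new_memset(void)
{
  struct record *r = malloc(sizeof *r);
  if (r == NULL)
    return NULL;
  memset(r, 0, sizeof *r);
  return r;
}

/* After: one call that both allocates and zero-fills. */
struct record *record_new_calloc(void)
{
  return calloc(1, sizeof (struct record));
}
```

Both functions return memory whose bytes are all zero. The calloc form is shorter and gives the allocator the chance to hand back pages that are already zeroed, which is presumably the tuning the subject line refers to.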

[PATCH 4/4] dfa: do not match invalid UTF-8

2019-12-17 Thread Paul Eggert
* lib/dfa.c (struct dfa): Grow utf8_anychar_classes member array from 5 to 9 tokens; this is needed due to the changes to add_utf8_anychar. (charclass_index): 2nd arg is now pointer-to-const. (add_utf8_anychar): Match only valid UTF-8 byte sequences instead of allowing overlong encodings or surroga
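
To make the distinction concrete, here is a minimal byte-level validator that accepts only well-formed UTF-8 as defined in RFC 3629, rejecting overlong encodings, surrogate code points and values above U+10FFFF. It is a sketch of the rule, not of add_utf8_anychar, which builds DFA tokens rather than checking bytes one sequence at a time; the function names utf8_seq_len and in_range are hypothetical.

```c
#include <stddef.h>

static int in_range(unsigned char b, unsigned char lo, unsigned char hi)
{
  return lo <= b && b <= hi;
}

/* Return the length (1..4) of the valid UTF-8 sequence starting at P,
   looking at no more than N bytes, or 0 if there is none. */
size_t utf8_seq_len(const unsigned char *p, size_t n)
{
  if (n == 0)
    return 0;
  unsigned char b0 = p[0];
  if (b0 <= 0x7F)
    return 1;
  if (in_range(b0, 0xC2, 0xDF))   /* C0 and C1 would be overlong */
    return n >= 2 && in_range(p[1], 0x80, 0xBF) ? 2 : 0;
  if (in_range(b0, 0xE0, 0xEF))
    {
      unsigned char lo = b0 == 0xE0 ? 0xA0 : 0x80;  /* E0 80..9F: overlong */
      unsigned char hi = b0 == 0xED ? 0x9F : 0xBF;  /* ED A0..BF: surrogate */
      return (n >= 3 && in_range(p[1], lo, hi)
              && in_range(p[2], 0x80, 0xBF)) ? 3 : 0;
    }
  if (in_range(b0, 0xF0, 0xF4))   /* F5..FF never appear in UTF-8 */
    {
      unsigned char lo = b0 == 0xF0 ? 0x90 : 0x80;  /* F0 80..8F: overlong */
      unsigned char hi = b0 == 0xF4 ? 0x8F : 0xBF;  /* F4 90..: > U+10FFFF */
      return (n >= 4 && in_range(p[1], lo, hi)
              && in_range(p[2], 0x80, 0xBF)
              && in_range(p[3], 0x80, 0xBF)) ? 4 : 0;
    }
  return 0;
}
```

The restricted second-byte ranges after E0, ED, F0 and F4 are what separate valid sequences from overlong or out-of-range ones; every other continuation byte is simply 80..BF.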

Re: hard-locale: make multithread-safe

2019-12-17 Thread Paul Eggert
Thanks, this change looks fine to me. I do have a qualm in that coreutils (and I assume others) interpret !hard_locale (LC_COLLATE) as meaning that the locale is unibyte and uses native byte comparison. As I recall on some platforms (macOS maybe?), the C locale uses UTF-8 so this interpretation is

Re: dfa.c badly broken when dropped into gawk

2019-12-17 Thread Paul Eggert
On 12/15/19 10:43 AM, Arnold Robbins wrote: > To reproduce: > > 1. Checkout the gawk repo > 2. Copy gnulib/lib/dfa.[ch] into gawk/support/. > 3. Apply the minimal patch below I looked into that, and the problem was not in Gnulib; it was that your minimal patch's dfasyntax didn't clear its destina

Re: localcharset: optimize code for native Windows.

2019-12-17 Thread Bruno Haible
Eli Zaretskii wrote: > But your optimization drops LC_ALL entirely, which may not be a good > idea, because LC_CTYPE might not be set. All locale categories are set. The initial value is "C" or "POSIX", for each locale category independently. > And what is the rationale for trying to optimize thi

Re: localcharset: optimize code for native Windows.

2019-12-17 Thread Eli Zaretskii
> From: Bruno Haible > Date: Tue, 17 Dec 2019 15:06:50 +0100 > > The localcharset code for native Windows first fetches > setlocale (LC_ALL, NULL), then notices "oh, this is not what we want", > then fetches setlocale (LC_CTYPE, NULL). > > Example: > > LC_CTYPE => "English_United States.1252"

localcharset: fix multithread-safety bug on Windows and OS/2

2019-12-17 Thread Bruno Haible
When using a static buffer as a result buffer for a (supposedly) multithread-safe function, we must ensure that when different threads produce the same string in the same buffer at the same time, the byte sequence really doesn't change. We don't know whether sprintf guarantees this. For example, s

build issues with gnulib

2019-12-17 Thread Bruno Haible
Hi Martin, In you wrote: > I'm running into some other build issues in the gltests subdirectory though, > in my clang/libc++/ucrt setup These issues fall in three categories: - issues with _GL_CXXALIAS_SYS, - issues with _GL_CXXALIASWARN, - issues with

Re: hard-locale: make multithread-safe

2019-12-17 Thread Tim Rühsen
Hi Bruno, hi gnulib developers, it's a joy to follow the posts on this list - you (all) surprise, impress and inspire me with your code but even more with your detailed explanations / documentations. Thank you so much for your ongoing work !!! [E.g. this post made me check my code for mbtowc/mbr

localcharset: optimize code for native Windows.

2019-12-17 Thread Bruno Haible
The localcharset code for native Windows first fetches setlocale (LC_ALL, NULL), then notices "oh, this is not what we want", then fetches setlocale (LC_CTYPE, NULL). Example: LC_CTYPE => "English_United States.1252" LC_NUMERIC => "French_France.1252" LC_TIME => "German_Germany.1252" LC_ALL
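
The idea of querying the one category that matters, rather than going through LC_ALL first, can be sketched with standard C alone. This is not the localcharset code; the function name ctype_locale_name is hypothetical.

```c
#include <locale.h>

/* Return the name of the current LC_CTYPE locale without changing it.
   Passing NULL as the second argument makes setlocale a pure query. */
const char *ctype_locale_name(void)
{
  return setlocale(LC_CTYPE, NULL);
}
```

When the categories are set to different locales, as in the example above, the LC_ALL query has to describe all of them at once, while the LC_CTYPE query returns just the one name that determines the character encoding.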

hard-locale: make multithread-safe

2019-12-17 Thread Bruno Haible
Hi Paul, Here is a proposed patch to make the hard_locale() function multithread-safe. This is needed because our mbrtowc() override relies on hard_locale, and mbrtowc obviously must be multi-thread safe (that's one of its main features, compared to mbtowc). The previous hard_locale code tries to

nl_langinfo: fix multithread-safety bugs

2019-12-17 Thread Bruno Haible
This series of patches makes gnulib's nl_langinfo replacement multithread-safe. The nl_langinfo of the various platforms is already multithread-safe (as shown by the new unit test, which I let run for 30 seconds on each platform) - nothing to fix on this side. 2019-12-17 Bruno Haible

mbsinit: fix compilation error in mingw-w64 7.0 with _UCRT defined

2019-12-17 Thread Bruno Haible
Newer mingw-w64 releases can be used in a mode that converges more closely with recent MSVC libraries. [1][2] In this mode, the gnulib mbsinit override gives a compilation error. [3] This patch fixes it. People who want to use this mingw mode and compile packages from source will need packages (bi

Re: Compile warning with mingw ("__stat64" redefined)

2019-12-17 Thread Bruno Haible
Hello Christian, > this is pretty minor, but I did notice this warning during the compile > of gnulib: > ../../../gnulib/import/glob.c:75: warning: "__stat64" redefined > # define __stat64(fname, buf) stat (fname, buf) > > In file included from /usr/share/mingw-w64/include/sys/stat.h:58, >

[PATCH 1/2] dfa: remove struct lexer_state.cur_mb_len

2019-12-17 Thread Paul Eggert
* lib/dfa.c (struct lexer_state): Remove cur_mb_len member, as it’s not needed and the code is simpler without it. All uses removed. --- ChangeLog | 7 +++ lib/dfa.c | 20 2 files changed, 15 insertions(+), 12 deletions(-) diff --git a/ChangeLog b/ChangeLog index 237bfdd

[PATCH 2/2] dfa: remove one dependency on MB_CUR_MAX

2019-12-17 Thread Paul Eggert
* lib/dfa.c (dfamust): No need to refer to MB_CUR_MAX here. --- ChangeLog | 3 +++ lib/dfa.c | 2 +- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/ChangeLog b/ChangeLog index acd67f9dc..561b0ba0a 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,8 @@ 2019-12-16 Paul Eggert +