Hi Paolo,

> Before proceeding, however, I'm curious whether using nl_langinfo
> (CODESET) is less precise than locale_charset on some platform. Bruno?
Here's my reply to Jim from yesterday. For some reason it was apparently
not distributed to the mailing list.

Hi Jim,

> @@ -893,7 +896,9 @@ init_dfa (re_dfa_t *dfa, size_t pat_len)
>    dfa->map_notascii = (_NL_CURRENT_WORD (LC_CTYPE, _NL_CTYPE_MAP_TO_NONASCII)
>                         != 0);
>  #else
> -  if (strcmp (locale_charset (), "UTF-8") == 0)
> +  codeset_name = nl_langinfo (CODESET);
> +  if (strcasecmp (codeset_name, "UTF-8") == 0
> +      || strcasecmp (codeset_name, "UTF8") == 0)
>      dfa->is_utf8 = 1;
>
>    /* We check exhaustively in the loop below if this charset is a

This patch is not wrong: it takes care of the fact that the result of
nl_langinfo (CODESET) can be in upper case or in lower case, depending on
the system, and that on HP-UX, "utf8" is returned (see lib/config.charset).

But I would nevertheless neither apply it nor recommend it, because the
nl_langinfo module may include a lot more stuff in the future:

  - It may include real localizations of the values, instead of returning
    English dummy values. I have already written the converter from glibc
    locale data to PO files that can be read by the nl_langinfo replacement.

  - It may include an emulation of the NL_LOCALE_NAME (category) macro that
    works since glibc 2.11.1. This emulation would rely on the 'localename'
    module.

When all you need is nl_langinfo (CODESET), the full-blown 'nl_langinfo'
module is too heavyweight. 'localcharset' is not a POSIX API, but it fits
better into the gnulib module structure.

Bruno
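
PS: As a quick illustration (a sketch of mine, not part of the proposed
patch), here is a small test program showing the difference in practice:
nl_langinfo (CODESET) returns whatever spelling the platform uses, whereas
gnulib's locale_charset () canonicalizes the name, so the caller needs no
strcasecmp aliases.

  #include <locale.h>
  #include <langinfo.h>
  #include <stdio.h>
  #include "localcharset.h"  /* gnulib: declares locale_charset ()  */

  int
  main (void)
  {
    setlocale (LC_ALL, "");
    /* Platform-dependent spelling: "UTF-8", "UTF8", "utf8", ...  */
    printf ("nl_langinfo (CODESET) = %s\n", nl_langinfo (CODESET));
    /* Canonicalized name: "UTF-8" in any UTF-8 locale.  */
    printf ("locale_charset ()     = %s\n", locale_charset ());
    return 0;
  }

In a UTF-8 locale on HP-UX, for instance, the first line may print "utf8"
while the second prints "UTF-8".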