> > |> 58c58
> > |> < ptx: Fran?ois Pinard

This is not user friendly: proper_name_utf8 should not return a result with
question marks. Instead it's better if it returns its first argument.
I'm fixing it through the appended patch. But it will not fix the coreutils
test failure.

> > |> ---
> > |>> ptx: François Pinard
> >
> > In my email, this is rendering as one vs. two characters. I suspect it
> > might be a locale issue - perhaps Jim is using a UTF-8 locale, and Michael
> > is using a Latin-1 encoding?

Michael must be using a locale in ASCII encoding; if it were a Latin1
encoding, the output would have contained a cedilla, not a question mark.

Jim Meyering wrote:
> The problem is probably that his system lacks the en_US.UTF-8 locale,
> which is used by that check-AUTHORS rule.
>
> Here's a change I'm considering. It's easy in the sense that it's merely
> using an existing m4 macro, gt_LOCALE_FR_UTF8,

Yes, this change will fix the test failure.

> but has the drawback of depending on a locale that is less likely to be
> installed than the English one.

I'm not sure whether en_US.UTF-8 is more often installed than fr_FR.UTF-8.
Certainly Solaris systems have had it for ages, but in general the effort
spent on i18n of French is greater than that spent on i18n of English.

> One twist was that on my system, the french translation of "F. Pinard"
> was identical to the original

Yes, the test is depending on the message catalog as well. If you use not
only
    LC_ALL=$(LOCALE_FR_UTF8)
but
    LC_ALL=$(LOCALE_FR_UTF8) LANGUAGE=zxx
it will eliminate this source of trouble. ('zxx' is the language code for
'not applicable'; it's highly unlikely to carry a message catalog ever.)

> +         echo 'your system lacks a french UTF8 locale' 1>&2; \

I would write UTF-8 here. That's the only standardized name of the encoding
that you mean.


2008-06-21  Bruno Haible  <[EMAIL PROTECTED]>

        * lib/propername.c (proper_name_utf8): Don't use the transliterated
        result if it contains question marks.
        Reported by Michael Geng <[EMAIL PROTECTED]>.

*** lib/propername.c.orig 2008-06-21 17:47:37.000000000 +0200
--- lib/propername.c 2008-06-21 17:37:16.000000000 +0200
***************
*** 205,219 ****
  # if (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 2) || __GLIBC__ > 2 \
       || _LIBICONV_VERSION >= 0x0105
        {
          size_t len = strlen (locale_code);
          char *locale_code_translit = XNMALLOC (len + 10 + 1, char);
          memcpy (locale_code_translit, locale_code, len);
          memcpy (locale_code_translit + len, "//TRANSLIT", 10 + 1);
  
!         name_converted_translit = alloc_name_converted_translit =
!           xstr_iconv (name_utf8, "UTF-8", locale_code_translit);
  
          free (locale_code_translit);
        }
  # endif
  #endif
--- 205,236 ----
  # if (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 2) || __GLIBC__ > 2 \
       || _LIBICONV_VERSION >= 0x0105
        {
+         char *converted_translit;
+ 
          size_t len = strlen (locale_code);
          char *locale_code_translit = XNMALLOC (len + 10 + 1, char);
          memcpy (locale_code_translit, locale_code, len);
          memcpy (locale_code_translit + len, "//TRANSLIT", 10 + 1);
  
!         converted_translit =
!           xstr_iconv (name_utf8, "UTF-8", locale_code_translit);
  
          free (locale_code_translit);
+ 
+         if (converted_translit != NULL)
+           {
+ # if !_LIBICONV_VERSION
+             /* Don't use the transliteration if it added question marks.
+                glibc's transliteration falls back to question marks; libiconv's
+                transliteration does not.
+                mbschr is equivalent to strchr in this case.  */
+             if (strchr (converted_translit, '?') != NULL)
+               free (converted_translit);
+             else
+ # endif
+               name_converted_translit = alloc_name_converted_translit =
+                 converted_translit;
+           }
        }
  # endif
  #endif
***************
*** 270,276 ****
      }
  }
  
! #ifdef TEST
  # include <locale.h>
  int
  main (int argc, char *argv[])
--- 287,293 ----
      }
  }
  
! #ifdef TEST1
  # include <locale.h>
  int
  main (int argc, char *argv[])
***************
*** 281,283 ****
--- 298,312 ----
    return 0;
  }
  #endif
+ 
+ #ifdef TEST2
+ # include <locale.h>
+ # include <stdio.h>
+ int
+ main (int argc, char *argv[])
+ {
+   setlocale (LC_ALL, "");
+   printf ("%s\n", proper_name_utf8 ("Franc,ois Pinard", "Fran\303\247ois Pinard"));
+   return 0;
+ }
+ #endif
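
For readers who want the decision rule of the patch in isolation, here is a minimal sketch of it. It is not the gnulib code: pick_display_name is a hypothetical helper invented for illustration, it assumes the candidate string has already been produced by some transliterating conversion, and it leaves out the memory management that the real patch does with free().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of gnulib: choose between the ASCII
   fallback name and a transliterated candidate.  TRANSLIT may be NULL
   when no conversion result is available.  */
const char *
pick_display_name (const char *name_ascii, const char *translit)
{
  /* No transliteration available: use the ASCII fallback.  */
  if (translit == NULL)
    return name_ascii;

  /* A '?' suggests that the converter substituted a character it could
     not represent (the glibc behaviour described in the patch comment),
     so reject the candidate.  */
  if (strchr (translit, '?') != NULL)
    return name_ascii;

  return translit;
}
```

With this rule, a candidate of "Fran?ois Pinard" is rejected in favour of the first argument "Franc,ois Pinard", while a candidate without question marks is used as is. Unlike the patch, the sketch applies the check unconditionally; the patch skips it when libiconv is in use.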