Re: check-AUTHORS fails because of non ansi characters

Bruno Haible Sat, 21 Jun 2008 09:16:23 -0700

> > |> 58c58
> > |> < ptx: Fran?ois Pinard

This is not user friendly: proper_name_utf8 should not return a result with
question marks. Instead it's better if it returns its first argument. I'm
fixing it through the appended patch. But it will not fix the coreutils
test failure.
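
The fallback rule described above can be sketched in isolation. This is only
an illustration, not the gnulib code: pick_display_name is a hypothetical
helper, and it assumes the ASCII spelling itself never contains a question
mark.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of gnulib): choose the name to display.
   ascii_name is the plain ASCII spelling, i.e. the first argument of
   proper_name_utf8.  translit is the transliterated result, or NULL if
   the conversion failed.  */
const char *
pick_display_name (const char *ascii_name, const char *translit)
{
  assert (ascii_name != NULL);

  /* No transliteration available: fall back to the ASCII spelling.  */
  if (translit == NULL)
    return ascii_name;

  /* A question mark means the converter could not represent some
     character, so the ASCII spelling is the friendlier choice.  */
  if (strchr (translit, '?') != NULL)
    return ascii_name;

  return translit;
}
```

With this sketch, pick_display_name ("Franc,ois Pinard", "Fran?ois Pinard")
returns the first argument rather than the string with the question mark.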


> > |> ---
> > |>> ptx: FranÃ§ois Pinard
> >
> > In my email, this is rendering as one vs. two characters.  I suspect it
> > might be a locale issue - perhaps Jim is using a UTF-8 locale, and Michael
> > is using a Latin-1 encoding?

Michael must be using a locale with ASCII encoding; if it were a Latin-1
encoding, the output would have contained a c with cedilla, not a question mark.
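
To illustrate why a Latin-1 locale would not have produced a question mark,
here is a small sketch using the standard iconv interface. The function name
convert_from_utf8 is made up for this example; it is not the code path that
proper_name_utf8 uses.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated UTF-8 string IN to the
   encoding TO_CODE, storing at most OUTSIZE - 1 bytes plus a terminating
   NUL in OUT.  Returns 0 on success, -1 on failure.  */
int
convert_from_utf8 (const char *to_code, const char *in,
                   char *out, size_t outsize)
{
  iconv_t cd;
  char *inptr = (char *) in;   /* iconv takes char **, input is not modified */
  size_t inleft = strlen (in);
  char *outptr = out;
  size_t outleft;
  size_t res;

  if (outsize == 0)
    return -1;
  outleft = outsize - 1;       /* reserve room for the terminating NUL */

  cd = iconv_open (to_code, "UTF-8");
  if (cd == (iconv_t) -1)
    return -1;

  res = iconv (cd, &inptr, &inleft, &outptr, &outleft);
  iconv_close (cd);
  if (res == (size_t) -1)
    return -1;

  *outptr = '\0';
  return 0;
}
```

Converting "Fran\303\247ois Pinard" to ISO-8859-1 with this helper yields the
single byte 0xE7 for the c with cedilla, so no replacement character is
needed. What happens for an ASCII target depends on the iconv implementation
and on whether //TRANSLIT is requested.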

Jim Meyering wrote:
> The problem is probably that his system lacks the en_US.UTF-8 locale,
> which is used by that check-AUTHORS rule.
> 
> Here's a change I'm considering.  It's easy in the sense that it's merely
> using an existing m4 macro, gt_LOCALE_FR_UTF8,

Yes, this change will fix the test failure.

> but has the drawback of depending on a locale that is less likely to be
> installed than the English one.

I'm not sure whether en_US.UTF-8 is more often installed than fr_FR.UTF-8.
Certainly Solaris systems have had it for ages, but in general the effort spent
on i18n of French is greater than that spent on i18n of English.

> One twist was that on my system, the french translation of "F. Pinard"
> was identical to the original

Yes, the test depends on the message catalog as well. If, instead of just
    LC_ALL=$(LOCALE_FR_UTF8)
you use
    LC_ALL=$(LOCALE_FR_UTF8) LANGUAGE=zxx
this source of trouble goes away. ('zxx' is the language code for
'not applicable'; it's highly unlikely ever to carry a message catalog.)
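
The same idea can be expressed inside a C test program. The following is a
sketch under assumptions: pin_test_locale is a hypothetical name, setenv and
nl_langinfo are taken from POSIX, and the codeset is assumed to be reported
as "UTF-8" (some systems may spell it differently).

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: set LANGUAGE=zxx so that no message catalog is
   consulted, then switch to LOCALE_NAME.  Returns 1 if the locale could
   be set and reports a UTF-8 codeset, 0 otherwise.  */
int
pin_test_locale (const char *locale_name)
{
  const char *codeset;

  assert (locale_name != NULL);

  /* 'zxx' is very unlikely to have a message catalog installed.  */
  if (setenv ("LANGUAGE", "zxx", 1) != 0)
    return 0;

  /* Fails (returns NULL) when the locale is not usable on this system.  */
  if (setlocale (LC_ALL, locale_name) == NULL)
    return 0;

  /* Assumption: a UTF-8 locale reports exactly "UTF-8" here.  */
  codeset = nl_langinfo (CODESET);
  return strcmp (codeset, "UTF-8") == 0;
}
```

A test could call pin_test_locale with the name of the French UTF-8 locale
and skip itself when the result is 0.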

> +       echo 'your system lacks a french UTF8 locale' 1>&2;   \

I would write UTF-8 here. That's the only standardized name of the encoding
that you mean.


2008-06-21  Bruno Haible  <[EMAIL PROTECTED]>

        * lib/propername.c (proper_name_utf8): Don't use the transliterated
        result if it contains question marks.
        Reported by Michael Geng <[EMAIL PROTECTED]>.

*** lib/propername.c.orig       2008-06-21 17:47:37.000000000 +0200
--- lib/propername.c    2008-06-21 17:37:16.000000000 +0200
***************
*** 205,219 ****
  # if (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 2) || __GLIBC__ > 2 \
       || _LIBICONV_VERSION >= 0x0105
        {
        size_t len = strlen (locale_code);
        char *locale_code_translit = XNMALLOC (len + 10 + 1, char);
        memcpy (locale_code_translit, locale_code, len);
        memcpy (locale_code_translit + len, "//TRANSLIT", 10 + 1);
  
!       name_converted_translit = alloc_name_converted_translit =
          xstr_iconv (name_utf8, "UTF-8", locale_code_translit);
  
        free (locale_code_translit);
        }
  # endif
  #endif
--- 205,236 ----
  # if (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 2) || __GLIBC__ > 2 \
       || _LIBICONV_VERSION >= 0x0105
        {
+       char *converted_translit;
+ 
        size_t len = strlen (locale_code);
        char *locale_code_translit = XNMALLOC (len + 10 + 1, char);
        memcpy (locale_code_translit, locale_code, len);
        memcpy (locale_code_translit + len, "//TRANSLIT", 10 + 1);
  
!       converted_translit =
          xstr_iconv (name_utf8, "UTF-8", locale_code_translit);
  
        free (locale_code_translit);
+ 
+       if (converted_translit != NULL)
+         {
+ #  if !_LIBICONV_VERSION
+           /* Don't use the transliteration if it added question marks.
+              glibc's transliteration falls back to question marks; libiconv's
+              transliteration does not.
+              mbschr is equivalent to strchr in this case.  */
+           if (strchr (converted_translit, '?') != NULL)
+             free (converted_translit);
+           else
+ #  endif
+             name_converted_translit = alloc_name_converted_translit =
+               converted_translit;
+         }
        }
  # endif
  #endif
***************
*** 270,276 ****
      }
  }
  
! #ifdef TEST
  # include <locale.h>
  int
  main (int argc, char *argv[])
--- 287,293 ----
      }
  }
  
! #ifdef TEST1
  # include <locale.h>
  int
  main (int argc, char *argv[])
***************
*** 281,283 ****
--- 298,312 ----
    return 0;
  }
  #endif
+ 
+ #ifdef TEST2
+ # include <locale.h>
+ # include <stdio.h>
+ int
+ main (int argc, char *argv[])
+ {
+   setlocale (LC_ALL, "");
+   printf ("%s\n", proper_name_utf8 ("Franc,ois Pinard", "Fran\303\247ois Pinard"));
+   return 0;
+ }
+ #endif
