>>>>> On Mon, 6 Jun 2016, Ulrich Mueller wrote:

>>>>> On Mon, 6 Jun 2016, Chí-Thanh Christopher Nguyễn wrote:
>> I'm not totally convinced yet.
>> Following the BCP-47 spec the format is

>> Language-Tag = langtag ; normal language tags
>> langtag = language
>>           ["-" script]
>>           ["-" region]
>>           *("-" variant)
>>           *("-" extension)
>>           ["-" privateuse]
>> [...]

> As I understand it:

> 1. Gettext documentation says that locale names can be LL_CC or
>    LL_CC@VARIANT. The natural mapping to the (implementation defined)
>    format mentioned by POSIX seems to be that LL, CC, and VARIANT
>    correspond to language, territory, and modifier, respectively.

> 2. Language codes are taken from ISO 639, namely the two-letter code
>    if one exists, otherwise the three-letter code.

> 3. Territory codes are taken from ISO 3166-1, usually the two-letter
>    country codes.

> 4. According to Gettext documentation, "'@VARIANT' can denote any
>    kind of characteristics that is not already implied by the language
>    LL and the country CC." (So IIUC the BCP-47 variant "valencia" would
>    become "@valencia".)

Of course, we could also say that Gettext/POSIX syntax (especially its
variant/modifier part) is ill-defined, and use BCP-47 syntax for the
L10N USE_EXPAND instead (except that the separator would be an
underscore instead of a hyphen).

AFAICS, there would be no change at all for any of the LL or LL_CC
entries. The only ones that would change would be the (about 10) ones
containing an @ sign. For example, ca@valencia would become
ca_valencia, and sr@ijekavianlatin would become sr_Latn_ijekavsk.

Not sure how much additional code for remapping would be required.
However, my impression is that upstream usage of @VARIANT is not at
all standardised, so some remapping would be required in any case if
we want unique entries for L10N.

Ulrich
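
To give a rough idea of the remapping discussed above, here is a minimal Python sketch. It is not existing eclass code: the names VARIANT_REMAP, to_l10n and to_bcp47 are made up for illustration, and the table holds only the two examples named in this mail. It assumes that LL and LL_CC pass through unchanged and that every @VARIANT name needs an explicit table entry, since upstream usage is not standardised.

```python
# Hypothetical remapping table. Only the two examples from this mail are
# listed; the remaining @VARIANT entries would have to be added by hand.
VARIANT_REMAP = {
    "ca@valencia": "ca_valencia",
    "sr@ijekavianlatin": "sr_Latn_ijekavsk",
}


def to_l10n(locale_name):
    """Map a Gettext-style locale name to a BCP-47-like L10N value.

    Names without an @ sign (LL or LL_CC) are returned unchanged.
    Names with an @ sign must be listed in VARIANT_REMAP.
    """
    if "@" not in locale_name:
        return locale_name
    try:
        return VARIANT_REMAP[locale_name]
    except KeyError:
        # No guessing: @VARIANT usage is not standardised upstream.
        raise ValueError(f"no remapping known for {locale_name!r}") from None


def to_bcp47(l10n_value):
    """Turn an underscore-separated L10N value into a hyphenated tag."""
    return l10n_value.replace("_", "-")


if __name__ == "__main__":
    for name in ("de", "pt_BR", "ca@valencia", "sr@ijekavianlatin"):
        value = to_l10n(name)
        print(name, "->", value, "->", to_bcp47(value))
```

Raising an error for unlisted @VARIANT names is a deliberate choice in this sketch: it forces each such entry to be reviewed once rather than derived mechanically.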