>>>>> On Tue, 7 Jun 2016, Chí-Thanh Christopher Nguyễn wrote:
>> 4. According to Gettext documentation, "'@VARIANT' can denote any
>> kind of characteristics that is not already implied by the language
>> LL and the country CC." (So IIUC the BCP-47 variant "valencia"
>> would become "@valencia".)

> This I think is wrong and collides with POSIX.
> POSIX modifiers are not allowed for LANG or LC_ALL in
> POSIX.1-2008[1] Section 8.2 says you can have at most one modifier
> field to "select a specific instance of localization data within a
> single category", which I don't think applies because it is its own
> locale, not an instance of an existing one. Furthermore (but that
> doesn't apply in our use case), POSIX spec lists the example
> LC_COLLATE=De_DE@dict
> So what if you want Catalan Valencian with dictionary order? Or if
> someone hypothetically came up with a different script?

>> I haven't found any mention or usage of ISO 3166-2 region
>> subdivisions in the context of locale. Can you provide any
>> references for this?

> As I wrote before, it is not used. But I think it is the only
> spec-compliant way to marry POSIX locales with Catalan Valencian.
> BCP-47 does it in a more natural way.

So, trying to summarise:

We cannot follow strict POSIX syntax, so our two choices are either to
stick to Gettext LL_CC@VARIANT syntax or to change to BCP 47.

Using BCP 47 would have some advantages:

- It is a well-defined standard [1], and tools for validating language
  tags exist, e.g. [2].

- The L10N USE_EXPAND could follow the usual USE flag syntax, as BCP 47
  tags contain neither underscores (which are supposed to be reserved
  as USE_EXPAND separators) nor @ signs (which PMS explicitly mentions
  as an exception for LINGUAS).

- Gettext's @VARIANT is ill-defined and conflates different
  characteristics like script and variant. There is no further
  subdivision within @VARIANT, which leads to locale names like
  sr@ijekavianlatin. Also, different upstreams use different
  conventions, like @latin and @Latn for the Latin script.
- For the vast majority of languages, identifiers are either identical
  ("de" -> "de") or can be converted by simple shell substitution
  ("pt-BR" -> "pt_BR").

- IIUC, L10N is primarily intended to control things like additional
  language bundles of packages. Some upstreams like LibreOffice already
  use BCP 47 for these.

On the other hand, there will be some cost:

- If BCP 47 tags containing a script or a variant are to be used to
  generate LINGUAS, they will require explicit mapping. (OTOH, such
  mapping will also be needed if we stick to Gettext syntax but unify
  variants like "sr@latin" and "sr@Latn".)

- Different syntax for LINGUAS and L10N might confuse users, so
  additional documentation will be needed.

Comments?

Ulrich

[1] https://tools.ietf.org/html/bcp47
[2] http://schneegans.de/lv/
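A minimal Python sketch of the BCP 47 -> LINGUAS conversion discussed above: plain "language" or "language-REGION" tags are handled by substitution, and anything carrying a script or variant subtag must come from an explicit table. The function name bcp47_to_linguas and the EXPLICIT_MAP entries are hypothetical illustrations, not an agreed Gentoo mapping. The sketch also assumes tags arrive in canonical case (BCP 47 itself is case-insensitive).

```python
import re

# Hypothetical explicit mapping for tags with a script or variant subtag.
# These entries are illustrative only, not an agreed Gentoo table.
EXPLICIT_MAP = {
    "sr-Latn": "sr@latin",
    "ca-valencia": "ca@valencia",
}

# Language (2-3 lowercase letters), optionally followed by a 2-letter
# uppercase region. Assumes canonical case; real BCP 47 is case-insensitive.
_SIMPLE = re.compile(r"([a-z]{2,3})(?:-([A-Z]{2}))?")


def bcp47_to_linguas(tag: str) -> str:
    """Convert a BCP 47 tag to a Gettext-style LINGUAS name.

    Simple tags are converted by substitution ("pt-BR" -> "pt_BR");
    tags with a script or variant subtag must be listed in EXPLICIT_MAP.
    """
    # Check the explicit table first so script/variant tags never fall
    # through to the plain substitution rule.
    if tag in EXPLICIT_MAP:
        return EXPLICIT_MAP[tag]
    m = _SIMPLE.fullmatch(tag)
    if m is None:
        raise ValueError(f"no mapping for BCP 47 tag {tag!r}")
    lang, region = m.groups()
    return lang if region is None else f"{lang}_{region}"


if __name__ == "__main__":
    for tag in ["de", "pt-BR", "sr-Latn", "ca-valencia"]:
        print(tag, "->", bcp47_to_linguas(tag))
```

An unmapped tag with a script subtag, such as "zh-Hant", raises ValueError rather than being guessed, which reflects the cost noted above: every such tag needs an explicit entry.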