Axel Beckert <a...@debian.org> writes:

> Anyway, JFTR: I just looked at how lintian in Debian Stable (i.e.
> 2.104.0 in Bullseye) does the locale code lookup. It had it's own data
> file for that (and hence now using iso-codes is good as it is no more
> duplicating these 33kB of data) and that file
> (/usr/share/lintian/data/files/locale-codes) states:
> # List of locale codes. This is derived from the ISO 639-1, ISO
> # 639-2, and ISO 639-3 standards.

> And indeed, "ber" was in that file.

> So previously lintian did use ISO 639-1, 639-2 and 639-3.

> So using just ISO 639-3 was either an accident, on purpose or a
> regression and has been introduced when lintian was switching to
> iso-code's files as data source in commit
> https://salsa.debian.org/lintian/lintian/-/commit/fcaded19

What I think I managed to reconstruct from reading about this [1] is that
639-2 was the original work to supplement 639-1 (which is limited to
two-letter codes and omits a lot of smaller languages). However, ISO 639-2
also assigned codes to language families and some other things, whereas
ISO 639-3 is limited to just languages, and the families moved to ISO
639-5.

[1] https://en.wikipedia.org/wiki/ISO_639-2 mostly.

Looking at ISO 639-5, I think a lot of those wouldn't make sense as
translations. It has a lot of things like zhx (Chinese family), cpe (all
English-based creoles), or grk (Greek languages). Some of those (cpe for
example) also appear in ISO 639-2, which implies to me that 639-2 is a bit
too broad for useful translations.

That said, reading more about the Berber languages [2], I understand how
this happened with this group in particular. Specifically, this:

    A listing of the other Berber languages is complicated by their
    closeness; there is little distinction between language and dialect.
    The primary difficulty of subclassification, however, lies in the
    eastern Berber languages, where there is little agreement.

probably implies that the languages are sufficiently mutually
comprehensible that it may make sense to translate something to "Berber"
without specifying a specific language in the family. (I could imagine
that sometimes it may avoid political and social issues to not specify a
specific language from the family, although I have no idea if that's the
case here.)

[2] https://en.wikipedia.org/wiki/Berber_languages

However, that wouldn't really make sense for "cpe" (creoles are very
different from each other even if they're English-based). So that still
feels to me like it leans away from including everything in 639-2.

I think I may be talking myself into adding an exception list of
non-639-3 language codes that nonetheless are used by translators. But
that's an ongoing maintenance burden, so maybe that's not the right move
either.

The alternate argument is that Lintian's check is really mostly there to
catch typos, and maybe we should assume anyone who uses any 639-2 or 639-3
code knows what they're doing. And since that's what Lintian used to do,
it has the benefit of fixing a regression and I don't think anyone was
complaining about the breadth of the previous list, just the duplication
of information.

So in short, I think I talked myself back around to your solution. :)

(Maybe all of this can be captured in comments for the next poor
maintainer who has to try to understand what's going on.)

-- 
Russ Allbery (r...@debian.org)             <https://www.eyrie.org/~eagle/>