Package: lintian
Severity: important

Hi,

Since October 22, the Lintian jobs on Salsa pipelines have failed for many packages, including Lintian. [1]

In the container, 'groff' cannot find LC_ALL defined even though we reset the environment and explicitly provide LC_ALL=C.UTF-8. [2] It is ineffective, and that is the substance of this bug.

I am not sure what changed (my commit could not have done it). I only see a recent upload for bash, but not for groff, perl or libipc-run3-perl.

I already asked on #salsci. I was told that the runner images had not been manipulated in some time. The cause would likely be elsewhere.

I also asked on #salsa because, on October 22, the runner base system was upgraded to Debian 10 from 9. (That is unrelated to the images provided by Salsa CI.) According to the Salsa admins that change should have had no impact. For the time being we are at a loss.

Colin Watson provides a workaround below, but we will try to find the real bug first:

    "If all else fails then setting MAN_NO_LOCALE_WARNING=1 may be a viable workaround."

This bug filing follows discussions on debian-devel@lists.d.o and IRC, the relevant parts of which were copied below. There is also a Salsa issue about this [3], but it's probably better to centralize the discussion here. The bug may indeed belong to Salsa CI but issues filed on that website seem less permanent than a bug in the BTS.

Please use this bug to comment on the issue going forward. Thank you!

Kind regards
Felix Lechner

[1] https://lintian.pages.debian.net/-/lintian/-/jobs/1098261/artifacts/debian/output/lintian.html
[2] https://salsa.debian.org/lintian/lintian/-/blob/master/checks/documentation/manual.pm#L279-281
[3] https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/182

***

Hello there!

My Salsa CI pipeline is blowing up in the lintian step, with lots of warnings of the form:

"W: notcurses-bin: groff-message usr/share/man/man1/notcurses-demo.1.gz can't set the locale; make sure $LC_* and $LANG are correct"

This is printed for each man page I package. An example run is here:

https://salsa.debian.org/debian/notcurses/-/jobs/1107065

The only reference I could find was https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=606933, which didn't seem relevant. Is this due to having supra-ascii UTF8 characters in my man pages? Is there anything I can do to work around this? I tried exporting LANG to a UTF-8 locale in my salsa variables section, but that didn't help. I'm using pandoc to generate my man pages, and it happily accepts UTF-8, but I can see a case for restricting them to ASCII.

Thanks!

-- nick black -=- https://www.nick-black.com

* * *

Hi Nick,

On Sun, Oct 25, 2020 at 6:23 PM Nick Black <dankamong...@gmail.com> wrote:
>
> Is this due to having supra-ascii UTF8 characters in my man
> pages?

It's not a problem with your package. Lintian's own pipeline is likewise affected, even though our test suite completes fine in an unstable chroot. The issue is being tracked here:

https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/182

Kind regards
Felix Lechner

* * *

Felix Lechner left as an exercise for the reader:
> It's not a problem with your package. Lintian's own pipeline is
> likewise affected, even though our test suite completes fine in an
> unstable chroot. The issue is being tracked here:
> https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/182

Thanks for the quick response, Felix. You say that "[you] will probably start setting $LANG in that part of Lintian." what LANG will you be using? Attempting to set LANG=en_US.UTF-8 in my salsa ci variables resulted in setlocale(3) failing all over the place, presumably due to the locale not having been generated.

-- nick black -=- https://www.nick-black.com

* * *

On Mon, 26 Oct 2020 at 00:35:45 -0400, Nick Black wrote:
> Thanks for the quick response, Felix. You say that "[you] will
> probably start setting $LANG in that part of Lintian." what LANG
> will you be using? Attempting to set LANG=en_US.UTF-8 in my
> salsa ci variables resulted in setlocale(3) failing all over the
> place, presumably due to the locale not having been generated.

C.UTF-8 is available on all Debian systems. It's the standard C/POSIX locale, except that in the C locale the meaning of bytes 0x80-0xFF is undefined, while in C.UTF-8 they are assumed/defined to be part of a character encoded in UTF-8.

If you care about portability to non-Debian systems, note that C.UTF-8 is a somewhat popular extension (I think it originated in the Fedora/Red Hat family before it was adopted by Debian and other distros) but is far from universally available. In particular, I'm aware of Arch Linux specifically *not* having it. The glibc maintainers consider the implementation used in e.g. Fedora and Debian to be a hack rather than something they want to maintain forever, but my understanding is that they would be willing to accept a better implementation.

en_US.UTF-8 is indeed not portable. Some OSs (Fedora, I think?) always generate the en_US.UTF-8 locale regardless of any other configuration that might exist, but Debian does not: if you chose a non-English locale like fr_FR.UTF-8 or a non-American English locale like en_GB.UTF-8 during installation, then you will normally only have three locales, your chosen national locale plus the international locales C and C.UTF-8. Minimal container/chroot environments, and in particular the official Debian buildds, will normally only have C and C.UTF-8. See src:gtk+4.0 for an example of how to generate additional locales on-demand if your unit tests need them.

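
Whether a given locale has actually been generated on a system can be probed directly. The following is only a minimal sketch, assuming Python 3 on a glibc-based system; the helper name `can_set_locale` and the list of candidates are made up for illustration and are not part of Lintian or Salsa CI.

```python
import locale


def can_set_locale(name: str) -> bool:
    """Return True if setlocale(LC_ALL, name) succeeds in this process.

    Hypothetical helper for illustration. The previous locale setting is
    restored afterwards so the probe has no lasting effect.
    """
    saved = locale.setlocale(locale.LC_ALL)  # query only, changes nothing
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        # Raised when the underlying C setlocale() call fails,
        # e.g. because the locale was never generated.
        return False
    finally:
        locale.setlocale(locale.LC_ALL, saved)


if __name__ == "__main__":
    for candidate in ("C", "C.UTF-8", "en_US.UTF-8"):
        print(candidate, can_set_locale(candidate))
```

On a minimal Debian container one would expect `C` and `C.UTF-8` to be accepted, and `en_US.UTF-8` to be rejected unless it has been generated; results on other systems may differ.
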
Third-party software from outside Debian frequently assumes that the en_US.UTF-8 locale does exist - in particular, it's common enough for Steam games to want it to exist that Steam's diagnostic tool now checks for it. This is mostly because it's semi-frequently (ab)used as a way to parse and serialize C-syntax floating point in programming languages or configuration files without getting confused by non-English decimal points (e.g. 1.23 in English locales is 1,23 in French locales, which means a naive implementation might write {"x": 1,23, "y": 4,56} into a JSON file, which is of course a syntax error).

The portable way to read/write configuration files and C-like source code is to avoid the POSIX locale-sensitive functions completely, and use something like GLib's g_ascii_strtod() or CPython's PyOS_string_to_double() (lots of libraries and frameworks will have an equivalent, those are just the ones I'm most familiar with). This also has the advantage of being thread-safe, unlike temporarily switching POSIX locales, which is normally process-wide and therefore not thread-safe.

Another correct way to do this since POSIX.1-2008 is to use POSIX uselocale() and the C locale, but that's unlikely to be portable to Windows or to exotic Unix implementations, so widely-portable software generally ends up having to reinvent something equivalent to g_ascii_strtod() anyway.

    smcv

* * *

Simon McVittie left as an exercise for the reader:
> If you care about portability to non-Debian systems, note that C.UTF-8 is
> a somewhat popular extension (I think it originated in the Fedora/Red Hat
> family before it was adopted by Debian and other distros) but is far from
> universally available. In particular, I'm aware of Arch Linux specifically
> *not* having it. The glibc maintainers consider the implementation used
> in e.g. Fedora and Debian to be a hack rather than something they want to
> maintain forever, but my understanding is that they would be willing to
> accept a better implementation.

As I "need" this only within the Debian Salsa CI (and only to deal with this groff lintian warning, which it sounds like will be handled another way), a Debian-specific solution would be fine =]. Thanks for the details -- C.UTF-8 sounds like the right way to go.

-- nick black -=- https://www.nick-black.com

* * *

On Mon, 26 Oct 2020 11:47:37 +0000, Simon McVittie wrote:
> Minimal container/chroot environments, and in particular the official
> Debian buildds, will normally only have C and C.UTF-8. See src:gtk+4.0
> for an example of how to generate additional locales on-demand if your
> unit tests need them.

Alternatively, build-depending on locales-all usually also works (benefit: no manual meddling with locales, cost: installation size).

Cheers,
gregor

* * *

Hi Nick,

On Mon, Oct 26, 2020 at 5:11 AM Nick Black <dankamong...@gmail.com> wrote:
>
> C.UTF-8 sounds like the right way to go.

As noted in the issue tracker [1], Lintian already sets LC_ALL to C.UTF-8 [2] in a sanitized environment, but we do not currently set LANG. That would have been my next step, except these issues do not occur in a clean chroot for unstable and are therefore more likely related to Salsa or Salsa CI.

Kind regards
Felix Lechner

[1] https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/182
[2] https://salsa.debian.org/lintian/lintian/-/blob/master/checks/documentation/manual.pm#L281

* * *

On Mon, Oct 26, 2020 at 07:57:58AM -0700, Felix Lechner wrote:
> On Mon, Oct 26, 2020 at 5:11 AM Nick Black <dankamong...@gmail.com> wrote:
> > C.UTF-8 sounds like the right way to go.
>
> As noted in the issue tracker [1], Lintian already sets LC_ALL to
> C.UTF-8 [2] in a sanitized environment, but we do not currently set
> LANG.

LC_ALL should imply LANG, and as far as I know that works fine in man (which is the program producing the warning message in this case), so this should make no difference. If somebody can come up with a reduced test environment in which man does not seem to interpret LC_ALL as implying LANG, I'd consider that a bug.

-- Colin Watson (he/him) [cjwat...@debian.org]

* * *

On Mon, 26 Oct 2020 at 18:35:53 +0000, Colin Watson wrote:
> LC_ALL should imply LANG

One thing that it does not imply is LANGUAGE, used for LC_MESSAGES as a GNU extension (at a higher precedence than even LC_ALL).

    smcv

* * *

On Mon, Oct 26, 2020 at 08:16:43PM +0000, Simon McVittie wrote:
> On Mon, 26 Oct 2020 at 18:35:53 +0000, Colin Watson wrote:
> > LC_ALL should imply LANG
>
> One thing that it does not imply is LANGUAGE, used for LC_MESSAGES as a
> GNU extension (at a higher precedence than even LC_ALL).

Indeed, though I don't believe it's possible for it to cause the warning message in question here (which results from setlocale (LC_ALL, "") returning NULL).

If all else fails then setting MAN_NO_LOCALE_WARNING=1 may be a viable workaround.

-- Colin Watson (he/him) [cjwat...@debian.org]
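
The condition described above, setlocale (LC_ALL, "") returning NULL, can be probed outside of man and groff. What follows is a minimal sketch, assuming Python 3, where `locale.setlocale` raises `locale.Error` when the underlying C call fails. The helper name `env_locale_ok` and the bogus locale name are made up for illustration; this is not how Lintian or man performs the check.

```python
import os
import subprocess
import sys

# Child program: initialise the locale from the environment, the same
# way a C program calling setlocale(LC_ALL, "") would.
CHILD = (
    "import locale, sys\n"
    "try:\n"
    "    locale.setlocale(locale.LC_ALL, '')\n"
    "except locale.Error:\n"
    "    sys.exit(1)\n"
)


def env_locale_ok(**overrides: str) -> bool:
    """Run the child with a sanitized locale environment plus overrides.

    Hypothetical helper: all LC_*, LANG and LANGUAGE variables inherited
    from the caller are dropped first, then the overrides are applied.
    """
    env = {
        key: value
        for key, value in os.environ.items()
        if not (key.startswith("LC_") or key in ("LANG", "LANGUAGE"))
    }
    env.update(overrides)
    result = subprocess.run([sys.executable, "-c", CHILD], env=env)
    return result.returncode == 0


if __name__ == "__main__":
    # LC_ALL takes precedence over LANG, so a broken LANG should not matter.
    print(env_locale_ok(LC_ALL="C", LANG="xx_XX.bogus"))
    print(env_locale_ok(LC_ALL="C.UTF-8"))
```

Running something like `env_locale_ok(LC_ALL="C.UTF-8")` inside the affected container might help narrow down whether the failure lies in the environment itself or is specific to how man is invoked there.
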