Hi!

On Wed, 2016-11-09 at 13:43:21 +0000, Ian Jackson wrote:
> Package: dpkg-dev
> Version: 1.18.12
>
> Many package build rules, dh rules, etc., rely on shell globbing.
> This shell globbing needs to be predictable.
>
> The output of a package build ought not to depend on the locale at
> all, really. (This is one of the things that the reproducible builds
> people are trying to ensure.) But we don't want to set LC_MESSAGES,
> at least, because we want people to be able to debug builds in their
> native language, as far as possible.
>
> It is difficult to imagine a situation where honouring a user's
> LC_COLLATE during a package build would be beneficial.

Agreed.

> In practice, nonstandard LC_COLLATE values can break perfectly
> sensible looking build code. For example, chiark-utils 5.0.0+exp1
> FTBFS in current stretch when LC_COLLATE=fr_CH.UTF-8 because of this:
>
>   $ touch 11 pp qq
>   $ LC_COLLATE=fr_CH.UTF-8 bash -c 'echo [!A-Z]*[!~]'
>   11
>   $
>
> (Interestingly, many of these FTBFS problems will be hidden if /bin/sh
> is dash, because dash does not honour locales for globbing. This is
> clearly legal according to the spec, and probably a good decision.)
>
> In principle this bug might be fixable by asking (almost) every
> package to set LC_COLLATE in debian/rules. But ISTM that it would be
> much better to fix this in dpkg-buildpackage.

That would certainly be easier, but this is yet another instance of
dpkg-buildpackage not being considered the canonical entry point for
building packages; debian/rules is. So as long as that's the case, I'm
very hesitant to set yet another variable that should in principle be
set explicitly by the package itself. I know this probably implies lots
of changes, but given past resistance on this topic, I'd rather not go
down the same path. :( (See the NOTES section in dpkg-buildpackage(1).)

And, playing devil's advocate here, some of those are arguably also
upstream issues that would be better fixed there. :)
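To see what the quoted glob does once collation is pinned, here is a minimal sketch that reruns it in a scratch directory with LC_COLLATE=C. It assumes bash and mktemp(1) are installed; the scratch directory and the variable names are only for illustration. Because LC_ALL takes precedence over LC_COLLATE, the sketch unsets it first, which is the same situation the LC_ALL remark later in this thread is about.

```shell
#!/bin/sh
# Illustrative reproduction: rerun the glob from the chiark-utils report
# with the collation pinned to C. Assumes bash and mktemp are available.
set -e

dir=$(mktemp -d)   # scratch directory, only for this illustration
cd "$dir"
touch 11 pp qq

# LC_ALL overrides LC_COLLATE, so clear it before pinning the collation.
unset LC_ALL

# Same glob as in the report; only the collation locale differs.
out=$(LC_COLLATE=C bash -c 'echo [!A-Z]*[!~]')
echo "$out"

cd /
rm -rf "$dir"
```

Under the C collation the range A-Z covers only the 26 ASCII capital letters, so all three files match and the script prints `11 pp qq`, whereas the report shows only `11` under fr_CH.UTF-8. Newer bash releases (5.0 and later) enable the globasciiranges option by default, so the fr_CH.UTF-8 failure may not reproduce there.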
> I suggest that dpkg-buildpackage should do as follows:
>
>  * Unconditionally set one of the following
>      LC_COLLATE=C.UTF-8
>      LC_COLLATE=C
>    Colin Watson tells me that C.UTF-8 has been in libc since
>    approximately squeeze. C is theoretically UB (!) for high-bit
>    set octets but in practice works just fine (and it would be
>    intolerable if it didn't).

Doing that in the packages might be fine, as they can assume a Debian
distribution, but probably not for dpkg itself, as it runs on systems
where C.UTF-8 cannot be assumed to be available, and on some of them
it's probably not even supported at all!

>  * Check the effective LC_COLLATE using locale(1), and produce
>    a warning if the result is not m/^C(?=\.|$)/. (This is useful
>    because some misguided user might set LC_ALL.)

Hmm, in a way this looks nice, because it can be implemented right
away in dpkg-buildpackage regardless of whether the system supports
C.UTF-8 locales. But if the conclusion is that packages need to sort
this out by themselves, it feels out of place to emit such warnings,
as I don't see why users should not be able to select a non-C
collation.

> In the meantime the reproducible builds folks may want to consider
> explicitly setting LC_COLLATE to something sane in their 2nd build.

I guess that depends on whether we still want to make packages
self-contained and correctly buildable with just debian/rules or not. :)

Thanks,
Guillem