Hi!

On Wed, 2016-11-09 at 13:43:21 +0000, Ian Jackson wrote:
> Package: dpkg-dev
> Version: 1.18.12

> Many package build rules, dh rules, etc., rely on shell globbing.
> This shell globbing needs to be predictable.
> 
> The output of a package build ought not to depend on the locale at
> all, really.  (This is one of the things that the reproducible builds
> people are trying to ensure.)  But we don't want to set LC_MESSAGES,
> at least, because we want people to be able to debug builds in their
> native language, as far as possible.
>
> It is difficult to imagine a situation where honouring a user's
> LC_COLLATE during a package build would be beneficial.

Agreed.

> In practice, nonstandard LC_COLLATE values can break perfectly
> sensible looking build code.  For example, chiark-utils 5.0.0+exp1
> FTBFS in current stretch when LC_COLLATE=fr_CH.UTF-8 because of this:
>   $ touch 11 pp qq
>   $ LC_COLLATE=fr_CH.UTF-8 bash -c 'echo [!A-Z]*[!~]'
>   11
>   $
> (Interestingly, many of these FTBFS problems will be hidden if /bin/sh
> is dash, because dash does not honour locales for globbing.  This is
> clearly legal according to the spec, and probably a good decision.)
> 
> In principle this bug might be fixable by asking (almost) every
> package to set LC_COLLATE in debian/rules.  But ISTM that it would be
> much better to fix this in dpkg-buildpackage.

That would certainly be easier, but this is yet another instance
of dpkg-buildpackage not being considered the canonical entry point
for building packages; debian/rules is. So as long as that's the case
I'm very hesitant to set yet another variable that should in principle
be set explicitly by the package itself. I know, this probably implies
lots of changes, but given past resistance on this topic, I'd rather
not go down the same path. :(

(See the NOTES section in dpkg-buildpackage(1).)

And playing devil's advocate here, arguably some of those are
probably upstream issues that would be better fixed there. :)

> I suggest that dpkg-buildpackage should do as follows:
> 
>  * Unconditionally set one of the following
>        LC_COLLATE=C.UTF-8
>        LC_COLLATE=C
>    Colin Watson tells me that C.UTF-8 has been in libc since
>    approximately squeeze.  C is theoretically UB (!) for high-bit
>    set octets but in practice works just fine (and it would be
>    intolerable if it didn't).

Doing that in the packages might be fine as they can assume a Debian
distribution, but probably not for dpkg itself, as it runs on systems
where C.UTF-8 cannot be assumed to be available, and where it might
not be supported at all!
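
To illustrate the portability concern, a tool that cannot assume
C.UTF-8 could probe for it and fall back to plain C. This is only a
rough sketch in Python, not how dpkg does or would do it: the names
pick_collate and available_locales are made up for the example, and
it assumes that "locale -a" lists the installed locales (glibc
usually spells the name "C.utf8").

```python
import subprocess


def normalize(name):
    """Fold spelling variants such as 'C.UTF-8' and 'C.utf8' together."""
    return name.strip().lower().replace("-", "")


def pick_collate(available):
    """Return 'C.UTF-8' if the system lists it, otherwise plain 'C'."""
    wanted = normalize("C.UTF-8")
    if any(normalize(name) == wanted for name in available):
        return "C.UTF-8"
    return "C"


def available_locales():
    """Ask locale(1) for installed locales; empty list if that fails."""
    try:
        result = subprocess.run(
            ["locale", "-a"],
            capture_output=True,
            encoding="utf-8",
            errors="replace",  # some locale names are not valid UTF-8
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return []
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Hypothetical use: decide what a build driver would export.
    print("LC_COLLATE=" + pick_collate(available_locales()))
```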

>  * Check the effective LC_COLLATE using locale(1), and produce
>    a warning if the result is not m/^C(?=\.|$)/.  (This is useful
>    because some misguided user might set LC_ALL.)

Hmm, so in a way this looks nice, because it can be implemented right
away in dpkg-buildpackage regardless of the system supporting C.UTF-8
locales. But then if the conclusion is that packages need to sort this
out by themselves, it feels out of place to emit such warnings, as I
don't see why users should not be able to select a non-C collation.
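
For reference, the proposed check could look roughly like the sketch
below. It is in Python rather than Perl, and it makes some
assumptions: instead of calling locale(1) it resolves the effective
value from the environment using the usual precedence (LC_ALL, then
LC_COLLATE, then LANG, with empty values treated as unset), and it
assumes "C" when nothing is set. The function names are hypothetical.
Note that the quoted pattern does not accept "POSIX", even though
that names the same locale as "C".

```python
import os
import re
import sys

# The pattern from the proposal: "C" alone, or "C" followed by a dot.
C_COLLATE = re.compile(r"^C(?=\.|$)")


def effective_collate(env):
    """Resolve the collation locale: LC_ALL, then LC_COLLATE, then LANG."""
    for var in ("LC_ALL", "LC_COLLATE", "LANG"):
        value = env.get(var, "")
        if value:  # empty values count as unset
            return value
    return "C"  # assumed default when nothing is set


def collate_warning(env):
    """Return a warning string for non-C collation, or None if fine."""
    value = effective_collate(env)
    if C_COLLATE.match(value):
        return None
    return "warning: effective LC_COLLATE is %r, not C" % value


if __name__ == "__main__":
    message = collate_warning(os.environ)
    if message:
        print(message, file=sys.stderr)
```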

> In the meantime the reproducible builds folks may want to consider
> explicitly setting LC_COLLATE to something sane in their 2nd build.

I guess that depends on whether we still want to make packages
self-contained and buildable correctly just with debian/rules or
not. :)

Thanks,
Guillem
