Re: De-vendoring gnulib in Debian packages

2024-05-12 Thread Theodore Ts'o
On Sat, May 11, 2024 at 04:09:23PM +0200, Simon Josefsson wrote:
> The current approach of running autoreconf -fi is based on a
> misunderstanding: autoreconf -fi is documented to not replace certain
> files with newer versions:
> https://lists.nongnu.org/archive/html/bug-gnulib/2024-04/msg00052.html

And the root cause of *this* is that historically, people put their
own custom autoconf macros in aclocal.m4, so if autoreconf -fi
overwrote aclocal.m4, things could break.  This also means that
programmatically always doing "rm -f aclocal.m4 ; aclocal --install"
will break some packages.

The best solution to this is to encourage people to put those
autoconf macros that they maintain by hand, and that can't be
supplied from elsewhere, in acinclude.m4, which is now included by
default by autoconf in addition to aclocal.m4.  Personally, I think
the two names are confusing and if it weren't for historical reasons,
perhaps should have been swapped, but oh, well.

(For example, I have some custom local autoconf macros needed to
support MacOS in e2fsprogs's acinclude.m4.)

> 1) Use upstream's PGP signed git-archive tarball.

Here's how I do it in e2fsprogs which (a) makes the git-archive
tarball be bit-for-bit reproducible given a particular git commit ID,
and (b) minimizes the size of the tarball when stored using
pristine-tar:

https://github.com/tytso/e2fsprogs/blob/master/util/gen-git-tarball

> To reach our goals in the beginning of this post, this upstream tarball
> has to be filtered to remove all pre-generated artifacts and vendored
> code.  Use some mechanism, like the debian/copyright Files-Excluded
> mechanism to remove them.  If you used a git-archive upstream tarball,
> chances are higher that you won't have to do a lot of work especially
> for pre-generated scripts.

Why does it *have* to be filtered?  For the purposes of building, if
you really want to nuke all of the pre-generated files, you can just
move them out of the way at the beginning of the debian/rules run, and
then move them back as part of "debian/rules clean".  Then you can use
autoreconf -fi to your heart's content in debian/rules (modulo
possibly breaking things if you insist on nuking aclocal.m4 and
regenerating it without taking proper care, as discussed above).
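
As a rough illustration of the move-aside approach, here is a minimal
shell sketch.  The function names, the stash directory and the file
list are all hypothetical (they are not taken from any existing
package); debian/rules would call the first function before running
autoreconf and the second one from its clean target.

```shell
#!/bin/sh
# Sketch only: the stash directory name and the file list are assumptions;
# adjust them to whatever pre-generated files a given upstream ships.
# aclocal.m4 is deliberately left out of the default list, because it may
# contain hand-written macros.
STASH_DIR=${STASH_DIR:-debian/pregenerated-stash}
PREGENERATED=${PREGENERATED:-"configure config.guess config.sub"}

# Move each pre-generated file that exists into the stash directory.
stash_pregenerated() {
    mkdir -p "$STASH_DIR"
    for f in $PREGENERATED; do
        if [ -e "$f" ]; then
            mkdir -p "$STASH_DIR/$(dirname "$f")"
            mv "$f" "$STASH_DIR/$f"
        fi
    done
}

# Put the upstream copies back, overwriting whatever the build regenerated,
# so the tree matches the unpacked *.orig.tar.gz again.
restore_pregenerated() {
    [ -d "$STASH_DIR" ] || return 0
    for f in $PREGENERATED; do
        if [ -e "$STASH_DIR/$f" ]; then
            rm -f "$f"
            mv "$STASH_DIR/$f" "$f"
        fi
    done
    rm -rf "$STASH_DIR"
}
```

With the files stashed, a build that silently depends on a shipped
configure script fails loudly instead of quietly using the upstream
copy.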

This also allows the *.orig.tar.gz to be the same as the upstream
signed PGP tarball, which you've said is the ideal, no?

> There is one design of gnulib that is important to understand: gnulib is
> a source-only library and is not versioned and has no release tarballs.
> Its release artifact is the git repository containing all the commits.
> Packages like coreutils, gzip, tar etc pin to one particular commit of
> gnulib.

Note that how we treat gnulib is a bit different from how we treat
other C shared libraries, where we claim that *all* libraries must be
dynamically linked, and that including source code by reference is
against Debian Policy, precisely because of the toil needed to update
all of the binary packages should some security vulnerability get
discovered in the library which is either linked statically or
included by code duplication.

And yet, we seem to have given a pass for gnulib, probably because it
would be too awkward to enforce that rule *everywhere*, so apparently
we've turned a blind eye.

I personally think the "everything must be dynamically linked" rule is
not really workable in real life, and should be an aspirational goal;
the fact that we treat gnulib differently is good evidence that the
current Debian policy is not really doable in real life if it were
enforced strictly, everywhere, with no exceptions.

Certainly for languages like Rust, it *can't* be enforced, so again,
that's another place where that rule is not enforced consistently; if
it were, we wouldn't be able to ship Rust programs.

- Ted



Re: De-vendoring gnulib in Debian packages

2024-05-12 Thread Simon Josefsson
"Theodore Ts'o"  writes:

>> 1) Use upstream's PGP signed git-archive tarball.
>
> Here's how I do it in e2fsprogs which (a) makes the git-archive
> tarball be bit-for-bit reproducible given a particular git commit ID,
> and (b) minimizes the size of the tarball when stored using
> pristine-tar:
>
> https://github.com/tytso/e2fsprogs/blob/master/util/gen-git-tarball

Wow, written five years ago and basically the same thing that I suggest
(although you store pre-generated ./configure scripts in git).

Going into detail, you use 'gzip -9n' but I use the git-archive defaults,
which are the same as -n aka --no-name.  I agree adding -9 aka --best is
an improvement.  Gnulib's maint.mk also adds --rsyncable, would you agree
that this is also an improvement?  Thus what I'm arriving at is this:

git archive --prefix=inetutils-$(git describe)/ HEAD |
   gzip --no-name --best --rsyncable > inetutils-$(git describe)-src.tar.gz
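
Whether a pipeline like this is really reproducible can be checked by
building the tarball twice from the same commit and comparing the
results.  A minimal sketch follows; the helper names make_src_tarball
and check_reproducible are made up for illustration, and --rsyncable is
left out because not every gzip build supports it.

```shell
#!/bin/sh
# Sketch only: make_src_tarball and check_reproducible are hypothetical
# helpers, not part of git or gnulib.

# Usage: make_src_tarball NAME REF OUTFILE
make_src_tarball() {
    name=$1 ref=$2 out=$3
    # git archive takes file timestamps from the commit, and gzip
    # --no-name omits the name and timestamp from the gzip header, so the
    # output depends only on the commit (and the git and gzip versions).
    git archive --format=tar --prefix="$name/" "$ref" |
        gzip --no-name --best > "$out"
}

# Build twice and compare byte for byte; succeeds only if identical.
# Usage: check_reproducible NAME REF
check_reproducible() {
    name=$1 ref=$2
    a=$(mktemp) && b=$(mktemp) || return 1
    make_src_tarball "$name" "$ref" "$a" &&
        make_src_tarball "$name" "$ref" "$b" &&
        cmp -s "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}
```

For example, running check_reproducible "inetutils-$(git describe)" HEAD
inside a checkout returns success only when both runs produce identical
bytes.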

>> To reach our goals in the beginning of this post, this upstream tarball
>> has to be filtered to remove all pre-generated artifacts and vendored
>> code.  Use some mechanism, like the debian/copyright Files-Excluded
>> mechanism to remove them.  If you used a git-archive upstream tarball,
>> chances are higher that you won't have to do a lot of work especially
>> for pre-generated scripts.
>
> Why does it *have* to be filtered?  For the purposes of building, if
> you really want to nuke all of the pre-generated files, you can just
> move them out of the way at the beginning of the debian/rules run, and
> then move them back as part of "debian/rules clean".  Then you can use
> autoreconf -fi to your heart's content in debian/rules (modulo
> possibly breaking things if you insist on nuking aclocal.m4 and
> regenerating it without taking proper care, as discussed above).
>
> This also allows the *.orig.tar.gz to be the same as the upstream
> signed PGP tarball, which you've said is the ideal, no?

Right, there is no requirement for orig.tar.gz to be filtered.  But then
the outcome depends on upstream, and I don't think we can convince all
upstreams about these concerns.  Most upstreams prefer to ship
pre-generated and vendored files in their tarballs, and will continue to
do so.  Let's assume upstream doesn't ship minimized tarballs that are
free from vendored or pre-generated files.  That's the case for most
upstream tarballs in Debian today (including e2fsprogs, openssh,
coreutils).  Without filtering that tarball we won't fulfil the goals I
mentioned in the beginning of my post.  The downsides with not filtering
include (somewhat repeating myself):

- Opens the door to bugs where pre-generated files are not re-generated
  even though they are used to build the package.  I think this is fairly
  common in Debian packages.  Making sure all pre-generated files are
  re-generated during build, or confirming that a file is not used at
  all, is tedious and fragile work.  Work that has to be done for
  every release.  Are you certain that ./configure is re-generated?  If
  it were not present in the tarball, you would notice.

- Auditing the pre-generated and vendored files for malicious content
  takes more time than not having to audit those files.  Especially if
  those files are not stored in upstream git.

- Pre-generated and vendored files trigger licensing concerns and
  require tedious work that doesn't improve the binary package
  deliverable.  Consider files like texinfo.tex for example, wouldn't it
  be better to strip that out of tarballs and not have to add it to
  debian/copyright?  If some code in a package, let's say getopt.c, is
  not used during build of the package, the license of that file doesn't
  have to be mentioned in debian/copyright if I understand correctly:
  https://www.debian.org/doc/debian-policy/ch-archive.html#s-pkgcopyright
  If, a few releases later, that file starts to get used during
  compilation, the package maintainer will likely not notice.  If it was
  filtered, the maintainer would notice.
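
The first point can be partly automated.  A minimal sketch, assuming a
hypothetical list of pre-generated files: record their checksums before
the build, then report any file whose content is still byte-identical
afterwards.  An unchanged file is only a hint, since regeneration with
the same tool versions can legitimately produce identical output.

```shell
#!/bin/sh
# Sketch only: function names and the snapshot file format are assumptions.
# Relies on sha256sum from GNU coreutils.

# Record "checksum  path" lines for every listed file that exists.
# Usage: snapshot_pregenerated SNAPSHOT FILE...
snapshot_pregenerated() {
    snapshot=$1; shift
    : > "$snapshot"
    for f in "$@"; do
        [ -f "$f" ] && sha256sum "$f" >> "$snapshot"
    done
    return 0
}

# Print the paths whose current checksum still matches the snapshot,
# i.e. files the build probably did not re-generate.
# Usage: report_unchanged SNAPSHOT
report_unchanged() {
    while read -r sum path; do
        [ -f "$path" ] || continue
        now=$(sha256sum "$path" | cut -d' ' -f1)
        [ "$now" = "$sum" ] && printf '%s\n' "$path"
    done < "$1"
    return 0
}
```

A maintainer could call snapshot_pregenerated before the build with
names such as configure and config.sub, and report_unchanged after it,
then look into whatever paths are printed.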

The best case is when upstream ships a tarball consistent with what I
dream *.orig.tar.gz should be: free of vendored and pre-generated files.
Debian package maintainers can take action before this happens, to reach
nice properties within Debian.  Maybe some upstreams will adapt.

>> There is one design of gnulib that is important to understand: gnulib is
>> a source-only library and is not versioned and has no release tarballs.
>> Its release artifact is the git repository containing all the commits.
>> Packages like coreutils, gzip, tar etc pin to one particular commit of
>> gnulib.
>
> Note that how we treat gnulib is a bit different from how we treat
> other C shared libraries, where we claim that *all* libraries must be
> dynamically linked, and that including source code by reference is
> against Debian Policy, precisely because of the toil needed to update
> all of the binary packages should some security vulnerability get
> discovered in the library which is either linked statically or
> included by code duplication.

Re: De-vendoring gnulib in Debian packages

2024-05-12 Thread Russ Allbery
"Theodore Ts'o"  writes:

> The best solution to this is to encourage people to put those
> autoconf macros that they maintain by hand, and that can't be
> supplied from elsewhere, in acinclude.m4, which is now included by
> default by autoconf in addition to aclocal.m4.

Or use a subdirectory named something like m4, so that you can put each
conceptually separate macro in a separate file and not mush everything
together, and use:

AC_CONFIG_MACRO_DIR([m4])

(and set ACLOCAL_AMFLAGS = -I m4 in Makefile.am if you're also using
Automake).

> Note that how we treat gnulib is a bit different from how we treat
> other C shared libraries, where we claim that *all* libraries must be
> dynamically linked, and that including source code by reference is
> against Debian Policy, precisely because of the toil needed to update
> all of the binary packages should some security vulnerability get
> discovered in the library which is either linked statically or
> included by code duplication.

> And yet, we seem to have given a pass for gnulib, probably because it
> would be too awkward to enforce that rule *everywhere*, so apparently
> we've turned a blind eye.

No, there's an explicit exception for cases like gnulib.  Policy 4.13:

    Some software packages include in their distribution convenience
    copies of code from other software packages, generally so that users
    compiling from source don’t have to download multiple packages. Debian
    packages should not make use of these convenience copies unless the
    included package is explicitly intended to be used in this way.

-- 
Russ Allbery (r...@debian.org)  



Re: De-vendoring gnulib in Debian packages

2024-05-12 Thread Ansgar 🙀


Hi,

On Sun, 2024-05-12 at 08:41 -0700, Russ Allbery wrote:
> "Theodore Ts'o"  writes:
> > And yet, we seem to have given a pass for gnulib, probably because it
> > would be too awkward to enforce that rule *everywhere*, so apparently
> > we've turned a blind eye.
> 
> No, there's an explicit exception for cases like gnulib.  Policy 4.13:
> 
>     Some software packages include in their distribution convenience
>     copies of code from other software packages, generally so that users
>     compiling from source don’t have to download multiple packages. Debian
>     packages should not make use of these convenience copies unless the
>     included package is explicitly intended to be used in this way.

In ecosystems like NPM, Cargo, Golang, Python and so on pinning to
specific versions is also "explicitly intended to be used"; they just
sometimes don't include convenience copies directly as they have
tooling to download these (which is not allowed in Debian).

(Arguably Debian should use those more often as keeping all software at
the same dependency version is a futile effort IMHO...)

Gnulib is just older and targeted at the C ecosystem which still has
worse tooling than pretty much everything else.

Ansgar




Re: De-vendoring gnulib in Debian packages

2024-05-12 Thread Russ Allbery
Ansgar 🙀  writes:

> In ecosystems like NPM, Cargo, Golang, Python and so on pinning to
> specific versions is also "explicitly intended to be used"; they just
> sometimes don't include convenience copies directly as they have tooling
> to download these (which is not allowed in Debian).

Yeah, this is a somewhat different case that isn't well-documented in
Policy at the moment.

> (Arguably Debian should use those more often as keeping all software at
> the same dependency version is a futile effort IMHO...)

There's a straight tradeoff with security effort: more security work is
required for every additional copy of a library that exists in Debian
stable.  (And, of course, some languages have better support for having
multiple simultaneously-installed versions of the same library than
others.  Python's support for this is not great; the ecosystem expectation
is that one uses separate virtualenvs, which don't really solve the Debian
build dependency problem.)

-- 
Russ Allbery (r...@debian.org)  



Re: De-vendoring gnulib in Debian packages

2024-05-12 Thread Theodore Ts'o
On Sun, May 12, 2024 at 04:27:06PM +0200, Simon Josefsson wrote:
> Going into detail, you use 'gzip -9n' but I use git-archive defaults
> which is the same as -n aka --no-name.  I agree adding -9 aka --best is
> an improvement.  Gnulib's maint.mk also adds --rsyncable, would you agree
> that this is also an improvement?

I'm not convinced --rsyncable is an improvement.  It makes the
compressed object slightly larger, and in exchange, if the compressed
object changes slightly, it's possible that when you rsync the changed
file, it might be more efficient.  But in the case of PGP signed
release tarballs, the file is constant; it's never going to change,
and even if there are slight changes between say, e2fsprogs v1.47.0
and e2fsprogs v1.47.1, in practice, this is not something --rsyncable
can take advantage of, unless you manually copy
e2fsprogs-v1.47.0.tar.gz to e2fsprogs-v1.47.1.tar.gz, and then rsync
e2fsprogs-v1.47.1.tar.gz, and I don't think anyone is doing this,
either automatically or manually.

That being said, --rsyncable is mostly harmless, so I don't have
strong feelings about adding it to, or removing it from, someone's
release workflow.

> Right, there is no requirement for orig.tar.gz to be filtered.  But then
> the outcome depends on upstream, and I don't think we can convince all
> upstreams about these concerns.  Most upstreams prefer to ship
> pre-generated and vendored files in their tarballs, and will continue to
> do so.

Well, your blog entry does recognize some of the strong reasons why
upstreams will probably want to continue shipping them.  First of all,
not all compilation targets are guaranteed to have autoconf, automake,
et al. installed.  E2fsprogs is portable to Windows, MacOS, AIX,
Solaris, HPUX, NetBSD, FreeBSD, and GNU/Hurd, in addition to Linux.
If the package subscribes to the 'all the world's Linux, and nothing
else exists/we have no interest in supporting anything else' view, I'd
ask the question, why are they using autoconf in the first place?  :-)

Secondly, I have gotten burned with older versions of either autoconf
or the aclocal macros changing in incompatible ways between versions.
So my practice is to check into git the configure script as generated
by autoconf on Debian testing, which is my development system; and if
it fails on anything else, or when a new version of autoconf or
automake, etc. causes my configure script to break, I can curse, and
fix it myself instead of inflicting the breakage on people who are
downloading and trying to compile e2fsprogs.

> Let's assume upstream doesn't ship minimized tarballs that are
> free from vendored or pre-generated files.  That's the case for most
> upstream tarballs in Debian today (including e2fsprogs, openssh,
> coreutils).  Without filtering that tarball we won't fulfil the goals I
> mentioned in the beginning of my post.  The downsides with not filtering
> include (somewhat repeating myself):
>
> ...

Your arguments are made in a very general way --- there are potential
problems for _all_ autogenerated or vendored files.  However, I think
it's possible to simplify things by explicitly restricting the problem
domain to those files auto-generated by autoconf, automake, libtool,
etc.  For example, the argument that this opens things up for bugs
could be fixed by having common code in a debhelper script that
re-generates all of the autoconf and related files.  This addresses your
"tedious" and "fragile" argument.

And if you are always regenerating those files, you don't need to
audit them, since they are going to be replaced anyway, no?  And the
generated files from autoconf and friends have well understood
licensing concerns.

And by the way, all of your concerns about vendored files, and all of
my arguments for why it's no big deal apply to gnulib source files,
too, no?  Why are you so insistent on saying that upstream must never,
ever ship vendored files, when I don't believe you are making this
argument for gnulib?

Yes, it's simpler if we have procrustean rules of the form "everything
MUST be shared libraries", and "never, EVER have generated or vendored
files".  However, I think we're much better off if we have targeted
solutions which fix the 80 to 90% of the cases.  We agree that gnulib
isn't going to be a shared library; but the argument in favor of it
means that there are exceptions, and I think we need to have similar
accommodations for files like configure, config.{guess,sub}.

Upstream *is* going to be shipping those files, and I don't think it's
worth it to deviate from upstream tarballs just to filter out those
files, even if it makes some things simpler from your perspective.  So
I do hear your arguments; it's just on balance, my opinion is that it's
not worth it.

Cheers,

- Ted



Bug#1071011: ITP: golang-gopkg-godo.v2 -- task runner and file watcher (library)

2024-05-12 Thread Francisco Vilmar Cardoso Ruviaro
Package: wnpp
Severity: wishlist
Owner: Francisco Vilmar Cardoso Ruviaro 
X-Debbugs-Cc: debian-devel@lists.debian.org, debian...@lists.debian.org, 
vil...@debian.org

* Package name: golang-gopkg-godo.v2
  Version : 2.0.9
  Upstream Contact: Mario L. Gutierrez 
* URL : https://gopkg.in/godo.v2
* License : Expat
  Programming Lang: Go
  Description : task runner and file watcher (library)

  godo is a task runner and file watcher
  for golang in the spirit of rake, gulp.