On 11/13/22 19:37, David Runge wrote:
On 2022-11-13 17:42:27 (+0100), Jelle van der Waa wrote:
For packaging we currently rely on external parties to keep the
source code hosted, which can be a problem, and some packagers want
to search through the code of all our packages. [1]

Currently we already archive sources using `sourceballs` on
repos.archlinux.org for GPL-licensed packages. This is limited to a
subset of all packages and done after the fact (by a timer, part of
dbscripts, which runs every 8 hours). sourceballs calls `makepkg
--nocolor --allsource --ignorearch --skippgpcheck`. This can be a
problem, as it runs after the package has been committed and is
subject to network issues specific to the server (i.e. the source
may not be downloadable from where the server is hosted).

I believe it would be good if the build tooling took care of this
instead and released the source tarballs to the repository management
software (alongside the packages).


Answer merged together into next section.

To make this more robust, when committing a package using communitypkg
or an equivalent tool we would also rsync the sources to a location on
repos.archlinux.org (Gemini). This means the sources are consistent,
and it opens up the ability to implement a fallback, or to change
devtools to look at our sources archive when building a package. That
would benefit reproducible builds and automated rebuilds as well.
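The commit-time upload step could be sketched roughly like this
(the host, destination path, and rsync flags are assumptions for
illustration, not the actual Gemini layout):

```shell
# Hypothetical helper for communitypkg: after `makepkg --allsource`
# has produced the source tarball, push it alongside the package
# upload. Host and path below are placeholders.
upload_sources() {
    local srctar=$1
    local dest="repos.archlinux.org:/srv/sources/"
    # Print the command instead of running it, so the sketch can be
    # inspected without network access; a real implementation would
    # execute it and fail the commit on error.
    echo rsync --progress --partial "$srctar" "$dest"
}
```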

Searching through our source code would be a nice next step; most
solutions, such as sourcegraph/hound, require a Git repository. [3] [4]
So maybe we can hack up a repository which just `git add`s all
directories and keeps a single git commit? That should not waste too
much space. But the first proposal is to archive all our code in a way
that can be consumed by a search solution.
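The single-commit hack could look roughly like this (repository
location and commit message are assumptions): re-stage everything and
amend, so the history never grows beyond one commit.

```shell
# Keep a search-only git repo at exactly one commit: stage the
# current state of all source directories and amend the snapshot,
# so the repo never accumulates history.
index_sources() {
    local repo=$1
    git -C "$repo" add -A
    # Amend the existing snapshot commit; fall back to creating the
    # first one if the repo is still empty.
    git -C "$repo" commit --quiet --amend --no-edit 2>/dev/null \
        || git -C "$repo" commit --quiet -m 'source snapshot'
}
```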

If I understand this correctly, you would want to add the sources
(upstream and our additions for the build) of each package to one
repository, or each to their own?

The creation of e.g. a git repository to store the (upstream and maybe
our) sources of a package is something I would also see on the side of
the tooling that creates packages and uploads artifacts to $place for
releasing. As the upstream tarballs contained in the source tarball
that makepkg creates are (hopefully) versioned, if we think of adding
their contents to a git repository we need to come up with a clever
solution for dealing with the changes over time.
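One conceivable (entirely hypothetical) answer to the changes-over-time
question is the opposite of the single-commit hack: commit each
versioned upstream tarball's contents as its own snapshot and tag it,
so every version stays addressable. Tag naming here is an assumption.

```shell
# Commit the currently unpacked upstream sources as a snapshot for
# one package version and tag it, keeping each version reachable.
snapshot_version() {
    local repo=$1 pkgver=$2
    git -C "$repo" add -A
    git -C "$repo" commit --quiet --allow-empty -m "snapshot $pkgver"
    git -C "$repo" tag -f "v$pkgver"
}
```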



This all sounds nice and easy at first glance, but in
the end it is a huge can of worms and we need to be aware
of the implications:

If we tied this directly to the package build tooling,
packagers building locally would have to upload gigabytes
of sources alongside the build artifacts. This includes
whole git repositories or huge monorepo tarballs such as
chromium's (1.6 GB).

If we go this route, we would make it very hard to
package anything with bigger sources locally (where
downloading is much less of an issue than uploading).
This route is more something that may be feasible in
the future, once we have migrated to fully remote
building, e.g. with buildbot.

If we want to have this rather short term, I'd recommend
we dig into how much of an issue it really would be to
use decoupled source archiving like we do for GPL
sources. Of course we would have a window in which we
might not be able to grab the sources within 8h, but I'd
argue that would justify raising an alert to the package
maintainer and having retry mechanisms.
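A retry-then-alert policy like the one suggested could be sketched as
follows (attempt count, delay, and the alert channel are all
assumptions; a real setup would page or mail the maintainer):

```shell
# Retry a fetch command a few times with a delay; if every attempt
# fails, emit an alert for the package maintainer instead of
# silently losing the sources.
fetch_with_retries() {
    local tries=$1; shift
    local i=0
    while [ "$i" -lt "$tries" ]; do
        "$@" && return 0
        i=$((i + 1))
        sleep "${RETRY_DELAY:-60}"
    done
    # All attempts failed: hand off to whatever alerting we set up.
    echo "ALERT: source fetch failed after $tries attempts: $*" >&2
    return 1
}
```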


It's very good that Jelle is raising this question and
potential issues with decoupled source archiving. But it
feels a bit like we obstruct our own progress by trying
to solve a (hopefully rather rare) issue of not being
able to grab upstream sources within ~8 hours.

My recommendation would be:
- try getting the decoupled way solved, including our
  storage and backup problems foutrelis pointed out.
- implement alerting if we fail to fetch sources; this
  should happen rarely and is something a maintainer
  should look at
- make use of that to feed into a source indexer
  so we can already leverage the advantages
- once we reach a future where we have robots taking
  over 🤖 with more build automation, investigate
  migrating the source archiving into the actual
  build process

Cheers,
Levente
