On 2022-11-13 17:42:27 (+0100), Jelle van der Waa wrote:
> For packaging we now rely on external parties to keep the source code
> hosted which can be a problem and some packagers want to search
> through all our packages code. [1]
> 
> Currently we already archive sources using `sourceballs` on
> repos.archlinux.org for GPL licensed packages, this is limited to a
> subset of all packages and done after the fact (A timer which runs
> every 8 hours and part of dbscripts). sourceballs calls `makepkg
> --nocolor --allsource --ignorearch --skippgpcheck`. This can be a
> problem as it runs after the package has been committed, and other
> network issues specific to the server might occur (i.e. a source
> cannot be downloaded from where the server is hosted).

I believe it would be good if the build tooling took care of this
instead and released the source tarballs to the repository management
software (alongside the packages).
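
A minimal sketch of what this could look like on the build tooling
side, reusing the makepkg flags from the sourceballs invocation quoted
above. The function names are hypothetical (this is not an existing
devtools or repod API), and the glob assumes makepkg's default
SRCPKGDEST (the PKGBUILD directory) and a SRCEXT starting with
`.src.tar`:

```python
import subprocess
from pathlib import Path

# Flags taken verbatim from the sourceballs invocation quoted above.
MAKEPKG_ALLSOURCE = [
    "makepkg",
    "--nocolor",
    "--allsource",
    "--ignorearch",
    "--skippgpcheck",
]


def allsource_command() -> list[str]:
    """Return a fresh copy of the makepkg command line (hypothetical helper)."""
    return list(MAKEPKG_ALLSOURCE)


def create_source_tarballs(pkgbuild_dir: Path) -> list[Path]:
    """Run makepkg in pkgbuild_dir and return the source tarballs found there.

    Assumption: SRCPKGDEST is unset, so makepkg writes the tarball next to
    the PKGBUILD, and SRCEXT starts with '.src.tar'.
    """
    subprocess.run(allsource_command(), cwd=pkgbuild_dir, check=True)
    return sorted(pkgbuild_dir.glob("*.src.tar.*"))
```

The build tooling could then hand the returned paths to whatever
releases the packages, so that source tarballs and packages arrive
together.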

> To make this more robust, when committing a package using communitypkg
> or equivalent we also rsync the sources to a location on
> repos.archlinux.org (Gemini). This means the sources are consistent,
> and this opens the ability to implement a fallback or to change
> devtools to look at our sources archive when building a package. That
> would benefit reproducible builds as well and automated rebuilds.
> 
> Searching through our source code would be a next nice to have, most
> solutions such as sourcegraph/hound require a Git repository. [3] [4]
> So maybe we can hack up a repository which just git adds all directories and
> keeps one git commit? That should probably be not too much of a waste. But
> the first proposal is to first archive all our code in a way it can be
> consumed by a search solution.

If I understand this correctly, you would want to add the sources
(upstream and our additions for the build) of each package to one
repository, or each to their own?

I would also see the creation of e.g. a git repository to store the
(upstream and maybe our) sources of a package as a job for the tooling
that creates packages and uploads artifacts to $place for releasing.
The upstream tarballs contained in the source tarball that makepkg
creates are (hopefully) versioned, so if we think of adding their
contents to a git repository, we need to come up with a clever solution
for dealing with the changes over time.
That said, I'm not 100% sure I have understood the idea for the
creation of the repository yet.
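
To make the question more concrete, here is one possible reading of the
"keeps one git commit" idea as a sketch. It is only an assumption about
what is meant: every snapshot stages the whole tree and replaces the
single existing commit by amending it. The function names are
hypothetical; only standard git subcommands are used.

```python
import subprocess
from pathlib import Path


def snapshot_commands(message: str, first_snapshot: bool) -> list[list[str]]:
    """Return the git invocations for one snapshot of the source tree.

    The first snapshot creates the repository and its only commit. Later
    snapshots stage all changes and amend that commit, so the history
    never grows beyond one commit.
    """
    stage = ["git", "add", "--all"]
    if first_snapshot:
        return [["git", "init"], stage, ["git", "commit", "-m", message]]
    return [stage, ["git", "commit", "--amend", "-m", message]]


def run_snapshot(repo_dir: Path, message: str) -> None:
    """Execute the snapshot in repo_dir (requires git to be installed)."""
    first = not (repo_dir / ".git").exists()
    for command in snapshot_commands(message, first):
        subprocess.run(command, cwd=repo_dir, check=True)
```

With this approach the changes over time are simply dropped from the
history, and the replaced commit only disappears from disk once git
garbage-collects it. If the history of upstream versions should stay
searchable, a different scheme would be needed.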

> Questions:
> 
> * How do we deal with archiving patches, PKGBUILD's etc. for GPL compliance
> (just save it next to the code?)
> * How do we determine when sources can be removed / cleaned up (we can't
> store things forever). DBscripts hooks?
> * Do we have enough disk space for archiving?

An additional question I would like to add to your set of questions is:
What do we do with e.g. binary-only upstreams (we have a few), for which
we would either not want to create source repos at all or would want to
exclude the binary blobs?


As a sidenote:
For repod I have just implemented the first basic (configurable)
archiving functionality for successfully added packages:
https://gitlab.archlinux.org/archlinux/repod/-/merge_requests/137

This does not yet extend to source tarballs, as they are not created by
repod (source tarballs are also currently still a bit of a backburner
topic). IMHO they should not be created by it in the future either, but
rather by the tooling that builds and pushes the artifacts into it.
FWIW, this initial functionality also does not yet concern itself with
any cleanup scenario for the archived files, only with being compatible
(in structure) with dbscripts.

When looking at (in the future) decoupling the building of source
tarballs from the software maintaining the package and source artifacts
(repod in that case), this still leaves us with a scenario in which we
need to deal with cleanup of archive directories (e.g. upload to the
Internet Archive for long-term storage).

I see some overlap with what repod's goals are in the questions you are
bringing forward and it would be great if we could sync up on that
during the next repod meeting if you have time.

Best,
David

-- 
https://sleepmap.de
