Re: shared tools to validate convenience binaries and artifacts

Jarek Potiuk Sat, 22 Nov 2025 08:35:28 -0800

> does the source code in the tarball match what is announced as the git
commit. If there is a pre-existing tool that does that check, I'd love to
use it.

Actually, this is something that only reproducibility checks can reliably
tell. Preparing a release usually involves some transformations (compiling,
transpiling, generating metadata and so on), so the only way to actually
verify it is to reproduce it. Someone (a PMC member) who is verifying the
release should be able to prepare the same release and compare that the two
are the same (ideally reproducible bit-by-bit). We discussed this a lot in
the ATR Slack and on security-discuss, and it is rather something that each
project will have to do on its own (we do in Airflow), i.e. have
instructions on how to verify the release, where one of the steps is "please
recreate the package and check that it is the same as the one you are voting
on". This is something that ATR might make easier, and it might eventually
run its own "rebuild and check", but the safest way is to make your
artifacts reproducible and give your PMC members instructions for it. It's
just a matter of describing "How do I produce PMC reproducible builds".
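
A minimal sketch of that "recreate the package and compare" step, in Python.
This is not part of ATR or of any project's release tooling, and the function
names are made up for illustration. It only assumes that the artifact under
vote and the locally rebuilt one are both available as files, and it compares
their SHA-512 digests:

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproduced(voted: Path, rebuilt: Path) -> bool:
    """True only if the two artifacts are bit-by-bit identical."""
    return sha512_of(voted) == sha512_of(rebuilt)
```

A verifier would call is_reproduced() with the path of the downloaded release
candidate and the path of their own rebuild. When the digests differ, a tool
such as diffoscope can help find out which part of the build is not
reproducible.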

We have this nice page
https://cwiki.apache.org/confluence/display/SECURITY/Reproducible+Builds
where we gather the best ASF practices for reproducibility. It also links
to "reproducible-builds.org" https://reproducible-builds.org/docs/ which has
a wealth of information and recipes for various languages. I was at the
Reproducible Builds Summit in Vienna 3 weeks ago and I think we are getting
close to having reproducibility practices spread through the ecosystem.
Reproducible builds are also one of the conditions that will allow you to
fully automate ATR builds from CI. ATR has a CLI, APIs and GitHub Actions
that will allow you to do all kinds of things: publish your artifacts to
ATR and start voting, but also submit your artifacts to PyPI and NPM via
Trusted Publishing automatically from your release workflows in CI. This
has one specific condition, though: your builds will have to be
reproducible, and your PMC members, when voting, will have to confirm that
they can reproduce the automatically produced artifacts.

Using ATR will allow us (we already do) to use the ASF data: OID for
attestation, signing, and publishing to 3rd-party registries, but also
access to the trusted committer and PMC database to know who is doing what
(like binding/non-binding votes) etc. etc. And yes, I think very soon we
(the ASF) will be adding, documenting and implementing more and more common
practices around release artifact preparation, both procedural and
technological (more cryptographic attestations, storing information about
the build environment in a cryptographically secure way, and producing
cryptographically verifiable attestations that 3rd-parties will be able to
store on ledgers and other 3rd-parties will be able to independently
verify).

All this is currently very actively discussed and being implemented in the
"trusted-releases" and "security-discuss" mailing lists and Slack channels.
So I would love to draw everyone's attention to the fact that those
discussions should likely happen there, given this work's "board
commissioned it with the tooling team and funded the work" status.

> One extra point that is worth mentioning. On several occasions, I’ve seen
automation give a false sense of security. A tool reports everything as
clean, and people assume the release is fine when it is not. It’s only when
humans look deeper that a serious issue is discovered. For example, a
mention of a GPL license can be fine, depending on the context, and
automation is unlikely to detect it.

Absolutely. I 100% agree, and this is something we keep on discovering
every now and then. This should **never** be removed from the picture. Even
now we have two independent licence checks in ATR, one with RAT and one
custom-written by Sean, and the side effect is that they **currently**
sometimes detect different licensing issues. And I am sure one of the
things in RAT we will do is asking for (and having the PMC perform) an
occasional "manual" verification to periodically check things by hand. One
interesting point is that I think **both** should be happening, and we need
to figure out how to build the automation in a way that either removes or
actively "counteracts" the "false sense of confidence". There are many
ways this can be done, for example by injecting deliberate errors into the
process or by automating reminders (a super-simple thing: Apple keeps on
reminding me to manually verify my phone number every few months, just in
case I changed it). Having a single, centralised release tool gives us the
opportunity to iterate and improve on the process, and will give us
(the ASF) a step in the process where we will be able to "inject" all
kinds of behaviour-changing processes and experiment with them. I think we
will finally have a chance not only to tell our PMCs (and PPMCs) how to do
releases, but also to monitor them more actively and, more importantly, to
influence them far more efficiently and enforceably.

J.

On Sat, Nov 22, 2025 at 5:07 PM Jarek Potiuk <[email protected]> wrote:

> I think even if ATR does not **currently** support more than **basic**
> checks for binary releases, there is absolutely nothing wrong with adding
> them there. ATR will (hopefully) be one of the most commonly used tools
> in the ASF, and we have a tooling team that supports its development and
> maintenance. Also, all the code is super-easy Python code using modern
> standards, with uv to run the tooling, and if anyone would like to
> contribute a check for certain artifact types, like PyPI RCs, I am 100%
> sure Sean and Dava and others who are already contributing and adding
> issues and tools will be super happy to accept it.
>
> What my post was mostly meant to suggest is that very soon we will have a
> common "platform" for release verification. We (the ASF) already do basic
> checks with ATR on our binary artifacts, we already use RAT from creadur
> mentioned above for licence checking, and there is **absolutely no reason**
> anyone here could not add a new check there. I am sure contributions will
> be very welcome there. My cooperation with the tooling team has been
> nothing but stellar.
>
> So my main point is that if there are ideas on how to improve this "common
> platform" we are going to have, which is already plugging into our release
> process, they are absolutely welcome, but ideally they should be added to
> ATR rather than developed separately. A check could also, of course, be
> developed separately in creadur (like RAT is) and used in ATR, but I think
> having those checks integrated with ATR all but guarantees that they are
> going to be useful across the whole ASF.
>
> That's all I wanted to stress. I sensed a bit of a defensive reaction when
> I mentioned ATR, but my point was more "Hey, we have this great platform
> for releases which is already funded by Alpha-Omega and driven by a board
> decision, so we should rather work on strengthening it and adding things to
> something that is **precisely** targeted at automating the workflow that
> has been mentioned here as one that **is in need of automation**".
>
> Yes, it is, and we have an ASF-wide effort to improve exactly that
> workflow, an effort that the board has not only recognised, secured funds
> for and staffed, but that also (in a recent conversation with some board
> members) has been named the absolute game-changer for the ASF (which I
> 100% agree with).
>
> So ... let's do it as a combined effort - as simple as that :) .
>
> J.
>
>
>
> On Sat, Nov 22, 2025 at 3:20 PM sebb <[email protected]> wrote:
>
>> On Sat, 22 Nov 2025 at 14:03, PJ Fanning <[email protected]> wrote:
>> >
>> > My issue is not really about the source release and there is some
>> > tooling and typically the review checks are to be done at vote time.
>> > Here is a check that might be useful to automate and that can't be
>> > properly done without it - does the source code in the tarball match
>> > what is announced as the git commit. If there is a pre-existing tool
>> > that does that check, I'd love to use it.
>>
>> I agree that this is vital, as the tarballs are generally created from
>> whatever happens to be in the source directories.
>> It's very easy for spurious files to be added to the tarball, e.g.
>> files left over from testing.
>> An exact match is not necessary, so long as every file in the source
>> tarball can be derived from the source tag.
>>
>> I have used diff -r in the past, and some editors can show recursive
>> directory differences.
>>
>> > My issue is really with the convenience binaries. Are reviewers really
>> > unzipping jar files to check the contents and checking the text in the
>> > pom files?
>> >
>> > What format are the pypi RCs supposed to be in? Are we sure that the
>> > apache prefix appears in the target pypi project?
>> >
>> > And the big binary tarballs that some teams ship, full of jars or
>> > other compiled components? Those can be a real time consumer to
>> > manually review.
>> >
>> > Some reviewers do these convenience binary checks and maybe it's my
>> > bad luck to try checking on votes but I see a lot of issues when I
>> > review convenience binaries.
>> >
>> >
>> >
>> > On Sat, 22 Nov 2025 at 14:49, tison <[email protected]> wrote:
>> > >
>> > > > a mention of a GPL license can be fine
>> > >
>> > > Typically, you'd end up with an allow list, like [1][2]
>> > >
>> > > [1] https://github.com/apache/flink/blob/d0c9ed9ff47cd0f0fae62958521a0b18e5cd9bf3/tools/ci/flink-ci-tools/src/main/java/org/apache/flink/tools/ci/licensecheck/JarFileChecker.java#L194-L260
>> > > [2] https://github.com/apache/opendal/blob/c35da0d92442756d5742eaf70a2259dd23621b53/deny.toml#L28-L48
>> > >
>> > > Best,
>> > > tison.
>> > >
>> > > On Sat, Nov 22, 2025 at 21:44, <[email protected]> wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > > One extra point that is worth mentioning. On several occasions,
>> > > > I’ve seen automation give a false sense of security. A tool reports
>> > > > everything as clean, and people assume the release is fine when it
>> > > > is not. It’s only when humans look deeper that a serious issue is
>> > > > discovered. For example, a mention of a GPL license can be fine,
>> > > > depending on the context, and automation is unlikely to detect it.
>> > > >
>> > > > Kind Regards.
>> > > >
>> > > > Justin
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: [email protected]
>> > > For additional commands, e-mail: [email protected]
>> > >
