Re: Transparency into private keys of Debian

2024-02-07 Thread kpcyrd

On 2/1/24 10:38, Simon Josefsson wrote:

Hi

I'm exploring how to defend against an attacker who can create valid
signatures for cryptographic private keys (e.g., PGP) that users need to
trust when using an operating system such as Debian.  A signature like
that can be used in a targeted attack against one victim.

For example, apt does not have any protection against this threat
scenario, and often unprotected ftp:// or http:// transports are used,
which are easy to man-in-the-middle.  Even with https:// there is a
large number of mirror sites out there that can replace content you get.
There is a risk that use of a compromised trusted apt PGP key will not
be noticed.  Attackers are also in a good position to deny their
actions if they are careful.


hello, thanks for raising this.

I spent some time exploring this in 2022 and figured that, although I 
don't have the resources to do anything meaningful about it, I can still 
help by making this attack easier to reason about by implementing it. I 
authored a malicious update server for multiple package managers 
including apt (the contrib/ folder has a few examples; some of them are 
incomplete or defunct though, so please do not make assumptions unless 
you have tested them. Many of them have a `check:` section for automatic 
integration testing):


https://github.com/kpcyrd/sh4d0wup

The name is derived from "shadow updates" that carry a valid signature 
(through private key abuse) but are fed directly to an update client and 
therefore never show up in Debian.


It has some features to illustrate how I would assume targeted attacks 
to work (switch the Release/InRelease files based on source IP address 
and/or user-agent, leading them to a different /by-hash/ url that nobody 
else knows about).
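
To make the by-hash part concrete, here is a minimal sketch (not taken from sh4d0wup) of how the `SHA256:` section of a Release file determines the `by-hash` paths a client would fetch. The Release contents, the package data and the helper names are made up for illustration; the point is only that a swapped Release file leads to index URLs that no other client ever requests.

```python
import hashlib
import posixpath


def by_hash_paths(release_text: str) -> set[str]:
    """Return the by-hash/SHA256 paths implied by a Release file's SHA256 section."""
    paths = set()
    in_sha256 = False
    for line in release_text.splitlines():
        if not line.startswith(" "):
            # A non-indented line starts a new field.
            in_sha256 = line.strip() == "SHA256:"
            continue
        if in_sha256:
            digest, _size, name = line.split()
            paths.add(posixpath.join(posixpath.dirname(name), "by-hash", "SHA256", digest))
    return paths


def fake_release(index_bytes: bytes) -> str:
    """Build a tiny, unsigned stand-in for a Release file (illustration only)."""
    digest = hashlib.sha256(index_bytes).hexdigest()
    return (
        "Suite: stable\n"
        "Acquire-By-Hash: yes\n"
        "SHA256:\n"
        f" {digest} {len(index_bytes)} main/binary-amd64/Packages\n"
    )


public = by_hash_paths(fake_release(b"Package: hello\nVersion: 1.0\n"))
shadow = by_hash_paths(fake_release(b"Package: hello\nVersion: 1.0+evil\n"))
print(public.isdisjoint(shadow))  # True: the two clients fetch different index files
```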


From what I've learned the most interesting keys are (for each 
respective release):


- Debian Archive Automatic Signing Key
- Debian Security Archive Automatic Signing Key
- Debian Stable Release Key

The DM/DD keys probably also have some value, but abusing them would leave a 
permanent record in Debian which might get noticed at some (distant or 
not so distant) point in the future. When abusing release signing keys, 
however, you are almost guaranteed to get away with it.


---

Having built all of this (and also hence having proven writing the 
software for this is well within reach of a single, semi-determined 
rando), I then spent some time in 2023 trying to think up a system that 
is as annoying as possible for sh4d0wup users and developed apt-swarm. 
It's my first attempt at an autonomous p2p network and it uses the 
configured public-keys for a "proof of authority", gossip-sync protocol 
that attempts an eventually-consistent view of all Release/InRelease files.
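
The general idea of such a sync can be sketched as follows. This is an illustration of the concept only, not apt-swarm's actual protocol or data model: documents are keyed by their SHA-256, and `looks_signed` is a hypothetical placeholder for verifying a PGP signature against the configured public keys.

```python
import hashlib
from typing import Callable

Store = dict[str, bytes]  # sha256 hex digest -> signed document


def doc_id(doc: bytes) -> str:
    return hashlib.sha256(doc).hexdigest()


def sync(local: Store, remote: Store, is_trusted: Callable[[bytes], bool]) -> list[str]:
    """Pull documents we lack from a peer; return the ids that were new to us."""
    added = []
    for key in sorted(remote.keys() - local.keys()):
        doc = remote[key]
        # Never trust the peer's key: recompute it and check the signature.
        if doc_id(doc) == key and is_trusted(doc):
            local[key] = doc
            added.append(key)
    return added


def looks_signed(doc: bytes) -> bool:
    # Hypothetical placeholder; a real node would verify a PGP signature here.
    return doc.startswith(b"-----BEGIN PGP SIGNED MESSAGE-----")


node_a: Store = {}
node_b: Store = {}
for node, body in ((node_a, b"Suite: stable\n"), (node_b, b"Suite: stable-security\n")):
    doc = b"-----BEGIN PGP SIGNED MESSAGE-----\n" + body
    node[doc_id(doc)] = doc
node_b["00" * 32] = b"unsigned junk"  # never accepted by node_a

sync(node_a, node_b, looks_signed)
sync(node_b, node_a, looks_signed)
print(len(node_a), len(node_b))  # 2 3
```

After syncing in both directions, both nodes hold the same set of accepted documents, and a signature that was only ever shown to one victim becomes visible to everybody once that victim's node gossips it.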


However:

- I need an embedded database with specific properties for the sync 
protocol (allowing fast, ordered lookups by key-prefix), I picked the 
`sled` library as suggested by a debian-rust friend at FOSDEM 2023, but 
I later learned sled needs to be able to fit the entire database into 
RAM, so running a node currently requires ~8GB RAM (and *will* require 
more in the future).
- Because of this, the entire p2p network is currently running from a 
single server (that is provided to me by a friend from a local 
hackerspace). You can join your own nodes at any time but so far nobody 
has done so.
- Despite its name, it doesn't know about apt and only stores opaque 
signed documents, the Release/InRelease files. It does not store any 
copies of any referenced package indexes (or packages), which limits the 
amount of data you get to triage incidents with.
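
For illustration, the access pattern mentioned in the first bullet (fast, ordered lookups by key-prefix) can be sketched with a sorted list and `bisect`. This is an in-memory toy with a made-up `fingerprint/counter` key layout, not sled's API and not apt-swarm's schema; it only shows why sorted keys make prefix scans cheap.

```python
import bisect


class PrefixIndex:
    """Keeps keys sorted so all keys sharing a prefix form one contiguous run."""

    def __init__(self) -> None:
        self._keys: list[bytes] = []
        self._values: dict[bytes, bytes] = {}

    def insert(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix: bytes) -> list[tuple[bytes, bytes]]:
        # Binary search to the first key >= prefix, then walk until it stops matching.
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            out.append((key, self._values[key]))
        return out


index = PrefixIndex()
index.insert(b"keyA/0003", b"release c")
index.insert(b"keyB/0001", b"release x")
index.insert(b"keyA/0001", b"release a")
print([k for k, _ in index.scan_prefix(b"keyA/")])  # [b'keyA/0001', b'keyA/0003']
```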


The last part is important because availability of this kind of data is 
going to set the scene in case you ever need to actually investigate 
something. I found that rekor from the sigstore project is a great 
public resource, but it will only tell you "there's a signature you don't 
have any records of" and leaves you nowhere to go from there. It doesn't 
complement the snapshot service, and it's impossible to tell whether 
somebody generated themselves the inclusion-proof for a malicious update 
or snapshot.debian.org simply missed/lost a snapshot.


With apt-swarm you can look at the alleged `Date:` field of the release 
file but you will (likely) still be missing the package index that was 
fed into user's apt clients (and the code they executed) unless you 
recognize the referenced hashes. Development of this project has stalled 
due to the mentioned RAM usage issue, little interest in this kind of 
tech (even inside the security community) and cyberpunk 2077 being a 
really good game.


Throughout the year apt-swarm has collected (among others):

- 20,459 signatures from `Debian Archive Automatic Signing Key (10/buster)`
- 18,085 signatures from `Debian Archive Automatic Signing Key (9/stretch)`
- 10,463 signatures from `Debian Archive Automatic Signing 

New supply-chain security tool: backseat-signed

2024-04-02 Thread kpcyrd

Hello,

I'm going to keep this short, I've been writing a lot of text recently 
(which is quite exhausting, on top of my dayjob and all the code I wrote 
today afterwards. Apologies if you're still waiting for a reply in one 
of the other threads).


I figured out a somewhat straight-forward way to check if a given `git 
archive` output is cryptographically claimed to be the source input of a 
given binary package in either Arch Linux or Debian (or both).


I believe this to be the "reproducible source tarball" thing some people 
have been asking about. As explained in the README, I believe 
reproducing autotools-generated tarballs isn't worth everybody's time 
and instead a distribution that claims to build from source should 
operate on VCS snapshots instead of tarballs with 25k lines of 
pre-generated shell-script. Building from VCS snapshots is already the 
case for a large number of Arch Linux packages (through auto-generated 
GitHub tarballs). Some packages have been actively converted to VCS 
snapshots by Arch Linux staff in response to the xz incident.


This tool highlights the concept of "canonical sources", which is 
supposed to give guidance on what to code review. This is also why I 
think code signing by upstream is somewhat low priority, since the big 
distros can form consensus around "what's the source code" regardless.


https://github.com/kpcyrd/backseat-signed

The README shows how to verify that Arch Linux and Debian build cmatrix from 
the same source code - they may both still apply patches (which would be 
considered part of the build instructions), but the specified source 
input is the same. This tarball can also be bit-for-bit reproduced from 
VCS by taking a `git archive` snapshot of the v2.0 tag in the cmatrix 
repository.


(If somebody ever tells you programming in Rust is slower, I wrote the 
entirety of this codebase within a few hours of a single day)


Let me know what you think. 🖤

Happy feet,
kpcyrd



Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread kpcyrd

On 4/3/24 4:21 AM, Adrian Bunk wrote:

On Wed, Apr 03, 2024 at 02:31:11AM +0200, kpcyrd wrote:

...
I figured out a somewhat straight-forward way to check if a given `git
archive` output is cryptographically claimed to be the source input of a
given binary package in either Arch Linux or Debian (or both).


For Debian the proper approach would be to copy Checksums-Sha256 for the
source package to the buildinfo file, and there is nothing where it would
matter whether the tarball was generated from git or otherwise.


I believe this to be the "reproducible source tarball" thing some people
have been asking about.
...


The lack of a reliably reproducible checksum when using "git archive" is
the problem, and git cannot realistically provide that.

Even when called with the same parameters, "git archive" executed in
different environments might produce different archives for the same
commit ID.

It is documented that auto-generated Github tarballs for the same tag
and with the same commit ID downloaded at different times might have
different checksums.


Granted it takes some skill to take snapshots that match what github is 
generating (and there are occasional issues) but generally speaking it 
works quite well. The required command is in the README, and I encourage 
you to give it a try.


If you want something that's explicitly designed for taking reproducible 
VCS snapshots you could also consider the "Nix Archive" format[0], 
however I think more people would be in favor of agreeing on how to 
canonically derive a given git tree into a `.tar.gz` (or at least .tar) 
instead of switching Debian to the .nar file format.


[0]: https://github.com/ebkalderon/libnar

I think regular `git archive` is already pretty good; complaining that 
it may only work in 98% of cases is, I'd say, a luxury problem considering 
the current state of things. The next paragraph is the bigger headache:



This tool highlights the concept of "canonical sources", which is supposed
to give guidance on what to code review.
...


How does it tell the git commit ID the tarball was generated from?

Doing a code review of git sources as tarball would be stupid,
you really want the git metadata that usually shows when, why and by
whom something was changed.


It doesn't. It works like a one-way function, it can verify a given VCS 
snapshot is definitely the source code that was ingested into Debian, 
but it can't locate the source code on its own.
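
The "one-way" check can be illustrated with a minimal sketch. It is not backseat-signed's implementation; it assumes an already authenticated and decompressed Sources index in its usual stanza format, and the package data below is a made-up stand-in. Note that it can only answer "is this file claimed?" and never "where did the claimed file come from?".

```python
import hashlib


def claimed_checksums(sources_text: str, package: str) -> set[str]:
    """Collect the Checksums-Sha256 digests listed for `package` in a Sources index."""
    digests = set()
    for stanza in sources_text.strip().split("\n\n"):
        lines = stanza.splitlines()
        if f"Package: {package}" not in lines:
            continue
        in_field = False
        for line in lines:
            if line.startswith(" ") and in_field:
                digests.add(line.split()[0])
            else:
                in_field = line.strip() == "Checksums-Sha256:"
    return digests


def is_claimed(tarball: bytes, sources_text: str, package: str) -> bool:
    return hashlib.sha256(tarball).hexdigest() in claimed_checksums(sources_text, package)


# Made-up data: a stand-in "tarball" and a two-stanza Sources index.
tarball = b"stand-in for cmatrix_2.0.orig.tar.gz"
sources = (
    "Package: cmatrix\n"
    "Version: 2.0-1\n"
    "Checksums-Sha256:\n"
    f" {hashlib.sha256(tarball).hexdigest()} {len(tarball)} cmatrix_2.0.orig.tar.gz\n"
    "\n"
    "Package: hello\n"
    "Checksums-Sha256:\n"
    f" {'0' * 64} 1 hello_1.0.orig.tar.gz\n"
)
print(is_claimed(tarball, sources, "cmatrix"))         # True
print(is_claimed(tarball + b"x", sources, "cmatrix"))  # False
```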


I don't know if Debian has this kind of provenance information 
available; to my knowledge, Debian operates on "our maintainers upload 
.tar.xz files into our archive and we take them at face value". This 
does make sense, considering not every software project uses git, some 
may develop their own VCS, some software projects do not have any VCS at 
all and it's just one person applying patches to a folder on their local 
computer and uploading .tar snapshots to a webserver every other month.


There are some packages that have some kind of system behind them, like 
rust-toml_0.5.11.orig.tar.gz in the Debian Archive can be expected to 
match <https://crates.io/api/v1/crates/toml/0.5.11/download> (although 
sometimes files get excluded from the tar upload). I'd like to 
explicitly encourage people to point me in the right direction if 
there's any existing effort to map Debian .orig.tar.gz files to git 
tags (not necessarily bit-for-bit, but at least which commit we expect 
it to come from).



https://github.com/kpcyrd/backseat-signed

The README
...


"This requires some squinting since in Debian the source tarball is
  commonly recompressed so only the inner .tar is compared"

This doesn't sound true.


I've updated the wording and intend to investigate this further. By 
default the relevant command even expects an exact match. For example 
this works:


```
% backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --name cmatrix cmatrix_2.0.orig.tar.gz
[2024-04-04T18:45:09Z INFO  backseat_signed::plumbing] Loading sources index from "Sources.xz"
[2024-04-04T18:45:10Z INFO  backseat_signed::plumbing] Loading file from "cmatrix_2.0.orig.tar.gz"
[2024-04-04T18:45:10Z INFO  backseat_signed::plumbing] Searching in index...
[2024-04-04T18:45:10Z INFO  backseat_signed::plumbing] File verified successfully
```

But if I repack the .tar.gz into .tar.xz it's going to get rejected:

```
% backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --name cmatrix cmatrix_2.0.orig.tar.xz
[2024-04-04T18:48:32Z INFO  backseat_signed::plumbing] Loading sources index from "Sources.xz"
[2024-04-04T18:48:33Z INFO  backseat_signed::plumbing] Loading file from "cmatrix_2.0.orig.tar.xz"
[2024-04-04T18:48:33Z INFO  backseat_signed::plumbing] Searching in index...
Error: Could not find sou
```

Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread kpcyrd

On 4/5/24 12:31 AM, Adrian Bunk wrote:

Hashes of "git archive" tarballs are anyway not stable,
so whatever a maintainer generates is not worse than what is on Github.

Any proper tooling would have to verify that the contents are equal.


...
Being able to disregard the compression layer is still necessary however,
because Debian (as far as I know) never takes the hash of the inner .tar
file but only the compressed one. Because of this you may still need to
provide `--orig ` if you want to compare with an uncompressed tar.
...
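
What "disregarding the compression layer" means can be shown with a small sketch. This is an illustration under assumptions, not the tool's code: the payload is a stand-in rather than a real tar, and only gzip and xz are recognized by their magic bytes.

```python
import gzip
import hashlib
import lzma

GZIP_MAGIC = b"\x1f\x8b"
XZ_MAGIC = b"\xfd7zXZ\x00"


def inner_tar_sha256(data: bytes) -> str:
    """Hash the decompressed payload so .tar, .tar.gz and .tar.xz can be compared."""
    if data.startswith(GZIP_MAGIC):
        data = gzip.decompress(data)
    elif data.startswith(XZ_MAGIC):
        data = lzma.decompress(data)
    return hashlib.sha256(data).hexdigest()


inner = b"stand-in for the bytes of an uncompressed .tar " * 100
as_gz = gzip.compress(inner)
as_xz = lzma.compress(inner)

# The hashes of the compressed files differ ...
print(hashlib.sha256(as_gz).hexdigest() == hashlib.sha256(as_xz).hexdigest())  # False
# ... while the hash of the inner payload is the same for all three forms.
print(inner_tar_sha256(as_gz) == inner_tar_sha256(as_xz) == inner_tar_sha256(inner))  # True
```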


Right now the preferred form of source in Debian is an upstream-signed
release tarball, NOT anything from git.

An actual improvement would be to automatically and 100% reliably
verify that a given tarball matches the commit ID and signed git tag
in an upstream git tree.


I strongly disagree. I think the upstream signature is overrated.

It's from the old mindset of code signing being the only way of securely 
getting code from upstream. Recent events have shown (instead of 
bothering upstream for signatures) it's much more important to have 
clarity and transparency what's in the code that is compiled into 
binaries and executed on our computers, instead of who we got it from. 
The entire reproducible builds effort is based on the idea of the source 
code in Debian being safe and sound to use.


If upstream refused to sign anything but pre-compiled llvm IR, I'd put 
both the IR and signature in the trash and build from source code.


If upstream wouldn't sign anything but autotools pre-processed archives 
with 25k lines of auto-generated shell scripts I'd put it next to the IR 
and build from the actual source code as well.


If upstream would only sign a tarball with files sorted in the order 
they were returned by their kernel to readdir(), I'd raise the question 
why we're still having this problem in 2024 (and possibly suggest using 
a tar with sorted entries).
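
As a rough illustration of why entry order and metadata matter, here is a sketch that builds a tar from in-memory files with sorted entries and fixed metadata. It is not how `git archive` works internally and the file names are made up; it only shows that once ordering, timestamps and ownership are pinned, two independent runs produce identical bytes.

```python
import hashlib
import io
import tarfile


def deterministic_tar(files: dict[str, bytes]) -> bytes:
    """Write a tar whose bytes depend only on file names and contents."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed order instead of readdir() order
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = 0          # no build-time timestamps
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


# Same files, inserted in a different order (think: two different filesystems).
run_one = {"src/main.c": b"int main(void) { return 0; }\n", "README": b"hello\n"}
run_two = {"README": b"hello\n", "src/main.c": b"int main(void) { return 0; }\n"}

digest_one = hashlib.sha256(deterministic_tar(run_one)).hexdigest()
digest_two = hashlib.sha256(deterministic_tar(run_two)).hexdigest()
print(digest_one == digest_two)  # True
```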


Although to be honest, if this really were the only problem we had, 
I'd likely not care anymore and put my time to better use.



Or perhaps stop using tarballs in Debian as sole permitted
form of source.


I'd be fine with that.

cheers,
kpcyrd



Re: New supply-chain security tool: backseat-signed

2024-04-06 Thread kpcyrd

On 4/6/24 1:42 PM, Adrian Bunk wrote:

You cannot simply proclaim that some git tree is the preferred form of
modification without shipping said git tree in our ftp archive.

If your claim was true, then Debian and downstreams would be violating
licences like the GPL by not providing the preferred form of modification
in the archive.


I'm obviously not a lawyer, but I do think this is the case. Quoting 
from GPL-3.0:


> The “source code” for a work means the preferred form of the work for 
> making modifications to it. “Object code” means any non-source form of a 
> work.


autotools pre-processed source code is clearly not "the preferred form 
of the work for making modifications", which is specifically what I'm 
saying Debian shouldn't consider a "source code input" either, to 
eliminate this vector for underhanded tampering that Jia Tan has used.


If we can force a future Jia Tan to commit their backdoor into git (for 
everybody to see) I consider this a win.


> The “Corresponding Source” for a work in object code form means all 
> the source code needed to generate, install, and (for an executable 
> work) run the object code and to modify the work, including scripts to 
> control those activities.


The GPL is big on "if you ship object files, the source code for them 
better also be available".


The GPL specifically allows me to have private forks, as long as I'm not 
publicly distributing binaries. If I do distribute binaries, I need to 
also publish the source code I derived them from.


Again: The source code needed to build the binaries.

It does not require me to disclose some version control graph, but I do 
need to provide all source code that goes into the build (which is what 
.orig.tar.xz is supposed to be).


A "source code build process" is clearly just the build process in a 
trenchcoat.


cheers,
kpcyrd



Re: Upstreams with "official" tarballs differing from their git

2025-02-15 Thread kpcyrd

On 2/15/25 12:10 PM, Stéphane Glondu wrote:

I realize my previous email was a bit short: I was wondering if this
.tbz is still source code because in the autotools world, package sources
come with configure scripts ready to run, but the good practice in
Debian is to regenerate those from configure.ac.


Well, we enter a philosophical debate that is not specific to OCaml and 
probably should be discussed elsewhere... Adding debian-devel to get 
more opinions.


Summary to other debian-devel readers: we are facing some upstreams that 
publish "official" tarballs that differ from what is in their git. The 
differences may include: variable substitutions, generated files... I 
guess this is pretty common (cf. autotools). Moreover, the build system 
behaves differently if it is called from git or not, or from extracted 
official tarballs or not.


IMHO, traditionally, "source code" from Debian's point of view is whatever 
upstream releases as "official" tarballs (i.e. elpi-2.0.7.tbz), which 
may differ from what is in upstream git (i.e. v2.0.7.tar.gz). What makes 
me think that is the special care that is taken in keeping upstream 
tarballs pristine (with their signatures...).


[...]


What do you think about the topic?


My e-mail is very opinionated, I would really like to hear other opinions.


hello! ✨

disclaimer upfront, I know pretty much nothing about ocaml, this is 
based on my experience with C/Rust/Go/etc.


I think the concept of "building the source code into source code" [sic] 
that is common with autotools, is just the regular build in a trenchcoat 
and should happen on Debian build servers. This is to avoid forcing a 
gap between the VCS and Reproducible Builds that nobody feels 
responsible for. Coincidentally this topic was also discussed in 
#reproducible-builds irc yesterday.


With regards to signatures, quoting from an email I wrote briefly after 
the XZ incident[0]:


> It's from the old mindset of code signing being the only way of 
> securely getting code from upstream. Recent events have shown (instead 
> of bothering upstream for signatures) it's much more important to have 
> clarity and transparency what's in the code that is compiled into 
> binaries and executed on our computers, instead of who we got it from. 
> The entire reproducible builds effort is based on the idea of the source 
> code in Debian being safe and sound to use.


[0]: https://lists.debian.org/debian-devel/2024/04/msg00125.html

I know Debian attempts to regenerate the autotools files, but there is 
no way to tell if this actually worked; I vaguely remember XZ was 
specifically one of the cases where it didn't.


In other news, note there's currently a push within Arch Linux to move 
away from upstream custom tarballs towards VCS snapshots:


https://gitlab.archlinux.org/archlinux/rfcs/-/merge_requests/46

Also because people found this interesting yesterday, Arch Linux and 
Debian disagree on "what's the source code of curl 8.12.1":


Arch Linux: 
https://whatsrc.org/artifact/sha256:146d2d673358b7927d9a3c74e22b6b0e7f9a1aee2a4307afbe6ac07f12764130
Debian: 
https://whatsrc.org/artifact/sha256:599ff98cbab933a8b3640a084b12a5308a20795c192855ee454a8c1c16fa4dac


Diff between those two:

https://whatsrc.org/diff-right-trimmed/sha256:146d2d673358b7927d9a3c74e22b6b0e7f9a1aee2a4307afbe6ac07f12764130/sha256:599ff98cbab933a8b3640a084b12a5308a20795c192855ee454a8c1c16fa4dac

Even if we got some kind human to review the source code in its entirety for 
us, which one should they review?

sha256:146d2d673358b7927d9a3c74e22b6b0e7f9a1aee2a4307afbe6ac07f12764130?
sha256:599ff98cbab933a8b3640a084b12a5308a20795c192855ee454a8c1c16fa4dac?
Both?

cheers,
kpcyrd