Re: Transparency into private keys of Debian
On 2/1/24 10:38, Simon Josefsson wrote:
> Hi
>
> I'm exploring how to defend against an attacker who can create valid signatures for cryptographic private keys (e.g., PGP) that users need to trust when using an operating system such as Debian. A signature like that can be used in a targeted attack against one victim.
>
> For example, apt does not have any protection against this threat scenario, and often unprotected ftp:// or http:// transports are used, which are easy to man-in-the-middle. Even with https:// there is a large number of mirror sites out there that can replace content you get.
>
> There is a risk that use of a compromised trusted apt PGP key will not be noticed. Attackers are also in a good position to deny their actions if they are careful.

hello,

thanks for raising this. I spent some time exploring this in 2022 and figured that, although I don't have the resources to do anything meaningful about it, I can still help make this attack easier to reason about by implementing it.

I authored a malicious update server for multiple package managers including apt:

https://github.com/kpcyrd/sh4d0wup

The contrib/ folder has a few examples. Some of them are incomplete or defunct though, so please do not make assumptions unless you tested them. Many of them have a `check:` section for automatic integration testing.

The name is derived from "shadow updates" that carry a valid signature (through private key abuse) but are fed directly to an update client and therefore never show up in Debian. It has some features to illustrate how I would assume targeted attacks to work (switch the Release/InRelease files based on source IP address and/or user-agent, leading them to a different /by-hash/ url that nobody else knows about).
From what I've learned the most interesting keys are (for each respective release):

- Debian Archive Automatic Signing Key
- Debian Security Archive Automatic Signing Key
- Debian Stable Release Key

The DM/DD keys probably also have some value, but would leave a permanent record in Debian which might get noticed at some (distant or not so distant) point in the future. When abusing release signing keys, however, you are almost guaranteed to get away with it.

---

Having built all of this (and hence also having proven that writing the software for this is well within reach of a single, semi-determined rando), I then spent some time in 2023 trying to think up a system that is as annoying as possible for sh4d0wup users and developed apt-swarm. It's my first attempt at an autonomous p2p network and it uses the configured public-keys for a "proof of authority", gossip-sync protocol to attempt an eventually-consistent view of all Release/InRelease files.

However:

- I need an embedded database with specific properties for the sync protocol (allowing fast, ordered lookups by key-prefix). I picked the `sled` library as suggested by a debian-rust friend at FOSDEM 2023, but I later learned sled needs to be able to fit the entire database into RAM, so running a node currently requires ~8GB RAM (and *will* require more in the future).
- Because of this, the entire p2p network is currently running from a single server (that is provided to me by a friend from a local hackerspace). You can join your own nodes at any time but so far nobody has done so.
- Despite its name, it doesn't know about apt and only stores opaque signed documents, the Release/InRelease files. It does not store any copies of any referenced package indexes (or packages), which limits the amount of data you get to triage incidents with.

The last part is important because availability of this kind of data is going to set the scene in case you ever need to actually investigate something.
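To make the "fast, ordered lookups by key-prefix" requirement concrete, here is a minimal in-memory sketch in Python. It is not apt-swarm's storage code or key schema: the `PrefixIndex` class and the `<key id>/<document hash>` key layout used in the usage note are assumptions made purely for illustration. The idea is only that, when keys are kept sorted, all entries sharing a prefix sit next to each other and can be found with one binary search followed by a linear walk.

```python
import bisect


class PrefixIndex:
    """Toy ordered key/value store supporting prefix scans (illustrative only)."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> value

    def insert(self, key: bytes, value: bytes) -> None:
        # Only touch the sorted list for new keys; overwrites just replace the value.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix: bytes):
        # Every key that starts with `prefix` compares >= `prefix`, so the
        # first candidate is found with a binary search and the matches
        # form one contiguous run from there.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._values[key]
            i += 1
```

With hypothetical keys such as `b"<key id>/<document hash>"`, `scan_prefix(b"<key id>/")` would yield every stored document for one signing key in sorted order, which is the kind of access pattern a sync protocol can use to compare ranges between two nodes.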
I found that rekor from the sigstore project is a great public resource, but it will only tell you "there's a signature you don't have any records of", with nowhere to go from there. It doesn't complement the snapshot service, and it's impossible to tell whether somebody generated themselves the inclusion-proof for a malicious update or snapshot.debian.org simply missed/lost a snapshot.

With apt-swarm you can look at the alleged `Date:` field of the release file but you will (likely) still be missing the package index that was fed into user's apt clients (and the code they executed) unless you recognize the referenced hashes.

Development of this project has stalled due to the mentioned RAM usage issue, little interest in this kind of tech (even inside the security community) and cyberpunk 2077 being a really good game.

Throughout the year apt-swarm has collected (among others):

- 20,459 signatures from `Debian Archive Automatic Signing Key (10/buster)`
- 18,085 signatures from `Debian Archive Automatic Signing Key (9/stretch)`
- 10,463 signatures from `Debian Archive Automatic Signing
New supply-chain security tool: backseat-signed
Hello,

I'm going to keep this short, I've been writing a lot of text recently (which is quite exhausting, on top of my dayjob and all the code I wrote today afterwards). Apologies if you're still waiting for a reply in one of the other threads.

I figured out a somewhat straight-forward way to check if a given `git archive` output is cryptographically claimed to be the source input of a given binary package in either Arch Linux or Debian (or both).

I believe this to be the "reproducible source tarball" thing some people have been asking about. As explained in the README, I believe reproducing autotools-generated tarballs isn't worth everybody's time, and a distribution that claims to build from source should operate on VCS snapshots rather than tarballs with 25k lines of pre-generated shell-script.

Building from VCS snapshots is already the case for a large number of Arch Linux packages (through auto-generated GitHub tarballs). Some packages have been actively converted to VCS snapshots by Arch Linux staff in response to the xz incident.

This tool highlights the concept of "canonical sources", which is supposed to give guidance on what to code review. This is also why I think code signing by upstream is somewhat low priority, since the big distros can form consensus around "what's the source code" regardless.

https://github.com/kpcyrd/backseat-signed

The README shows how to verify that Arch Linux and Debian build cmatrix from the same source code. They may both still apply patches (which would be considered part of the build instructions), but the specified source input is the same.

This tarball can also be bit-for-bit reproduced from VCS by taking a `git archive` snapshot of the v2.0 tag in the cmatrix repository.

(If somebody ever tells you programming in Rust is slower, I wrote the entirety of this codebase within a few hours of a single day)

Let me know what you think. 🖤

Happy feet,
kpcyrd
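As a small illustration of what "bit-for-bit" means in the message above, the following Python sketch hashes a file in chunks and checks the result against a set of digests that some distribution is assumed to have published for its source input. This is not how backseat-signed is implemented; `sha256_file`, `is_claimed_source` and the `claimed_digests` argument are hypothetical names, and where the digests come from is deliberately left out.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_claimed_source(path, claimed_digests):
    """True if the file's digest is one of the claimed source digests.

    `claimed_digests` is assumed to be an iterable of hex SHA-256 strings
    obtained elsewhere (for example from signed distribution metadata).
    """
    wanted = {d.lower() for d in claimed_digests}
    return sha256_file(path) in wanted
```

A single changed byte anywhere in the file changes the digest, so a match only happens for an exact copy of the claimed source input.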
Re: New supply-chain security tool: backseat-signed
On 4/3/24 4:21 AM, Adrian Bunk wrote:
> On Wed, Apr 03, 2024 at 02:31:11AM +0200, kpcyrd wrote:
>> ...
>> I figured out a somewhat straight-forward way to check if a given `git archive` output is cryptographically claimed to be the source input of a given binary package in either Arch Linux or Debian (or both).
>
> For Debian the proper approach would be to copy Checksums-Sha256 for the source package to the buildinfo file, and there is nothing where it would matter whether the tarball was generated from git or otherwise.
>
>> I believe this to be the "reproducible source tarball" thing some people have been asking about.
>> ...
>
> The lack of a reliably reproducible checksum when using "git archive" is the problem, and git cannot realistically provide that.
>
> Even when called with the same parameters, "git archive" executed in different environments might produce different archives for the same commit ID.
>
> It is documented that auto-generated Github tarballs for the same tag and with the same commit ID downloaded at different times might have different checksums.

Granted it takes some skill to take snapshots that match what github is generating (and there are occasional issues) but generally speaking it works quite well. The required command is in the README, and I encourage you to give it a try.

If you want something that's explicitly designed for taking reproducible VCS snapshots you could also consider the "Nix Archive" format[0], however I think more people would be in favor of agreeing on how to canonically derive a given git tree into a `.tar.gz` (or at least .tar) instead of switching Debian to the .nar file format.

[0]: https://github.com/ebkalderon/libnar

I think regular `git archive` is already pretty good. Complaining that it may only work in 98% of cases is, I'd say, a luxury problem considering the current state of things.

The next paragraph is the bigger headache:

>> This tool highlights the concept of "canonical sources", which is supposed to give guidance on what to code review.
>> ...
>
> How does it tell the git commit ID the tarball was generated from?
>
> Doing a code review of git sources as tarball would be stupid, you really want the git metadata that usually shows when, why and by whom something was changed.

It doesn't. It works like a one-way function: it can verify a given VCS snapshot is definitely the source code that was ingested into Debian, but it can't locate the source code on its own.

I don't know if Debian has this kind of provenance information available. To my knowledge, Debian operates on "our maintainers upload .tar.xz files into our archive and we take them at face value". Which does make sense, considering not every software project uses git, some may develop their own VCS, and some software projects do not have any VCS at all: it's just one person applying patches to a folder on their local computer and uploading .tar snapshots to a webserver every other month.

There are some packages that have some kind of system behind them, like rust-toml_0.5.11.orig.tar.gz in the Debian Archive, which can be expected to match <https://crates.io/api/v1/crates/toml/0.5.11/download> (although sometimes files get excluded from the tar upload).

I'd like to explicitly encourage people to point me in the right direction if there's any existing effort of mapping debian .orig.tar.gz files to git tags (not necessarily bit-for-bit, but at least which commit we expect it to come from).

>> https://github.com/kpcyrd/backseat-signed
>>
>> The README ...
>
> "This requires some squinting since in Debian the source tarball is commonly recompressed so only the inner .tar is compared"
>
> This doesn't sound true.

I've updated the wording and intend to investigate this further. By default the relevant command even expects an exact match.
For example this works:

```
% backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --name cmatrix cmatrix_2.0.orig.tar.gz
[2024-04-04T18:45:09Z INFO backseat_signed::plumbing] Loading sources index from "Sources.xz"
[2024-04-04T18:45:10Z INFO backseat_signed::plumbing] Loading file from "cmatrix_2.0.orig.tar.gz"
[2024-04-04T18:45:10Z INFO backseat_signed::plumbing] Searching in index...
[2024-04-04T18:45:10Z INFO backseat_signed::plumbing] File verified successfully
```

But if I repack the .tar.gz into .tar.xz it's going to get rejected:

```
% backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --name cmatrix cmatrix_2.0.orig.tar.xz
[2024-04-04T18:48:32Z INFO backseat_signed::plumbing] Loading sources index from "Sources.xz"
[2024-04-04T18:48:33Z INFO backseat_signed::plumbing] Loading file from "cmatrix_2.0.orig.tar.xz"
[2024-04-04T18:48:33Z INFO backseat_signed::plumbing] Searching in index...
Error: Could not find sou
```
Re: New supply-chain security tool: backseat-signed
On 4/5/24 12:31 AM, Adrian Bunk wrote:
> Hashes of "git archive" tarballs are anyway not stable, so whatever a maintainer generates is not worse than what is on Github.
>
> Any proper tooling would have to verify that the contents is equal.
>
>> ...
>> Being able to disregard the compression layer is still necessary however, because Debian (as far as I know) never takes the hash of the inner .tar file but only the compressed one. Because of this you may still need to provide `--orig ` if you want to compare with an uncompressed tar.
>> ...
>
> Right now the preferred form of source in Debian is an upstream-signed release tarball, NOT anything from git.
>
> An actual improvement would be to automatically and 100% reliably verify that a given tarball matches the commit ID and signed git tag in an upstream git tree.

I strongly disagree. I think the upstream signature is overrated.

It's from the old mindset of code signing being the only way of securely getting code from upstream. Recent events have shown that (instead of bothering upstream for signatures) it's much more important to have clarity and transparency about what's in the code that is compiled into binaries and executed on our computers than about who we got it from. The entire reproducible builds effort is based on the idea of the source code in Debian being safe and sound to use.

If upstream refused to sign anything but pre-compiled llvm IR, I'd put both the IR and signature in the trash and build from source code.

If upstream wouldn't sign anything but autotools pre-processed archives with 25k lines of auto-generated shell scripts, I'd put it next to the IR and build from the actual source code as well.

If upstream would only sign a tarball with files sorted in the order they were returned by their kernel to readdir(), I'd raise the question of why we're having this in 2024 (and possibly suggest using a tar with sorted entries).
Although, to be honest, if this were really the only problem we had, I'd likely not care anymore and put my time to better use.

Or perhaps stop using tarballs as the sole permitted form of source in Debian. I'd be fine with that.

cheers,
kpcyrd
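Two points from this message, disregarding the compression layer and using a tar with sorted entries, can be illustrated with a short Python sketch. It is not taken from backseat-signed or from any Debian tooling; the function names and the fixed metadata values (zero mtime, uid/gid 0, mode 0644) are assumptions chosen only to make the archive independent of the order in which files were discovered.

```python
import gzip
import hashlib
import io
import lzma
import tarfile
from typing import Mapping


def deterministic_tar(files: Mapping[str, bytes]) -> bytes:
    """Build an uncompressed tar with sorted entries and fixed metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # do not depend on readdir()/dict order
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = 0            # assumed fixed timestamp
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


def inner_tar_sha256(blob: bytes) -> str:
    """Hash the tar stream itself, ignoring a gzip or xz wrapper if present."""
    if blob[:2] == b"\x1f\x8b":            # gzip magic bytes
        blob = gzip.decompress(blob)
    elif blob[:6] == b"\xfd7zXZ\x00":      # xz magic bytes
        blob = lzma.decompress(blob)
    return hashlib.sha256(blob).hexdigest()
```

Under these assumptions, the same set of files produces the same tar bytes regardless of input order, and a `.tar.gz` and a `.tar.xz` of that tar have different outer checksums but the same `inner_tar_sha256`.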
Re: New supply-chain security tool: backseat-signed
On 4/6/24 1:42 PM, Adrian Bunk wrote:
> You cannot simply proclaim that some git tree is the preferred form of modification without shipping said git tree in our ftp archive.
>
> If your claim was true, then Debian and downstreams would be violating licences like the GPL by not providing the preferred form of modification in the archive.

I'm obviously not a lawyer, but I do think this is the case. Quoting from GPL-3.0:

> The “source code” for a work means the preferred form of the work for making modifications to it. “Object code” means any non-source form of a work.

autotools pre-processed source code is clearly not "the preferred form of the work for making modifications", which is specifically what I'm saying Debian shouldn't consider a "source code input" either, to eliminate this vector for underhanded tampering that Jia Tan has used.

If we can force a future Jia Tan to commit their backdoor into git (for everybody to see) I consider this a win.

> The “Corresponding Source” for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities.

The GPL is big on "if you ship object files, the source code for them better also be available". The GPL specifically allows me to have private forks, as long as I'm not publicly distributing binaries. If I do distribute binaries, I need to also publish the source code I derived them from.

Again: The source code needed to build the binaries. It does not require me to disclose some version control graph, but I do need to provide all source code that goes into the build (which is what .orig.tar.xz is supposed to be).

A "source code build process" is clearly just the build process in a trenchcoat.

cheers,
kpcyrd
Re: Upstreams with "official" tarballs differing from their git
On 2/15/25 12:10 PM, Stéphane Glondu wrote:
> I realize my previous email was a bit short: I was wondering if this .tbz is still source code because in the autotools world, package sources come with configure scripts ready to run, but the good practice in Debian is to regenerate those from configure.ac. Well, we enter a philosophical debate that is not specific to OCaml and probably should be discussed elsewhere...
>
> Adding debian-devel to get more opinions. Summary to other debian-devel readers: we are facing some upstreams that publish "official" tarballs that differ from what is in their git. The differences may include: variable substitutions, generated files... I guess this is pretty common (cf. autotools). Moreover, the build system behaves differently if it is called from git or not, or from extracted official tarballs or not.
>
> IMHO, traditionally, "source code" from Debian point of view is whatever upstream releases as "official" tarballs (i.e. elpi-2.0.7.tbz), which may differ from what is in upstream git (i.e. v2.0.7.tar.gz). What makes me think that is the special care that is taken in keeping upstream tarballs pristine (with their signatures...).
>
> [...]
>
> What do you think about the topic? My e-mail is very opinionated, I would really like to hear other opinions.

hello! ✨

disclaimer upfront, I know pretty much nothing about ocaml, this is based on my experience with C/Rust/Go/etc.

I think the concept of "building the source code into source code" [sic] that is common with autotools, is just the regular build in a trenchcoat and should happen on Debian build servers. This is to avoid forcing a gap between the VCS and Reproducible Builds that nobody feels responsible for.

Coincidentally this topic was also discussed in #reproducible-builds irc yesterday.

With regards to signatures, quoting from an email I wrote shortly after the XZ incident[0]:

> It's from the old mindset of code signing being the only way of securely getting code from upstream. Recent events have shown (instead of bothering upstream for signatures) it's much more important to have clarity and transparency what's in the code that is compiled into binaries and executed on our computers, instead of who we got it from. The entire reproducible builds effort is based on the idea of the source code in Debian being safe and sound to use.

[0]: https://lists.debian.org/debian-devel/2024/04/msg00125.html

I know Debian attempts to regenerate the autotools files, but there is no way to tell if this actually worked; I vaguely remember XZ being specifically one of the cases where it didn't.

In other news, note there's currently a push within Arch Linux to move away from upstream custom tarballs towards VCS snapshots:

https://gitlab.archlinux.org/archlinux/rfcs/-/merge_requests/46

Also, because people found this interesting yesterday, Arch Linux and Debian disagree on "what's the source code of curl 8.12.1":

Arch Linux: https://whatsrc.org/artifact/sha256:146d2d673358b7927d9a3c74e22b6b0e7f9a1aee2a4307afbe6ac07f12764130

Debian: https://whatsrc.org/artifact/sha256:599ff98cbab933a8b3640a084b12a5308a20795c192855ee454a8c1c16fa4dac

Diff between those two: https://whatsrc.org/diff-right-trimmed/sha256:146d2d673358b7927d9a3c74e22b6b0e7f9a1aee2a4307afbe6ac07f12764130/sha256:599ff98cbab933a8b3640a084b12a5308a20795c192855ee454a8c1c16fa4dac

Even if we got some kind human to review the source code in its entirety for us, which one should they review? sha256:146d2d673358b7927d9a3c74e22b6b0e7f9a1aee2a4307afbe6ac07f12764130? sha256:599ff98cbab933a8b3640a084b12a5308a20795c192855ee454a8c1c16fa4dac? Both?

cheers,
kpcyrd
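For readers who want to reproduce this kind of "which files differ between two source artifacts" comparison locally, here is a rough Python sketch. It is not how whatsrc.org computes its diffs; `tar_manifest`, `diff_manifests` and the `strip_components` parameter are hypothetical names, and the sketch only looks at regular files, ignoring permissions, symlinks and timestamps.

```python
import hashlib
import io
import tarfile


def tar_manifest(blob: bytes, strip_components: int = 0) -> dict:
    """Map each regular file in a (possibly compressed) tar to its SHA-256."""
    manifest = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Optionally drop leading path components such as "curl-8.12.1/"
            parts = member.name.split("/")[strip_components:]
            if not parts:
                continue
            data = tar.extractfile(member).read()
            manifest["/".join(parts)] = hashlib.sha256(data).hexdigest()
    return manifest


def diff_manifests(left: dict, right: dict):
    """Return (only_in_left, only_in_right, changed) as sorted name lists."""
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    changed = sorted(n for n in set(left) & set(right) if left[n] != right[n])
    return only_left, only_right, changed
```

Running `diff_manifests` on the manifests of two distributions' source tarballs for the same release gives a first overview of generated files that exist on only one side and of files whose content differs, before anyone starts reading the code itself.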