On Mon, Jul 29, 2024, at 10:28 PM, Andres Salomon wrote:
> On 7/29/24 16:10, Soren Stoutner wrote:
>> On Monday, July 29, 2024 1:18:05 AM MST Andres Salomon wrote:
>>> It's unfortunately going to have to wait. We're switching standard
>>> libraries, and linking to external libs is a bit rocky right now.
>>
>> Waiting until things settle is fine. This has been an issue for so long
>> that I have become a patient man.
>>
>>> On the plus side, I reduced the time it takes to generate the
>>> orig.tar.xz from ~40 minutes to ~5 minutes, which should help a lot with
>>> testing the deletion of vendored libraries in the future!
>>
>> That’s impressive. How did you accomplish that?
>>
>
> Debian's mk-origtargz script (which is what uscan calls) doesn't work
> for us, because 'tar --delete' doesn't scale as d/copyright's
> Files-Excluded increases (see #995770).
>
> Mike (prior chromium maintainer) instead patched mk-origtargz to (1)
> print out the files that would be deleted, (2) untar the _entire_
> upstream chromium tarball (which at this point is huge at 6.2GB), then
> (3) loops over the list of files to delete, deleting them one-by-one and
> then (4) packing up the new tarball. It worked okay when chromium's
> upstream tarball was roughly 1GB, but it has really ballooned lately.
>
> I replaced the first three steps with a single 'tar --exclude-from', so
> that we save time by not writing deleted files to disk only to manually
> delete them:
> https://salsa.debian.org/chromium-team/chromium/-/commit/cd5bf2ed6c848ea054718d8f658aa2b38c681d2c
>
> I would love to get this into mk-origtargz proper so that chromium could
> use uscan (and also everyone in debian maintaining larger packages would
> benefit), but I'm not even sure where to begin. Maybe as a separate
> python mk-origtar tool? Maybe as a patch to mk-origtargz with a
> command-line option to fall back to tar --delete? Perhaps d-d has an idea.

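Since a separate Python tool was floated above, here is a rough sketch of
what a single-pass repack could look like. This is not the code from the
linked commit (which uses 'tar --exclude-from'); it only illustrates the
same idea of never writing excluded files to disk. The names is_excluded
and repack, the strip_components parameter, and the use of fnmatch to
approximate Files-Excluded globs are all assumptions made for illustration.

```python
import fnmatch
import tarfile


def is_excluded(name, patterns):
    """Return True if `name`, or any parent directory of it, matches a pattern.

    fnmatch's '*' also matches '/', which is used here as an approximation
    of how Files-Excluded globs behave (an assumption of this sketch).
    """
    parts = name.strip("/").split("/")
    for i in range(1, len(parts) + 1):
        prefix = "/".join(parts[:i])
        if any(fnmatch.fnmatchcase(prefix, pat) for pat in patterns):
            return True
    return False


def repack(src, dst, patterns, strip_components=1):
    """Copy members from `src` to `dst` in one pass, skipping excluded paths.

    `src` and `dst` are open tarfile.TarFile objects. Patterns are matched
    against the path with the first `strip_components` directories removed.
    Returns a (kept, skipped) tuple of member counts.
    """
    kept = skipped = 0
    for member in src:
        rel = "/".join(member.name.split("/")[strip_components:])
        if rel and is_excluded(rel, patterns):
            skipped += 1
            continue
        # Only regular files carry data; other member types are header-only.
        fileobj = src.extractfile(member) if member.isreg() else None
        dst.addfile(member, fileobj)
        kept += 1
    return kept, skipped
```

Opening the input with tarfile.open(path, "r|xz") and the output with
tarfile.open(path, "w|xz") would keep the whole thing streaming. One caveat
of this sketch: a hard link whose target was excluded would be copied as a
dangling link, so a real tool would need to handle that case.
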
FWIW, having this supported in uscan (I don't really care *how* that
would be implemented tbh ;)) would be great. It would save me about an
hour or so of waiting for repeated repacking every few weeks when
updating rustc/cargo. I assume there are a few other packages that do
similarly involved pruning of bloated upstream tarballs and would also
benefit.

For Rust, we remove about 2/3 of the upstream tarball[0], both file size
and file count wise, but that's nowhere near what src:chromium does (or
rather, has to do). It's a mix of embedded copies of other projects
(e.g., LLVM), which we don't want since we use the ones provided by
standalone packages, and toolchain components and their vendored deps
that are not used for the Debian build.

Technically we could also keep all of that in (or at least greatly
reduce the exclusion list, since almost none of it would be
undistributable), but that would make it a lot more difficult both to
ensure the build doesn't accidentally pick up any of the undesired
things and to keep d/copyright current.

0: https://salsa.debian.org/rust-team/rust/-/blob/debian/sid/debian/copyright?ref_type=heads#L4-274