Anthony Towns wrote:
> On Tue, Oct 02, 2007 at 04:15:44PM -0400, Joey Hess wrote:
>> BTW, the next release of pristine-tar will support generating pristine
>> gz files too, so will fully support pristine .orig.tar.gz. Regenerating
>> pristine gz files from small deltas is quite a lot trickier, and
>> currently works for about 99% of the .orig.tar.gz files in the Debian
>> archive. Many thanks to paravoid for making it happen..
>
> Oh wow, that's cool. Any chance of a post/blog on how that was achieved?

There are mainly two kinds of gzip implementations, both compatible with each other:

a) GNU gzip, which is relatively easy; its output can carry:
 * the name of the original file (optional)
 * the timestamp of creation
 * a compression level ("normal", --fast, --best)

One can easily figure these out from the gzip headers and recreate the file by passing the corresponding gzip options (-n and the undocumented -m and -M).
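To illustrate what reading these values from the headers involves, here is a minimal Python sketch that parses the fixed RFC 1952 header and the optional original-name field of a single gzip member. The names `GzipHeader`, `parse_gzip_header`, `level_hint` and `gnu_gzip_plausible` are hypothetical and are not taken from pristine-gz. The XFL-to-level mapping is only a hint, since the header records just "maximum" or "fastest". The plausibility check mirrors the OS-flag and slash heuristic described further down.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# Flag bits and OS code as defined in RFC 1952
FEXTRA, FNAME = 0x04, 0x08
OS_UNIX = 3


@dataclass
class GzipHeader:
    mtime: int             # MTIME: modification time, seconds since the epoch
    xfl: int               # XFL: extra flags, a hint about the compression level
    os: int                # OS: 0 = FAT (MS-DOS/Win32), 3 = Unix, 11 = NTFS, ...
    name: Optional[bytes]  # FNAME: original file name, if the flag is set

    def level_hint(self) -> str:
        # RFC 1952: XFL=2 means "maximum compression", XFL=4 means "fastest";
        # anything else is treated here as the default ("normal") level.
        return {2: "--best", 4: "--fast"}.get(self.xfl, "normal")

    def gnu_gzip_plausible(self) -> bool:
        # Hypothetical filter: rule out GNU gzip when the OS flag
        # is not Unix or the stored name contains slashes.
        return self.os == OS_UNIX and (self.name is None or b"/" not in self.name)


def parse_gzip_header(data: bytes) -> GzipHeader:
    """Read the fixed header and the optional FNAME field of one gzip member."""
    if len(data) < 10 or data[:3] != b"\x1f\x8b\x08":
        raise ValueError("not a deflate-compressed gzip member")
    # FLG (1 byte), MTIME (4 bytes, little-endian), XFL (1 byte), OS (1 byte)
    flg, mtime, xfl, os_code = struct.unpack("<BIBB", data[3:10])
    pos = 10
    if flg & FEXTRA:
        # Skip the extra field: a 2-byte length followed by that many bytes
        (xlen,) = struct.unpack("<H", data[pos:pos + 2])
        pos += 2 + xlen
    name = None
    if flg & FNAME:
        # The original name is a zero-terminated byte string
        end = data.index(b"\x00", pos)
        name = data[pos:end]
    return GzipHeader(mtime, xfl, os_code, name)
```

Feeding it the first bytes of an .orig.tar.gz yields the name, timestamp and level hint that a recompression has to reproduce. A --rsyncable file, by contrast, leaves no trace in any of these fields.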
There's also --rsyncable, which appears mostly (if not only) on Debian and unfortunately can't be detected from the headers. GNU gzip accounts for the vast majority of the archive.

b) zlib's gzip; the BSDs use a CLI-compatible gzip based on zlib, and most of the files in this category come from there. zlib produces different output at every compression level because it uses a different algorithm. Apart from that, since it's a library that anyone can easily use, there are some really strange gzips out there: many have full or relative paths in the original name field, while others were compressed at the --best level without indicating so in the headers (zlib doesn't write the headers for you, unfortunately). Some implementations also set a modified Operating System flag in the gzip headers.

For this, I ported NetBSD's gzip and heavily modified it so that it accepts "expert" arguments, letting you set e.g. the OS flag or various quirks.

Unfortunately, it's not easy to tell the two implementations or the quirks apart, so pristine-gz tries each of them in turn until one succeeds. It tries to be smart (e.g. by not using GNU gzip if the OS flag is not Unix or if the original name contains slashes), but recognizing a gzip may take some time.

Something that doesn't work at the moment (and I'd be grateful for any help) is the majority of MS-DOS and Win32/NTFS implementations. Multipart gzips would also not work, although I haven't found any yet.

On the first run of the tool on the whole archive, pristine-gz succeeded in recognizing 21869 of 22566 .orig.tar.gz files (almost 97% of the archive). It explicitly failed on 206 of them (0.91%), while something weird, probably a bug, happened on the remaining 491 (2.18%).

joeyh is doing another run on the archive with updated versions of both pristine-tar and pristine-gz, so we'll have more of these nice statistics soon.

Regards,
Faidon