Anthony Towns <aj@azure.humbug.org.au> writes:

> Hrm, thinking about it, I guess zsync probably works by storing the
> state of the gzip table at certain points in the file and doing a
> rolling hash of the contents and recompressing each chunk of the file;
> that'd result in the size of the .gz not necessarily being the same, let
> alone the md5sum.

zsync has to recompress the raw data locally, and for that it has to
guess at the implementation used to compress the initial file. But for
debs that should be deterministic. zsync can guarantee that
recompressing gives the same result by checking that it does when
creating the checksum files. If zsync's recompression reproduces the
input file at that point, it always will, unless zsync changes its gzip
implementation.

> Feh, trying to verify this with ~512kB of random data, gzipped, I just
> keep getting "Aborting, download available in zsyncnew.gz.part". That's
> not terribly reassuring. And trying it with gzipped text data, I get
> stuck on 99.0%, with zsync repeatedly requesting around 700 bytes.
>
> Anyway, if it's recompressing like I think, there's no way to get the
> same compressed md5sum -- even if the information could be transferred,
> there's no guarantee the local gzip _can_ produce the same output as
> the remote gzip -- imagine if it had used gzip -9 and your local gzip
> only supports -1 through -5, eg.

zsync doesn't fork off some unknown local gzip, and it knows what its
own gzip routines can produce. It can easily be guaranteed that the
zsync client behaves the same way as the remote zsync checksum program
that would test for recompressibility.

The failure to sync the file is definitely a bug in zsync. Even if the
recompression fails (which it should know beforehand), it should fall
back to syncing the compressed data and produce the expected result.

> Hrm, it probably also means that mirrors can't use zsync -- that is,
> if you zsync fooA to fooB you probably can't use fooA.zsync to zsync
> from fooB to fooC.
>
> Anyway, just because you get a different file, that doesn't mean it'll
> act differently; so we could just use an "authentication" mechanism
> that reflects that. That might involve providing sizes and sha1s of the
> uncompressed contents of the ar in the packages file, instead of the
> md5sum of the ar.
> Except the previous note probably means that you'd
> still need to use the md5sum of the .deb to verify mirrors; which means
> mirrors and users would have different ways of verifying their
> downloads, which is probably fairly undesirable.

Too bad Packages files contain the md5sum of the full deb. Changing
that would be an ugly and lengthy process, so let's not do that. The
only sane way is to make zsync produce identical debs. That isn't
trivial, but it isn't impossible either.

> Relatedly, mirrors (and apt-proxy users, etc) need to provide Packages.gz
> of a particular md5sum/size, so they can't use Packages.diff to speed
> up their diffs. It might be worth considering changing the Release file
> definition to just authenticate the uncompressed files and expect tools
> like apt and debootstrap to authenticate only after uncompressing. A
> "Compression-Methods: gz, bz2" header might suffice to help tools work
> out whether to try downloading Packages.gz, Packages.bz2 or just plain
> Packages first. Possibly "Packages-Compress:" and "Sources-Compress:"
> might be better.
>
> Cheers,
> aj

% gunzip <Packages.gz | gzip -9 >Packages.gz.2
% gunzip <Packages.gz | gzip -9 >Packages.gz.3
% gunzip <Packages.gz | gzip -9 -n >Packages.gz.4
% gunzip <Packages.gz | gzip -9 -n >Packages.gz.5
% md5sum *
172930d0165cf3f7b23324ec79e52847  Packages.gz
be00244619e0ed53ae2ba5a454aa3fee  Packages.gz.2
d4c7c8e04d963beb4d3bee4ac8e7bd0f  Packages.gz.3
764c5aa8168cb58d5e4d6412333516a5  Packages.gz.4
764c5aa8168cb58d5e4d6412333516a5  Packages.gz.5

The problem is the timestamp in gzip files. If you patch DAK to use the
-n switch, which leaves the timestamp out of the gzip header, then
Packages.diff can be used to update Packages, and the result can be
recompressed to the expected Packages.gz. Furthermore, zsync could
include the timestamp in the .zsync file and recompress to the same
timestamp.

MfG
        Goswin
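
The timestamp effect can also be reproduced without the gzip command
line. What follows is only a rough sketch in Python, not how zsync or
DAK actually do it: the helper names gzip_with_mtime, read_mtime and
can_reproduce are made up for illustration, and the recompression check
assumes the same compressor and level on both ends. It relies on the
gzip header storing the modification time as a little-endian 32-bit
value at bytes 4 to 7.

```python
import gzip
import hashlib
import io
import struct


def gzip_with_mtime(data: bytes, mtime: int, level: int = 9) -> bytes:
    """Compress data, writing an explicit timestamp into the gzip header."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb",
                       compresslevel=level, mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


def read_mtime(blob: bytes) -> int:
    """Return the MTIME field (bytes 4..7, little-endian) of a gzip header."""
    return struct.unpack("<I", blob[4:8])[0]


def can_reproduce(blob: bytes, level: int = 9) -> bool:
    """Hypothetical check: does recompressing with the stored timestamp
    give back exactly the same bytes? (Assumes the same compressor.)"""
    raw = gzip.decompress(blob)
    return gzip_with_mtime(raw, read_mtime(blob), level) == blob


if __name__ == "__main__":
    data = b"Package: example\nVersion: 1.0\n" * 200

    first = gzip_with_mtime(data, mtime=1000)
    second = gzip_with_mtime(data, mtime=2000)   # same data, later timestamp
    fixed_a = gzip_with_mtime(data, mtime=0)     # comparable to gzip -n
    fixed_b = gzip_with_mtime(data, mtime=0)

    for name, blob in [("first", first), ("second", second),
                       ("fixed_a", fixed_a), ("fixed_b", fixed_b)]:
        print(hashlib.md5(blob).hexdigest(), name, "mtime =", read_mtime(blob))

    print("first reproducible from its own header:", can_reproduce(first))
```

The two outputs with different timestamps differ only in the header, yet
that is enough to change the md5sum; the two with a fixed timestamp are
byte-identical. Storing the timestamp next to the checksums, as
suggested for the .zsync file, is what lets can_reproduce succeed here.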