On Wed, Oct 13, 2010 at 12:19:17PM -0400, Joey Hess wrote:
> Just comparing the number of options that might affect the output
> in gzip with xz should give a good idea of the possible complexity of
> doing this for xz. Hopefully many of the more esoteric options (like
> compressor filter chains) are not used in producing many files.
>
> In general, xz being a container format makes it much harder, I think.
> Though looking at section 5.3 of the spec
> <http://tukaani.org/xz/xz-file-format-1.0.4.txt>, there *is* some
> metadata about the input state of the compressors that can be pulled out
> of an xz archive. Developing code to do that would be a good first step.

The problem is that future versions of xz might change the compressor's
output in ways that older xz versions can still decompress, but that are
not byte-identical to what those older versions produce. This is one thing
that makes it hard to have an --rsyncable option in xz [1], and it probably
also makes xz support in pristine-tar complicated.
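As a rough illustration of that first step, here is a minimal Python sketch
(not part of pristine-tar; all function names are hypothetical) that reads
the Stream Header and the first Block Header of an .xz file, following the
layout in the file-format spec linked above, and decodes the filter chain,
including the dictionary size stored in the LZMA2 filter's one-byte
property. It assumes a single stream and only looks at the first Block.

```python
import struct
import zlib

XZ_MAGIC = b"\xfd7zXZ\x00"
LZMA2_ID = 0x21
CHECK_NAMES = {0x00: "none", 0x01: "crc32", 0x04: "crc64", 0x0A: "sha256"}


def read_varint(buf, pos):
    """Decode an xz multibyte integer (little-endian base-128, max 9 bytes)."""
    value = 0
    for i in range(9):
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, pos + i + 1
    raise ValueError("multibyte integer too long")


def lzma2_dict_size(prop):
    """Decode the one-byte LZMA2 filter property into a dictionary size."""
    bits = prop & 0x3F
    if prop & 0xC0 or bits > 40:
        raise ValueError("invalid LZMA2 property byte")
    if bits == 40:
        return 0xFFFFFFFF
    return (2 | (bits & 1)) << (bits // 2 + 11)


def first_block_metadata(data):
    """Return the check type and filter chain of the first xz Block."""
    if data[:6] != XZ_MAGIC:
        raise ValueError("not an xz stream")
    flags = data[6:8]
    if struct.unpack("<I", data[8:12])[0] != zlib.crc32(flags):
        raise ValueError("stream header CRC32 mismatch")
    check = CHECK_NAMES.get(flags[1] & 0x0F, "reserved")

    pos = 12
    if data[pos] == 0:
        return check, []  # Index indicator: the stream has no Blocks
    header_size = (data[pos] + 1) * 4
    header = data[pos:pos + header_size]
    # The CRC32 covers the whole Block Header except the CRC32 field itself.
    if struct.unpack("<I", header[-4:])[0] != zlib.crc32(header[:-4]):
        raise ValueError("block header CRC32 mismatch")

    block_flags = header[1]
    i = 2
    if block_flags & 0x40:  # Compressed Size present: skip it
        _, i = read_varint(header, i)
    if block_flags & 0x80:  # Uncompressed Size present: skip it
        _, i = read_varint(header, i)

    chain = []
    for _ in range((block_flags & 0x03) + 1):  # low two bits = filters - 1
        filter_id, i = read_varint(header, i)
        props_size, i = read_varint(header, i)
        props = header[i:i + props_size]
        i += props_size
        entry = {"id": filter_id, "props": props}
        if filter_id == LZMA2_ID:
            entry["dict_size"] = lzma2_dict_size(props[0])
        chain.append(entry)
    return check, chain
```

For example, first_block_metadata(open("foo.tar.xz", "rb").read()) on a
file made with the default xz -6 settings should report an 8 MiB LZMA2
dictionary. Note that the dictionary size is about all the LZMA2 filter
records: encoder settings such as the match finder, mode and nice_len are
not stored in the file, so this metadata alone does not pin down the
exact compressor configuration.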
> The lack of a large corpus of .lzma or .xz files in the archive
> doesn't help implementation; pristine-gz and pristine-bz essentially use the
> entire archive as a regression test suite and were developed by finding
> ways to reproduce successively larger percentages of files in the
> corpus. So I'd consider finding a large corpus of .xz files produced
> in the wild to be a good first step also.

You'll be able to use tarballs from gnome.org once gnome.org starts
shipping only .xz tarballs [2].

[1] http://sourceforge.net/projects/lzmautils/forums/forum/708858/topic/3272163
[2] http://thread.gmane.org/gmane.comp.gnome.devel.announce/191/focus=45219

-- 
Julian Andres Klode  - Debian Developer, Ubuntu Member

See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.