On Wed, Oct 13, 2010 at 12:19:17PM -0400, Joey Hess wrote:
> Just comparing the number of options that might affect the output
> in gzip with xz should give a good idea of the possible complexity of
> doing this for xz. Hopefully many of the more esoteric options (like
> compressor filter chains) are not used in producing many files. 
> 
> In general, xz being a container format makes it much harder, I think.
> Though looking at section 5.3 of the spec
> <http://tukaani.org/xz/xz-file-format-1.0.4.txt>, there *is* some
> metadata about the input state of the compressors that can be pulled out
> of an xz archive. Developing code to do that would be a good first step.
The problem is that future xz versions might change the output in ways
that older xz versions can still read, but that are not byte-identical
to what those older versions produce. This is one thing that makes it
hard to have an --rsyncable option in xz [1], and it probably also makes
xz complicated to support in pristine-tar.
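
As a rough illustration of the "pull the metadata out" step suggested in
the quoted text, here is a minimal sketch in Python. It assumes the
layout described in the xz file format specification (12-byte stream
header, then a block header carrying the list of filter IDs and their
properties) and only looks at the first block of the first stream. The
function names (read_vli, lzma2_dict_size, first_block_filters) are
made up for this example and are not part of any existing tool.

```python
import struct
import zlib

XZ_MAGIC = b"\xfd7zXZ\x00"   # stream header magic bytes
LZMA2_FILTER_ID = 0x21       # filter ID of LZMA2 in the xz spec


def read_vli(buf, pos):
    """Decode an xz variable-length integer; return (value, next_pos)."""
    value = 0
    for shift in range(0, 63, 7):      # at most 9 bytes, 7 bits each
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:            # high bit clear: last byte
            return value, pos
    raise ValueError("variable-length integer too long")


def lzma2_dict_size(prop):
    """Decode the one-byte LZMA2 filter property into a dictionary size."""
    bits = prop & 0x3F
    if bits > 40:
        raise ValueError("invalid LZMA2 dictionary size")
    if bits == 40:
        return 0xFFFFFFFF
    return (2 | (bits & 1)) << (bits // 2 + 11)


def first_block_filters(data):
    """Return (check_type, [(filter_id, properties), ...]) for block 1."""
    if data[:6] != XZ_MAGIC:
        raise ValueError("not an xz stream")
    if zlib.crc32(data[6:8]) != struct.unpack("<I", data[8:12])[0]:
        raise ValueError("stream header CRC32 mismatch")
    check_type = data[7] & 0x0F        # integrity check used by the stream

    pos = 12
    if data[pos] == 0:                 # 0x00 marks the index: no blocks
        raise ValueError("stream contains no blocks")
    header = data[pos:pos + (data[pos] + 1) * 4]
    if zlib.crc32(header[:-4]) != struct.unpack("<I", header[-4:])[0]:
        raise ValueError("block header CRC32 mismatch")

    flags = header[1]
    p = 2
    if flags & 0x40:                   # optional compressed size
        _, p = read_vli(header, p)
    if flags & 0x80:                   # optional uncompressed size
        _, p = read_vli(header, p)

    filters = []
    for _ in range((flags & 0x03) + 1):
        filter_id, p = read_vli(header, p)
        prop_size, p = read_vli(header, p)
        filters.append((filter_id, header[p:p + prop_size]))
        p += prop_size
    return check_type, filters
```

For a file using only LZMA2, lzma2_dict_size(filters[-1][1][0]) gives the
dictionary size. This recovers the filter chain, the dictionary size and
the check type, but not encoder-side choices such as the match finder or
nice length, which are not stored in the file.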

> The lack of a large corpus of .lzma or .xz files in the archive
> doesn't help implementation; pristine-gz and pristine-bz essentially use the
> entire archive as a regression test suite and were developed by finding
> ways to reproduce successively larger percentages of files in the
> corpus. So I'd consider finding a large corpus of .xz files produced
> in the wild to be a good first step also.
You'll be able to use tarballs from gnome.org once gnome.org starts
shipping only .xz tarballs [2].
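
The corpus-driven approach from the quoted text could be sketched
roughly as below: for each .xz file, try a set of encoder settings and
see whether recompressing the decompressed data gives back the same
bytes. This is only a sketch under some assumptions: it uses Python's
lzma module (a binding to liblzma) rather than the xz command-line tool,
it only tries the plain presets 0-9 with each integrity check, and the
function names are invented for the example. It is not how pristine-gz
or pristine-bz are implemented.

```python
import lzma

# Integrity checks defined by the xz format; some builds may lack a few.
CHECKS = [lzma.CHECK_NONE, lzma.CHECK_CRC32,
          lzma.CHECK_CRC64, lzma.CHECK_SHA256]


def find_reproducing_settings(xz_bytes):
    """Return (preset, check) that reproduces xz_bytes exactly, or None."""
    raw = lzma.decompress(xz_bytes, format=lzma.FORMAT_XZ)
    for preset in range(10):
        for check in CHECKS:
            if not lzma.is_check_supported(check):
                continue
            candidate = lzma.compress(raw, format=lzma.FORMAT_XZ,
                                      check=check, preset=preset)
            if candidate == xz_bytes:      # byte-identical reproduction
                return preset, check
    return None


def reproduction_rate(corpus):
    """Fraction of the .xz blobs in corpus that can be reproduced."""
    if not corpus:
        return 0.0
    hits = sum(1 for blob in corpus
               if find_reproducing_settings(blob) is not None)
    return hits / len(corpus)
```

Files for which find_reproducing_settings returns None are the
interesting ones: they point to settings, or encoder versions, that the
search does not cover yet. Note that the higher presets need a lot of
memory and time, so a real tool would want to narrow the search first,
for example by reading the dictionary size from the block header.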

[1] http://sourceforge.net/projects/lzmautils/forums/forum/708858/topic/3272163
[2] http://thread.gmane.org/gmane.comp.gnome.devel.announce/191/focus=45219
-- 
Julian Andres Klode  - Debian Developer, Ubuntu Member

See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.
