Bug#499489: pristine-tar: Add LZMA support

Joey Hess Wed, 13 Oct 2010 09:21:15 -0700

Daniel Baumann wrote:
> what is the status of this, now that both lzma and xz are in squeeze and
> supported by dpkg? I'd like to start making use of it, and for
> that, I'll need pristine-tar to support it in my git workflow.


Adding support for a new compression format to pristine-tar is in general
a nontrivial problem; the entire input state of the compressor has to be
teased out of the compressed file. Beyond simple stuff like whether
-1 or -9 was used, that state can include things like the date, the
version of the compressor used, or whether threading was used.
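
For comparison, here is roughly what that looks like for gzip, where a
few pieces of the input state sit directly in the fixed 10-byte header
defined by RFC 1952. This is only a minimal sketch: the function name
gzip_header_fields and the returned dictionary are hypothetical, and it
is not how pristine-gz itself is implemented. The XFL byte is only a
hint (2 means maximum compression, 4 means fastest), so anything else
still has to be found by trial.

```python
import struct


def gzip_header_fields(data):
    """Return the fixed gzip header fields (RFC 1952) as a dict.

    Hypothetical helper for illustration; `data` holds at least the
    first 10 bytes of a .gz file.
    """
    magic, method, flags, mtime, xfl, os_id = struct.unpack(
        "<2sBBIBB", data[:10]
    )
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip file")
    # XFL is the only header hint about the compression level.
    level_hint = {2: "max compression", 4: "fastest"}.get(xfl, "unspecified")
    return {
        "method": method,                # 8 means deflate
        "has_name": bool(flags & 0x08),  # FNAME: original file name stored
        "mtime": mtime,                  # the date, as seconds since the epoch
        "xfl": xfl,
        "level_hint": level_hint,
        "os": os_id,                     # 3 means Unix
    }
```

Called on the first bytes of a .gz file, this gives back the stored
timestamp, whether a file name was stored, and the level hint.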

Just comparing the number of options that might affect the output
in gzip with the number in xz should give a good idea of the possible
complexity of doing this for xz. Hopefully many of the more esoteric
options (like compressor filter chains) are not used in producing many
files.

In general, xz being a container format makes it much harder, I think.
Though looking at section 5.3 of the spec
<http://tukaani.org/xz/xz-file-format-1.0.4.txt>, there *is* some
metadata about the input state of the compressors that can be pulled out
of an xz archive. Developing code to do that would be a good first step.
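
A minimal sketch of what pulling out that metadata could look like,
assuming a single-stream .xz file already read into memory. It follows
the layout in the spec linked above (stream header, then the first
block header with its list of filter flags) and decodes the LZMA2
dictionary size. The names read_vli, lzma2_dict_size and
first_block_info are hypothetical, and only the first block is
examined.

```python
import struct
import zlib

XZ_MAGIC = b"\xfd7zXZ\x00"
LZMA2_FILTER_ID = 0x21


def read_vli(buf, pos):
    """Decode one variable-length integer; return (value, next_pos)."""
    value = 0
    for i in range(9):
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, pos + i + 1
    raise ValueError("variable-length integer longer than 9 bytes")


def lzma2_dict_size(prop):
    """Decode the one-byte LZMA2 filter property into a dictionary size."""
    bits = prop & 0x3F
    if bits > 40:
        raise ValueError("invalid LZMA2 dictionary size")
    if bits == 40:
        return 0xFFFFFFFF
    return (2 | (bits & 1)) << (bits // 2 + 11)


def first_block_info(data):
    """Return the check type and filter chain of the first block."""
    if data[:6] != XZ_MAGIC:
        raise ValueError("not an xz stream")
    stream_flags = data[6:8]
    if zlib.crc32(stream_flags) != struct.unpack("<I", data[8:12])[0]:
        raise ValueError("stream header CRC32 mismatch")
    if data[12] == 0:
        raise ValueError("stream contains no blocks")
    header = data[12:12 + (data[12] + 1) * 4]
    if zlib.crc32(header[:-4]) != struct.unpack("<I", header[-4:])[0]:
        raise ValueError("block header CRC32 mismatch")
    block_flags = header[1]
    pos = 2
    compressed_size = uncompressed_size = None
    if block_flags & 0x40:  # compressed size field present
        compressed_size, pos = read_vli(header, pos)
    if block_flags & 0x80:  # uncompressed size field present
        uncompressed_size, pos = read_vli(header, pos)
    filters = []
    for _ in range((block_flags & 0x03) + 1):
        filter_id, pos = read_vli(header, pos)
        size, pos = read_vli(header, pos)
        entry = {"id": filter_id, "props": header[pos:pos + size]}
        pos += size
        if filter_id == LZMA2_FILTER_ID and size == 1:
            entry["dict_size"] = lzma2_dict_size(entry["props"][0])
        filters.append(entry)
    return {
        "check_type": stream_flags[1] & 0x0F,
        "compressed_size": compressed_size,
        "uncompressed_size": uncompressed_size,
        "filters": filters,
    }
```

Whether the optional size fields are present in the block header may
itself be a hint about how the file was produced. The dictionary size
narrows down the preset but does not identify it uniquely, so the rest
of the compressor state would still need to be found some other way.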

The lack of a large corpus of .lzma or .xz files in the archive
doesn't help implementation; pristine-gz and pristine-bz essentially use the
entire archive as a regression test suite and were developed by finding
ways to reproduce successively larger percentages of files in the
corpus. So I'd consider finding a large corpus of .xz files produced
in the wild to be another good first step.
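
The corpus-as-regression-suite idea can be sketched like this. It is
only an illustration: it uses Python's lzma module as a stand-in for
the real xz tool, tries nothing beyond the plain presets, and the name
reproduction_rate is hypothetical. A real run would call the actual
compressor with whatever parameters had been recovered from each file.

```python
import lzma
from pathlib import Path


def reproduction_rate(corpus_dir, presets=range(10)):
    """Return (fraction reproduced, names of reproduced files).

    A file counts as reproduced when recompressing its contents with
    one of the candidate presets gives back the identical bytes.
    """
    files = sorted(Path(corpus_dir).glob("*.xz"))
    reproduced = []
    for path in files:
        original = path.read_bytes()
        raw = lzma.decompress(original)
        if any(lzma.compress(raw, preset=p) == original for p in presets):
            reproduced.append(path.name)
    rate = len(reproduced) / len(files) if files else 0.0
    return rate, reproduced
```

The files that are not reproduced are the interesting ones, since each
points at some piece of compressor state not yet accounted for.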

-- 
see shy jo
