On Mon, Jun 10, 2013 at 6:25 AM, Stefan Bodewig <bode...@apache.org> wrote:
> Hi,
>
> when I added support for decompressing .lzma files I left out matches()
> and you can only get an LZMACompressorInputStream from
> CompressorStreamFactory if you use the version that explicitly specifies
> the format.
>
> The reason is that the old .lzma format doesn't have any sort of
> signature at all.  I've been told that if you'd try to "unlzma" a plain
> text file the most likely outcome is an OutOfMemoryError.
>
> The native XZUtil which is used for xz as well as lzma contains some
> heuristic that allows the xz command to guess the input format.  It
> first checks whether the input is xz and if not whether the settings
> that would make up the start of an LZMA stream don't look too strange.
>
> We could do something similar by placing the LZMA check at the end in
> the CompressorStreamFactory's autodetect method and perform the same
> plausibility checks on the input.  This would still run the risk of
> false positives and - maybe less likely - false negatives.  Do we want
> to do something like this?
>
> Stefan

The problem is not unique to LZMA, and since LZMA can contain almost
any bytes at the beginning, it could also be misdetected as another
compression format.
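
To make the weakness of such a check concrete, here is a rough sketch of
the kind of plausibility test Stefan describes. It relies only on the
13-byte .lzma header layout (one properties byte, a 4-byte little-endian
dictionary size, an 8-byte little-endian uncompressed size). The class and
method names are hypothetical, and the dictionary-size rule and the 2^38
size bound are my assumptions, not a copy of what xz actually does:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper; not part of Commons Compress. */
final class LzmaHeuristic {
    private static final int HEADER_LENGTH = 13;
    // Assumption: treat anything of 256 GiB or more as implausible.
    private static final long MAX_PLAUSIBLE_SIZE = 1L << 38;

    private LzmaHeuristic() {}

    /** Returns true if the first bytes could plausibly start a .lzma stream. */
    public static boolean looksLikeLzma(byte[] signature, int length) {
        if (signature == null || length < HEADER_LENGTH
                || signature.length < HEADER_LENGTH) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(signature, 0, HEADER_LENGTH)
                .order(ByteOrder.LITTLE_ENDIAN);

        // Properties byte encodes (pb * 5 + lp) * 9 + lc, so it is at most 224.
        int props = buf.get() & 0xFF;
        if (props > 224) {
            return false;
        }

        // Dictionary size, read as an unsigned 32-bit value.
        long dictSize = buf.getInt() & 0xFFFFFFFFL;
        if (!isPlausibleDictSize(dictSize)) {
            return false;
        }

        // Uncompressed size: -1 (all 0xFF bytes) means "unknown".
        long uncompressed = buf.getLong();
        return uncompressed == -1L
                || (uncompressed >= 0 && uncompressed < MAX_PLAUSIBLE_SIZE);
    }

    /** Assumption: accept only 2^n, 2^n + 2^(n-1), or the maximum value. */
    static boolean isPlausibleDictSize(long d) {
        if (d == 0xFFFFFFFFL) {
            return true;
        }
        if (d == 0) {
            return false;
        }
        if (Long.bitCount(d) == 1) {
            return true; // exactly 2^n
        }
        // Two adjacent set bits means 2^n + 2^(n-1).
        return Long.bitCount(d) == 2 && (d & (d >> 1)) != 0;
    }
}
```

Even with all three checks, only a handful of bits are really constrained,
so arbitrary binary data can still pass, and a valid .lzma file written
with an unusual dictionary size would be rejected.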

If we can't autodetect all compression formats from the file contents,
then shouldn't we only try to autodetect them from the file extension
or MIME type? Or not do autodetection at all?
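
For comparison, detection by file extension could be as small as the
sketch below. The class name, the extension table, and the returned
format strings are all hypothetical choices of mine for illustration, and
it assumes Java 9 or later for Map.of:

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper; not part of Commons Compress. */
final class ExtensionDetector {
    // Assumed mapping from file extension to a format name.
    private static final Map<String, String> FORMATS = Map.of(
            ".gz", "gz",
            ".bz2", "bzip2",
            ".xz", "xz",
            ".lzma", "lzma");

    private ExtensionDetector() {}

    /** Returns the format suggested by the file name's last extension, if any. */
    static Optional<String> formatFor(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        int dot = lower.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        return Optional.ofNullable(FORMATS.get(lower.substring(dot)));
    }
}
```

This never inspects the content, so it cannot confuse one format with
another, but it only works when a trustworthy file name is available,
which is not the case for an arbitrary InputStream.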

Damjan
