RE: [COMPRESS] TIFF file identified as TAR

2018-02-28 Thread Allison, Timothy B.
MM 49 49 2A 00 / 4D 4D 00 2A).
-----Original Message-----
From: Stefan Bodewig [mailto:bode...@apache.org]
Sent: Tuesday, February 27, 2018 3:46 PM
To: Stefan Bodewig
Cc: Allison, Timothy B. ; Commons Developers List
Subject: Re: [COMPRESS] TIFF file identified as TAR
On 2018-02-27, S
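The reply quotes the TIFF byte-order headers. For reference, a little-endian TIFF starts with "II" (49 49 2A 00) and a big-endian one with "MM" (4D 4D 00 2A). Below is a minimal Java sketch of checking for those four bytes before any weaker TAR heuristic is consulted. The class `TiffMagic` and the method `looksLikeTiff` are hypothetical names for illustration, not part of Tika or Commons Compress.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: detects the 4-byte TIFF header without consuming it. */
public final class TiffMagic {

    private TiffMagic() {}

    /**
     * Returns true if the stream starts with "II" 0x2A 0x00 (little-endian)
     * or "MM" 0x00 0x2A (big-endian). The stream must support mark/reset
     * and is rewound to its original position before returning.
     */
    public static boolean looksLikeTiff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("mark/reset support required");
        }
        byte[] header = new byte[4];
        in.mark(header.length);
        int n;
        try {
            n = in.readNBytes(header, 0, header.length);
        } finally {
            in.reset(); // leave the stream untouched for the next detector
        }
        if (n < header.length) {
            return false;
        }
        boolean littleEndian = header[0] == 0x49 && header[1] == 0x49
                && header[2] == 0x2A && header[3] == 0x00;
        boolean bigEndian = header[0] == 0x4D && header[1] == 0x4D
                && header[2] == 0x00 && header[3] == 0x2A;
        return littleEndian || bigEndian;
    }
}
```

A fixed signature at offset 0 is cheap and specific, so running a check like this ahead of a looser heuristic is one way to avoid the misidentification; whether that ordering belongs in Tika or in COMPRESS is the open question in the thread.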

[COMPRESS] TIFF file identified as TAR

2018-02-27 Thread Allison, Timothy B.
COMPRESS colleagues, On TIKA-2591[0], a user reports that a specific type of TIFF is being identified as a TAR file. Is this something we should try to fix at the Tika level, or is this something that would be better fixed in COMPRESS? Thank you! Best, Tim [0]

[compress] differences in implementation of Zip ibm vs. oracle?

2017-07-10 Thread Allison, Timothy B.
Compress colleagues, Over on https://bz.apache.org/bugzilla/show_bug.cgi?id=61275, a user submitted two .xlsx files generated with Apache POI, one by IBM's jvm and one by Oracle's jvm. The file generated with Oracle's jvm opens without issue; however, MSOffice complains but can fix the file

[compress] FW: Tika content detection and crawled "remote" content

2017-07-05 Thread Allison, Timothy B.
Fellow file-philes on [compress], Sebastian Nagel has added file type id via Apache Tika to Common Crawl. While Tika is not 100% accurate, this means that we have far better clarity on mime type than relying on the http header+file suffix. So, for testing purposes, you (or we over on Tika)

RE: [COMPRESS] zip-bomb prevention for Z?

2017-04-14 Thread Allison, Timothy B.
> enum wouldn't work for formats added via ServiceLoader. LZO supports a couple of names of its own and you couldn't inject them into the enum.
Doh! Got it. New code base... Sorry.

RE: [COMPRESS] zip-bomb prevention for Z?

2017-04-14 Thread Allison, Timothy B.
>> If there is anything COMPRESS can do to detect and avoid the situation, then please open an issue over here.
Done: COMPRESS-385, PR submitted.
>> If we wanted to add such a method, what would the return value be? One of the String constants contained inside the *Factory classes, likely.
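On the return-value question: a plain String name, unlike an enum constant, leaves room for formats registered later (for example via ServiceLoader) to contribute names of their own, which is the point raised about LZO. Below is a minimal magic-byte sketch in that style. The class `CompressorSniffer`, its constants, and returning null for unknown input are assumptions made for illustration, not the actual `*Factory` API.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sniffer that names a compressor format with a String. */
public final class CompressorSniffer {

    // Illustrative names only; the real *Factory classes define their own constants.
    public static final String GZIP = "gz";
    public static final String BZIP2 = "bzip2";
    public static final String XZ = "xz";
    public static final String Z = "z";

    private CompressorSniffer() {}

    /**
     * Peeks at the first bytes and returns a format name, or null if no
     * signature matches. The stream must support mark/reset and is rewound.
     */
    public static String detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("mark/reset support required");
        }
        byte[] sig = new byte[6];
        in.mark(sig.length);
        int n;
        try {
            n = in.readNBytes(sig, 0, sig.length);
        } finally {
            in.reset();
        }
        int b0 = n > 0 ? sig[0] & 0xFF : -1;
        int b1 = n > 1 ? sig[1] & 0xFF : -1;
        if (b0 == 0x1F && b1 == 0x8B) {
            return GZIP; // gzip: 1F 8B
        }
        if (b0 == 0x1F && b1 == 0x9D) {
            return Z; // compress/LZW: 1F 9D
        }
        if (n >= 3 && sig[0] == 'B' && sig[1] == 'Z' && sig[2] == 'h') {
            return BZIP2; // bzip2: "BZh"
        }
        if (n >= 6 && b0 == 0xFD && sig[1] == '7' && sig[2] == 'z'
                && sig[3] == 'X' && sig[4] == 'Z' && sig[5] == 0) {
            return XZ; // xz: FD 37 7A 58 5A 00
        }
        return null;
    }
}
```

Because the result is a String, a pluggable registry could map additional names to providers without changing this method's signature.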

[COMPRESS] zip-bomb prevention for Z?

2017-04-13 Thread Allison, Timothy B.
On TIKA-1631 [1], users have observed that a corrupt Z file can cause an OOM at Internal_.InternalLZWStream.initializeTable. Should we try to protect against this at the Tika level, or should we open an issue on commons-compress's JIRA? A second question, we're creating a stream with the Comp
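For context on the OOM: the third byte of a .Z header packs the maximum LZW code width into its low five bits, and the decoder's string table grows to roughly 2^maxBits entries, so a corrupt header can request a table far larger than any real file needs. Below is a minimal sketch that rejects such a header before anything is allocated. It assumes a hypothetical `ZHeaderGuard` helper and an assumed per-entry cost of 5 bytes; it is not the Commons Compress implementation.

```java
import java.io.IOException;

/** Hypothetical pre-allocation guard for a .Z (compress/LZW) header. */
public final class ZHeaderGuard {

    private static final int MAGIC_0 = 0x1F;
    private static final int MAGIC_1 = 0x9D;
    // Low five bits of the third header byte hold the maximum code width.
    private static final int MAX_BITS_MASK = 0x1F;
    // Codes in .Z streams start at 9 bits, so a smaller maximum is invalid.
    private static final int MIN_CODE_BITS = 9;
    // Assumed cost per table entry (an int prefix plus a byte suffix).
    private static final long BYTES_PER_ENTRY = 5;

    private ZHeaderGuard() {}

    /**
     * Validates the three-byte header and returns the maximum code width.
     * Throws before any table is built if the header is not .Z, is invalid,
     * or implies a table larger than memoryLimitBytes.
     */
    public static int checkHeader(byte[] header, long memoryLimitBytes) throws IOException {
        if (header.length < 3
                || (header[0] & 0xFF) != MAGIC_0
                || (header[1] & 0xFF) != MAGIC_1) {
            throw new IOException("Not a .Z stream");
        }
        int maxBits = header[2] & MAX_BITS_MASK;
        if (maxBits < MIN_CODE_BITS) {
            throw new IOException("Invalid maximum code width: " + maxBits);
        }
        // A corrupt header claiming 31 bits would ask for ~2^31 entries.
        long estimatedBytes = (1L << maxBits) * BYTES_PER_ENTRY;
        if (estimatedBytes > memoryLimitBytes) {
            throw new IOException("LZW table would need ~" + estimatedBytes
                    + " bytes; limit is " + memoryLimitBytes);
        }
        return maxBits;
    }
}
```

A caller would read the first three bytes, call `checkHeader(header, limit)`, and only then construct the decoder. The classic compress(1) tool writes at most 16-bit codes, so capping maxBits at 16 is another possible guard alongside a byte limit.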

[COMPRESS and others] FW: Any interest in running Apache Tika as part of CommonCrawl?

2015-04-07 Thread Allison, Timothy B.
here: https://groups.google.com/forum/#!topic/common-crawl/Cv21VRQjGN0 I’ve tried to follow Commons’ vernacular, and I’ve added [COMPRESS] to the Subject line. Please invite others who might have an interest in this work. Best, Tim From: Allison, Timothy B. Sent