This came up back in September; see [1] and [2]. Same trigger: a crazy number of nested divs.
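For anyone wondering why deeply nested markup would trip a "zip bomb" check at all: as I understand it, the guard limits how deeply elements may nest, so a long cascade of divs and spans looks the same to it as a maliciously nested document. The sketch below is not Tika's code. It is a simplified, self-contained illustration using the JDK's SAX parser, and the names `DepthLimitDemo`, `DepthLimitHandler`, `nestedDivs`, `exceedsLimit`, and the limit of 100 are all hypothetical and chosen only for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DepthLimitDemo {

    /** Hypothetical handler: aborts parsing once element nesting exceeds maxDepth. */
    static class DepthLimitHandler extends DefaultHandler {
        private final int maxDepth;
        private int depth = 0;

        DepthLimitHandler(int maxDepth) {
            this.maxDepth = maxDepth;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            depth++; // one level deeper for every opening tag
            if (depth > maxDepth) {
                throw new SAXException(
                    "Nesting depth " + depth + " exceeds limit " + maxDepth);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            depth--; // back out one level on every closing tag
        }
    }

    /** Builds well-formed markup with the given number of nested div/span pairs. */
    static String nestedDivs(int levels) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < levels; i++) {
            sb.append("<div><span>");
        }
        sb.append("text");
        for (int i = 0; i < levels; i++) {
            sb.append("</span></div>");
        }
        return sb.toString();
    }

    /** Returns true if parsing was aborted (here: because the depth limit was hit). */
    static boolean exceedsLimit(String xml, int maxDepth) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DepthLimitHandler(maxDepth));
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // 200 div/span pairs nest 400 elements deep, well past a limit of 100.
        System.out.println(exceedsLimit(nestedDivs(200), 100));
        // 10 pairs nest only 20 deep and parse without complaint.
        System.out.println(exceedsLimit(nestedDivs(10), 100));
    }
}
```

The point of the sketch is only that a fixed depth limit cannot tell a legitimate but deeply nested page from a hostile one, which is why making the limit configurable seems like the right fix.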
I think we could modify the AutoDetectParser to enable configuration of maximum zip-bomb depth via tika-config. If there's any interest in this, re-open TIKA-2091, and I'll take a look.

Best,

Tim

[1] http://git.net/ml/solr-user.lucene.apache.org/2016-09/msg00561.html
[2] https://issues.apache.org/jira/browse/TIKA-2091

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, January 4, 2017 12:20 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Zip Bomb Exception in HTML File

You might get a more knowledgeable response from the Tika folks, that's really not something Solr controls.

Best,
Erick

On Wed, Jan 4, 2017 at 8:50 AM, <sn0...@ulysses-erp.com> wrote:
> i get an exception "<strname="msg">org.apache.tika.exception.TikaException:
> Zip bomb detected!</str"
>
> if i would like to parse a html file - and i think i know why.
> because there are many many <div><span> in cascade over 200 divs and
> span are inside each.
>
> Is it correct that there is this limit for html files?
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.
>