Hi all,

In fact, moving the parsing to the client solved the problem!

Thanks!
Monique

On Thu, Jan 31, 2019 at 8:25 AM Jan Høydahl <jan....@cominvent.com> wrote:

> Hi
>
> This is Apache Tika that cannot parse a zip file or possibly a
> zip-formatted office file.
> You have to post the full stack trace (which you'll find in the solr.log
> on the server side) if you want help in locating the source of the issue.
> You may be able to configure Tika.
>
> Have you tried to specify ignoreTikaException=true on the request? See
> https://lucene.apache.org/solr/guide/7_6/uploading-data-with-solr-cell-using-apache-tika.html
>
> At the end of the day it would be a much better architecture to parse the
> PDFs using a plain standalone TikaServer and then construct a Solr document
> in your Python code, which is then posted to Solr. The reason is that you
> have much better control over parse errors and how to map metadata to your
> schema fields. Also, you don't want to overload Solr with all this work; it
> can even crash the whole Solr server if some parser crashes or gets stuck
> in an infinite loop.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 30 Jan 2019 at 20:49, Monique Monteiro <monique.lou...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I'm writing a Python routine to upload thousands of PDF files to Solr,
> > and after trying to upload some files, Solr reports the following error
> > in an HTTP 500 response:
> >
> > "by: java.util.zip.DataFormatException: invalid distance too far back"
> >
> > Does anyone have any idea about how to overcome this?
> >
> > Thanks in advance,
> > Monique Monteiro

--
Monique Monteiro
Twitter: http://twitter.com/monilouise
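
For anyone finding this thread later: the client-side approach Jan describes (parse with a standalone Tika Server, build the Solr document in Python, then post it) could look roughly like the sketch below. It is not the code actually used in this thread. It assumes a Tika Server on its default port 9998, a hypothetical Solr collection named `mycollection`, and hypothetical schema fields `id`, `title` and `content`; adjust all of these to your own setup.

```python
import json
import urllib.request
from pathlib import Path

# Assumptions: Tika Server on its default port, and a hypothetical collection name.
TIKA_URL = "http://localhost:9998/rmeta/text"
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update?commit=true"


def parse_with_tika(pdf_path, tika_url=TIKA_URL):
    """PUT one PDF to Tika Server and return its JSON list of metadata dicts."""
    request = urllib.request.Request(
        tika_url,
        data=Path(pdf_path).read_bytes(),
        headers={"Content-Type": "application/pdf", "Accept": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)


def build_solr_doc(doc_id, tika_metadata):
    """Map Tika output to a Solr document (field names are hypothetical)."""
    main = tika_metadata[0] if tika_metadata else {}
    return {
        "id": doc_id,
        "title": main.get("dc:title", ""),
        "content": (main.get("X-TIKA:content") or "").strip(),
    }


def post_to_solr(docs, solr_url=SOLR_UPDATE_URL):
    """POST a JSON array of documents to Solr's update handler."""
    request = urllib.request.Request(
        solr_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.status


def index_pdfs(paths):
    """Parse each file on the client; collect failures instead of aborting."""
    docs, failed = [], []
    for path in paths:
        try:
            docs.append(build_solr_doc(str(path), parse_with_tika(path)))
        except Exception as exc:  # a bad PDF is recorded and skipped
            failed.append((str(path), repr(exc)))
    if docs:
        post_to_solr(docs)
    return failed
```

Calling `index_pdfs(Path("pdfs").glob("*.pdf"))` (the directory name is just an example) returns the list of files that could not be parsed, so a broken PDF like the one behind the `DataFormatException` above is logged on the client rather than causing an HTTP 500 from Solr.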