Hi, this is Apache Tika failing to parse a ZIP file, or possibly a ZIP-formatted Office file. Post the full stack trace (you'll find it in solr.log on the server side) if you want help locating the source of the issue. You may be able to configure Tika
Have you tried specifying ignoreTikaException=true on the request? See https://lucene.apache.org/solr/guide/7_6/uploading-data-with-solr-cell-using-apache-tika.html

At the end of the day, it would be a much better architecture to parse the PDFs with a plain standalone TikaServer, then construct a Solr document in your Python code and post that to Solr. The reason is that you get much better control over parse errors and over how metadata is mapped to your schema fields. Also, you don't want to overload Solr with all this work: a parser that crashes or gets stuck in an infinite loop can even bring down the whole Solr server.

-- 
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 30 Jan 2019, at 20:49, Monique Monteiro <monique.lou...@gmail.com> wrote:
>
> Hi all,
>
> I'm writing a Python routine to upload thousands of PDF files to Solr, and
> after trying to upload some files, Solr reports the following error in an
> HTTP 500 response:
>
> "by: java.util.zip.DataFormatException: invalid distance too far back"
>
> Does anyone have any idea about how to overcome this?
>
> Thanks in advance,
> Monique Monteiro
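For illustration, here is a minimal sketch of the standalone-Tika approach described above, using only the Python standard library. It assumes Tika Server is running on its default port 9998 and uses its `/rmeta/text` endpoint, which returns a JSON list of metadata dicts (the file itself first, then any embedded documents) with the extracted text under `X-TIKA:content`. The Solr core name `mycore` and the schema fields `id`, `title` and `content` are hypothetical; map them to your own schema. A file that fails to parse is recorded and skipped instead of aborting the whole batch, and Solr only receives plain JSON documents.

```python
import json
import urllib.error
import urllib.request

# Hypothetical endpoints: Tika Server on its default port 9998 and a Solr
# core called "mycore". Adjust both to your installation.
TIKA_RMETA_URL = "http://localhost:9998/rmeta/text"
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def parse_with_tika(data):
    """PUT the raw file to Tika Server; returns one metadata dict per document
    (the file itself first, then any embedded documents)."""
    req = urllib.request.Request(
        TIKA_RMETA_URL, data=data, method="PUT",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


def build_solr_doc(doc_id, rmeta):
    """Map Tika output to hypothetical schema fields: id, title, content."""
    if not rmeta:
        raise ValueError("Tika returned no metadata")
    title = rmeta[0].get("dc:title", "")
    if isinstance(title, list):  # multi-valued keys come back as JSON lists
        title = title[0] if title else ""
    # Concatenate the text of the file and of any embedded documents
    parts = (entry.get("X-TIKA:content", "").strip() for entry in rmeta)
    return {"id": doc_id, "title": title,
            "content": "\n".join(p for p in parts if p)}


def post_to_solr(payload):
    """POST a JSON list of documents (or a JSON update command) to Solr."""
    req = urllib.request.Request(
        SOLR_UPDATE_URL, data=json.dumps(payload).encode("utf-8"),
        method="POST", headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.status


def index_files(paths):
    """Index each file; returns (path, error) pairs for files that failed."""
    failed, indexed = [], 0
    for path in paths:
        try:
            with open(path, "rb") as f:
                rmeta = parse_with_tika(f.read())
            post_to_solr([build_solr_doc(path, rmeta)])
            indexed += 1
        except (OSError, ValueError) as exc:
            # HTTPError/URLError are OSError subclasses, so a file Tika cannot
            # parse is recorded and skipped instead of stopping the whole run
            failed.append((path, str(exc)))
    if indexed:
        post_to_solr({"commit": {}})  # one commit at the end, not per file
    return failed
```

Usage would be along the lines of `failed = index_files(glob.glob("pdfs/*.pdf"))`, after which the returned list tells you which PDFs (such as the ones triggering "invalid distance too far back") need a closer look. Committing once at the end rather than after every file avoids putting unnecessary load on Solr during a bulk upload.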