Hi all,

In fact, moving the parsing to the client solved the problem!

Thanks!
Monique

On Thu, Jan 31, 2019 at 8:25 AM Jan Høydahl <jan....@cominvent.com> wrote:

> Hi
>
> This looks like Apache Tika failing to parse a zip file, or possibly a
> zip-formatted Office file.
> If you want help locating the source of the issue, you have to post the
> full stack trace (which you'll find in solr.log on the server side).
> You may also be able to configure Tika.
>
> Have you tried to specify ignoreTikaException=true on the request? See
> https://lucene.apache.org/solr/guide/7_6/uploading-data-with-solr-cell-using-apache-tika.html
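
For readers finding this thread later: a minimal sketch of what passing ignoreTikaException=true to the Solr Cell extract handler could look like from Python. The base URL, the collection name "docs", and the helper names are assumptions for illustration, not something taken from this thread. With this parameter, Solr skips the parse exception and indexes whatever metadata it could still get, so the document body may be missing.

```python
import urllib.request
from urllib.parse import quote, urlencode


def build_extract_url(solr_base, collection, doc_id, commit=False):
    """Build an /update/extract URL that asks Solr Cell to skip Tika exceptions."""
    params = {"literal.id": doc_id, "ignoreTikaException": "true"}
    if commit:
        params["commit"] = "true"
    base = solr_base.rstrip("/")
    return f"{base}/{quote(collection)}/update/extract?{urlencode(params)}"


def upload_pdf(pdf_bytes, url, timeout=60):
    """POST the raw PDF bytes to the extract handler and return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=pdf_bytes,
        method="POST",
        headers={"Content-Type": "application/pdf"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Assumed local Solr instance and a hypothetical collection called "docs".
    print(build_extract_url("http://localhost:8983/solr", "docs", "report-1.pdf"))
```
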
>
> At the end of the day, it would be a much better architecture to parse the
> PDFs using a plain standalone TikaServer and then construct a Solr document
> in your Python code, which is then posted to Solr. The reason is that you
> have much better control over parse errors and over how metadata maps to
> your schema fields. You also don't want to overload Solr with all this
> work: a parser that crashes or gets stuck in an infinite loop can even
> bring down the whole Solr server.
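
A rough sketch of the client-side approach described above, for anyone landing here with the same error. It assumes a Tika Server on its default port 9998, a hypothetical Solr collection named "docs", and the dynamic fields content_txt and filename_s; none of these come from this thread, so adjust them to your setup. The function names are made up for illustration.

```python
import json
import urllib.request
from pathlib import Path

TIKA_URL = "http://localhost:9998/tika"  # assumption: local Tika Server
SOLR_UPDATE_URL = "http://localhost:8983/solr/docs/update?commit=true"  # hypothetical collection


def extract_text(pdf_bytes, tika_url=TIKA_URL, timeout=60):
    """Send the PDF to Tika Server and return the extracted plain text."""
    req = urllib.request.Request(
        tika_url,
        data=pdf_bytes,
        method="PUT",
        headers={"Accept": "text/plain", "Content-Type": "application/pdf"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def build_solr_doc(doc_id, text, filename):
    """Map the parsed text to schema fields (field names are assumptions)."""
    return {"id": doc_id, "content_txt": text.strip(), "filename_s": filename}


def post_docs(docs, solr_url=SOLR_UPDATE_URL, timeout=60):
    """Post a list of documents to Solr's JSON update handler."""
    req = urllib.request.Request(
        solr_url,
        data=json.dumps(docs).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def index_pdfs(paths):
    """Parse each PDF on the client; collect failures instead of aborting."""
    docs, failed = [], []
    for path in paths:
        path = Path(path)
        try:
            text = extract_text(path.read_bytes())
        except OSError as exc:  # covers missing files, HTTP errors, and timeouts
            failed.append((str(path), str(exc)))
            continue
        docs.append(build_solr_doc(path.name, text, path.name))
    if docs:
        post_docs(docs)
    return failed
```

A file that Tika cannot parse then ends up in the returned failure list rather than causing an HTTP 500 from Solr. For thousands of files it is probably worth posting in batches instead of one large request.
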
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 30 Jan 2019 at 20:49, Monique Monteiro <monique.lou...@gmail.com>
> > wrote:
> >
> > Hi all,
> >
> > I'm writing a Python routine to upload thousands of PDF files to Solr,
> > and after trying to upload some files, Solr reports the following error
> > in an HTTP 500 response:
> >
> > "by: java.util.zip.DataFormatException: invalid distance too far back"
> >
> > Does anyone have any idea about how to overcome this?
> >
> > Thanks in advance,
> > Monique Monteiro
>
>

-- 
Monique Monteiro
Twitter: http://twitter.com/monilouise
