Run Tika in a client instead? (Or as a standalone server listening
on a TCP socket.) Ship only the extracted text to Solr. This is more
efficient as well.
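
A rough sketch of what I mean, assuming a Tika server on its default
port (9998) and a hypothetical Solr core called "docs" with a "text"
field. The URLs, the core name, the field name and the helper names
are all placeholders of mine, so adjust them to your schema:

```python
import json
import urllib.request

# Assumptions: Tika server on its default port, and a hypothetical
# Solr core named "docs". Neither comes from your setup.
TIKA_URL = "http://localhost:9998/tika"
SOLR_UPDATE_URL = "http://localhost:8983/solr/docs/update?commit=true"


def build_tika_request(pdf_bytes, tika_url=TIKA_URL):
    """PUT the raw PDF to the Tika server and ask for plain text back."""
    return urllib.request.Request(
        tika_url,
        data=pdf_bytes,
        method="PUT",
        headers={"Content-Type": "application/pdf", "Accept": "text/plain"},
    )


def build_solr_request(doc_id, text, solr_url=SOLR_UPDATE_URL):
    """POST one already-extracted document to Solr as JSON.

    The "text" field name is an assumption; use whatever your schema defines.
    """
    body = json.dumps([{"id": doc_id, "text": text}]).encode("utf-8")
    return urllib.request.Request(
        solr_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def index_pdf(path, doc_id, timeout=60):
    """Extract with Tika outside Solr, then send only the text to Solr."""
    with open(path, "rb") as f:
        pdf_bytes = f.read()
    with urllib.request.urlopen(build_tika_request(pdf_bytes), timeout=timeout) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    with urllib.request.urlopen(build_solr_request(doc_id, text), timeout=timeout) as resp:
        return resp.status
```

If Tika chokes on a file here, the client sees a timeout or an HTTP
error and Solr never touches the PDF at all.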

I suspect there will always be PDFs that cause strange behaviour,
even if only because of memory requirements (e.g. embedded images). If
that becomes a real issue, move that portion out of the critical path
(the Solr server).
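
One possible way to do that, as a sketch only: run each extraction in
a child process with a heap cap and a timeout, so a bad PDF costs you
one skipped document instead of a restart. The jar path, heap size,
timeout and function names below are placeholders I made up:

```python
import subprocess
import sys


def extract_isolated(cmd, timeout_s=30):
    """Run an extraction command in a child process.

    Returns the extracted text, or None if the child timed out or
    exited with an error.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, timeout=timeout_s, check=True
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return None
    return result.stdout.decode("utf-8", errors="replace")


def tika_command(pdf_path, tika_jar="tika-app.jar", max_heap="512m"):
    """Build a tika-app command line (jar location is an assumption)."""
    # -Xmx caps the child's heap; --text asks tika-app for plain text.
    return ["java", f"-Xmx{max_heap}", "-jar", tika_jar, "--text", pdf_path]


if __name__ == "__main__" and len(sys.argv) > 1:
    text = extract_isolated(tika_command(sys.argv[1]))
    print("skipped (timeout or error)" if text is None else text)
```

Documents that come back as None can be logged and looked at later,
while the rest of the batch keeps indexing.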

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Thu, May 22, 2014 at 11:24 AM, Brian McDowell <brianmc...@gmail.com> wrote:
> Has anyone had issues with indexing pdf files? Some pdfs are bringing down
> Solr completely so that it actually needs to be manually restarted. We are
> using Solr 4.4 and thought that upgrading to Solr 4.8 would solve the
> problem because the release notes associated with the new tika version and
> also the new pdfbox indicate fixes for pdf issues. It didn't work and now
> this issue is causing us to reevaluate using Solr. Any help on this matter
> would be greatly appreciated. Thank you!
