Could you run Tika in the client instead, or as a standalone server listening on a TCP socket, and ship only the extracted text to Solr? This is more efficient as well.
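
As a rough sketch of that setup, assuming a standalone Tika server on its default port 9998 and a Solr core reachable over HTTP: the client sends the PDF to Tika's `/tika` endpoint, then posts only the resulting text to Solr's JSON update handler. The core name `collection1`, the `content` field, and the helper names are placeholders for illustration, not anything from your configuration.

```python
import json
import urllib.request

# Assumptions: adjust hosts, ports, core name and field name to your setup.
TIKA_URL = "http://localhost:9998/tika"
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update?commit=true"


def extract_text(pdf_bytes, tika_url=TIKA_URL, timeout=60):
    """Send raw PDF bytes to the Tika server and return the extracted plain text."""
    req = urllib.request.Request(
        tika_url,
        data=pdf_bytes,
        method="PUT",
        headers={"Content-Type": "application/pdf", "Accept": "text/plain"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def build_solr_update(doc_id, text, field="content"):
    """Build the JSON body for Solr's update handler (hypothetical field name)."""
    return json.dumps([{"id": doc_id, field: text}]).encode("utf-8")


def index_pdf(doc_id, pdf_bytes):
    """Extract outside Solr, then index only the text. Returns True on success."""
    try:
        text = extract_text(pdf_bytes)
    except OSError as exc:  # covers URLError, HTTPError and socket timeouts
        # A bad PDF only fails this one document; Solr never sees it.
        print(f"Skipping {doc_id}: extraction failed ({exc})")
        return False
    req = urllib.request.Request(
        SOLR_UPDATE_URL,
        data=build_solr_update(doc_id, text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status == 200
```

The timeout on the extraction call is the important part: a PDF that hangs or exhausts memory takes down (at worst) the Tika process, which can be restarted independently, while Solr keeps serving queries.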
I suspect there will always be PDFs that cause strange behaviour, even if only because of their memory requirements (e.g. embedded images). If that becomes a real issue, move that portion out of the critical path (the Solr server).

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Thu, May 22, 2014 at 11:24 AM, Brian McDowell <brianmc...@gmail.com> wrote:
> Has anyone had issues with indexing pdf files? Some pdfs are bringing down
> Solr completely so that it actually needs to be manually restarted. We are
> using Solr 4.4 and thought that upgrading to Solr 4.8 would solve the
> problem because the release notes associated with the new tika version and
> also the new pdfbox indicate fixes for pdf issues. It didn't work and now
> this issue is causing us to reevaluate using Solr. Any help on this matter
> would be greatly appreciated. Thank you!