I'm assuming you're using the ExtractingRequestHandler. Putting all
of the extraction work on the same Solr box that is also serving
queries and indexing is not going to scale well.

Consider using Tika/SolrJ (Tika is what the ERH uses anyway) to
spread the PDF parsing across as many clients as you can afford,
so that Solr only receives the already-extracted text.
Here's a way to get started:

http://searchhub.org/2012/02/14/indexing-with-solrj/
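
To make the idea concrete, here is a rough sketch of the shape such a
client could take. It is not taken from the article above, and it uses
only the JDK so it runs as-is: the class name, method name, and
parameters are all made up for illustration. The extractor stands in
for the Tika call and the sink stands in for the SolrJ call that sends
a batch of documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper: parse documents on the client, then ship only text to Solr.
public class ClientSideExtraction {

    /**
     * Runs the extractor over every input on a fixed-size thread pool and
     * hands the extracted text to the sink in batches, in input order.
     * Returns the number of documents handed to the sink.
     *
     * Assumption: in a real client the extractor would wrap Tika (for
     * example AutoDetectParser with a BodyContentHandler) and the sink
     * would build SolrInputDocuments and call add(...) through SolrJ.
     */
    public static <I> int parseAndIndex(List<I> inputs,
                                        Function<I, String> extractor,
                                        Consumer<List<String>> sink,
                                        int threads,
                                        int batchSize) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit all parsing work up front; this is the expensive part
            // that no longer runs inside the Solr JVM.
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (I input : inputs) {
                pending.add(CompletableFuture.supplyAsync(
                        () -> extractor.apply(input), pool));
            }

            // Collect results in order and flush them in batches so the
            // indexer sees a few large updates rather than many tiny ones.
            List<String> batch = new ArrayList<>();
            int sent = 0;
            for (CompletableFuture<String> future : pending) {
                batch.add(future.join());
                if (batch.size() == batchSize) {
                    sink.accept(new ArrayList<>(batch));
                    sent += batch.size();
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                sink.accept(new ArrayList<>(batch));
                sent += batch.size();
            }
            return sent;
        } finally {
            pool.shutdown();
        }
    }
}
```

You can run several of these clients on different machines, each with
its own slice of the files, and tune the thread count and batch size
to whatever your hardware can afford.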

Best,
Erick


On Wed, Nov 27, 2013 at 10:00 AM, Marcello Lorenzi <mlore...@sorint.it> wrote:

> Hi All,
> In our test environment we have implemented a new search engine based on
> Solr 4.3, with 2 instances hosted on different servers and 1 shard present
> on each servlet container.
>
> During some stress tests we noticed a bottleneck in the crawling of large PDF
> files that blocks the serving of query results from the collections.
>
> Is it possible to speed up the crawling, or to mitigate the overhead
> created by PDFBox during it?
>
> Thanks,
> Marcello
>
