Re: Opinions on ExtractingRequestHandler

Charlie Hull Thu, 08 Feb 2018 07:50:27 -0800

On 08/02/2018 11:47, Frederik Van Hoyweghen wrote:

Hey everyone,


What are your experiences with using Solr's ExtractingRequestHandler
in production?

I've been reading some mixed remarks so I was wondering what your actual
experiences with it are.

Personally, I feel like setting up a separate service which is solely
responsible for parsing file contents (to be indexed by Solr later on in
the process) using Tika is a safer approach, so we can use whatever Tika
version we want along with other things we might want to add.

Yes, do this. It's entirely possible to bring down Tika with a nasty
PDF, or end up consuming lots of resources in the extraction step and
have these impact your Solr server. Run it separately and you can
monitor it/kill it if necessary.
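
As a rough illustration of the "run it separately and kill it if necessary" idea, here is a minimal Python sketch that runs an extractor as its own process with a hard timeout. The function name extract_text, the 30 second default and the stand-in command are all assumptions made for illustration; in a real setup the command would be whatever launches your Tika-based extractor, and the returned text would then be sent to Solr as a normal document.

```python
import subprocess
import sys

# Stand-in extractor used only for demonstration: it echoes stdin back.
# In practice this would be the command that starts your own Tika-based
# extractor (hypothetical here, not taken from the original post).
STAND_IN_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read())",
]


def extract_text(cmd, payload, timeout_s=30.0):
    """Run an extractor command in its own process; return text or None.

    The raw file bytes are passed on stdin. If the process runs longer
    than timeout_s or exits with a non-zero status, None is returned, so
    a bad document costs one failed extraction and nothing more.
    """
    try:
        result = subprocess.run(
            cmd,
            input=payload,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None
    if result.returncode != 0:
        return None
    return result.stdout.decode("utf-8", errors="replace")
```

Because the extractor lives in a separate process, a runaway or crashing parse never shares a JVM with Solr; the caller just sees None and can log or skip that document.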

You might like my colleague Matt Pearce's DropWizard wrapper for Tika:
https://github.com/mattflax/dropwizard-tika-server


Cheers

Charlie


Looking forward to your response!

Kind regards,
Frederik



--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk
