On 08/02/2018 11:47, Frederik Van Hoyweghen wrote:
Hey everyone,

What are your experiences with using Solr's ExtractingRequestHandler in
production?

I've been reading some mixed remarks so I was wondering what your actual
experiences with it are.

Personally, I feel like setting up a separate service which is solely
responsible for parsing file contents (to be indexed by Solr later on in
the process) using Tika is a safer approach, so we can use whatever Tika
version we want along with other things we might want to add.

Yes, do this. It's entirely possible to bring down Tika with a nasty PDF, or to end up consuming lots of resources in the extraction step, and either of these will impact your Solr server if extraction runs inside it. Run it separately and you can monitor it and kill it if necessary.

You might like my colleague Matt Pearce's DropWizard wrapper for Tika: https://github.com/mattflax/dropwizard-tika-server
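
As a rough illustration of running extraction separately, here is a minimal Python sketch. It assumes a standalone Tika Server listening on localhost:9998 (its /tika endpoint accepts a PUT and returns plain text), and a Solr collection hypothetically named "docs" with a "content_txt" field. The URLs, the collection, the field name and the function names are all assumptions for illustration, not part of any setup described in this thread.

```python
import json
import urllib.request

# Assumed endpoints; adjust to your own deployment.
TIKA_URL = "http://localhost:9998/tika"  # standalone Tika Server
SOLR_URL = "http://localhost:8983/solr/docs/update?commit=true"  # hypothetical "docs" collection


def build_tika_request(raw_bytes, tika_url=TIKA_URL):
    """Build a PUT request asking Tika Server for plain-text output."""
    return urllib.request.Request(
        tika_url,
        data=raw_bytes,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def build_solr_request(doc_id, text, solr_url=SOLR_URL):
    """Build a POST request that sends one already-extracted document to Solr."""
    # "content_txt" is an assumed field name; use whatever your schema defines.
    body = json.dumps([{"id": doc_id, "content_txt": text}]).encode("utf-8")
    return urllib.request.Request(
        solr_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def extract_and_index(doc_id, raw_bytes, timeout_s=30):
    """Extract outside Solr, then index only the resulting text.

    Returns True if the document was indexed, False if extraction
    failed or timed out. In that case Solr is never contacted.
    """
    try:
        with urllib.request.urlopen(build_tika_request(raw_bytes), timeout=timeout_s) as resp:
            text = resp.read().decode("utf-8", errors="replace")
    except OSError:
        # Covers connection errors, HTTP errors and timeouts from the Tika side.
        return False
    with urllib.request.urlopen(build_solr_request(doc_id, text), timeout=timeout_s):
        return True
```

The point of the split is that a file which hangs or crashes the extractor only costs a timeout on the Tika side; Solr only ever receives plain text. The timeout value here is arbitrary.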

Cheers

Charlie

Looking forward to your response!

Kind regards,
Frederik



--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk
