Re: How is Tika used with Solr

Charlie Hull Wed, 10 Feb 2016 00:54:45 -0800

On 09/02/2016 22:49, Alexandre Rafalovitch wrote:

Solr uses Tika directly. And not in the most efficient way. It is
there mostly for convenience rather than performance.


So, for performance, the Solr recommendation is also to run Tika
separately and only send Solr the processed documents.

Absolutely. It's entirely possible to kill Tika with a bad PDF or something, bringing down your Solr instance.

Here's something a colleague wrote to wrap Tika in a server, maybe you can use it:

https://github.com/mattflax/dropwizard-tika-server
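To illustrate the isolation idea in general terms (this is not how the project linked above works, just a minimal sketch): the crawler can run the extractor as a child process with a timeout, so a file that hangs or crashes the extractor only costs you that one file. The function name extract_text, the 60 second default and the jar name in the usage note are assumptions made for the example.

```python
import subprocess
import sys
from typing import List, Optional


def extract_text(command: List[str], timeout_s: float = 60.0) -> Optional[str]:
    """Run an extractor command in a child process.

    Returns the child's stdout as text, or None if the child timed out,
    could not be started, or exited with a non-zero status.
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,   # collect stdout/stderr instead of inheriting them
            timeout=timeout_s,     # subprocess.run kills the child once this expires
            check=False,           # inspect the exit code ourselves
        )
    except subprocess.TimeoutExpired:
        # The extractor hung (for example on a malformed document).
        return None
    except OSError:
        # The command could not be started at all.
        return None
    if result.returncode != 0:
        # The extractor crashed or reported an error.
        return None
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Stand-in for a real extractor: a child Python process that prints text.
    demo = extract_text([sys.executable, "-c", "print('extracted text')"])
    print(demo)
```

With the Tika command-line app this might be called as extract_text(["java", "-jar", "tika-app.jar", "--text", path]), where the jar name and path are placeholders. A None result means the file should be logged and skipped rather than retried blindly, and only successfully extracted text gets sent on to Solr.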

Cheers

Charlie


Regards,
     Alex.
----
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 10 February 2016 at 09:46, Steven White <swhite4...@gmail.com> wrote:

Hi folks,

I'm writing a file-system crawler that will index files.  The file system
is going to be very busy and I anticipate on average 10 new updates per
min.  My application checks for new or updated files once every 1 min.  I
use Tika to extract the raw text from those files and send it over to Solr
for indexing.  My application will be running 24x7xN-days.  It will not
recycle unless the OS is restarted.

Over at Tika mailing list, I was told the following:

"As a side note, if you are handling a bunch of files from the wild in a
production environment, I encourage separating Tika into a separate jvm vs
tying it into any post processing – consider tika-batch and writing
separate text files for each file processed (not so efficient, but
exceedingly robust).  If this is demo code or you know your document set
well enough, you should be good to go with keeping Tika and your
postprocessing steps in the same jvm."

My question is, how does Solr utilize Tika?  Does it run Tika in its own
JVM as an out-of-process application or does it link with Tika JARs
directly?  If it links in directly, are there known issues with Solr
integrated with Tika because of Tika issues?

Thanks

Steve



--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk
