On 3/1/2017 6:59 PM, Phil Scadden wrote:
> Exceptions were never triggered, but the metadata was essentially empty
> except for contentType, and the content was always an empty string. I
> don’t know what the parser was doing, but I gave up and went with the
> extractHandler route instead, which did at least build a full index.

With the extracting request handler, Tika is running inside Solr.  The
handler code is customized for Solr's needs, so it usually works.  When
Tika hits one of its well-known issues, though, the entire JVM (which
includes Solr) suffers as well.

I do not know how to write Tika code, but this blog post covers an
example program that uses Tika with SolrJ, so the processing is outside
Solr:

https://lucidworks.com/2012/02/14/indexing-with-solrj/

It also uses a database, but it should be relatively easy to remove that.
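
For illustration only, here is a minimal sketch of the "processing outside
Solr" arrangement in plain Java, with no database. It is not the code from
the blog post. The Extractor and Indexer interfaces are hypothetical
stand-ins, not Tika or SolrJ APIs: in a real program, Extractor would wrap a
Tika parser and Indexer would wrap a SolrJ client. The point is that a parser
failure, or the empty-content symptom described above, is handled in the
client process and never reaches Solr's JVM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExternalExtractionSketch {

    /** Hypothetical stand-in for a Tika-based text extractor. */
    interface Extractor {
        String extract(String path) throws Exception;
    }

    /** Hypothetical stand-in for a SolrJ client that sends documents to Solr. */
    interface Indexer {
        void add(Map<String, String> fields) throws Exception;
    }

    /**
     * Extracts text in this (client) process and sends only plain fields on.
     * Files whose extraction throws or yields empty text are skipped and
     * returned to the caller instead of being indexed silently.
     */
    static List<String> indexAll(List<String> paths, Extractor extractor,
                                 Indexer indexer) throws Exception {
        List<String> skipped = new ArrayList<>();
        for (String path : paths) {
            String text;
            try {
                text = extractor.extract(path);
            } catch (Exception e) {
                // A parser failure costs this one file, not the Solr server.
                skipped.add(path);
                continue;
            }
            if (text == null || text.trim().isEmpty()) {
                // Empty content is treated as a failure rather than indexed.
                skipped.add(path);
                continue;
            }
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", path);       // field names here are assumptions
            doc.put("content", text);
            indexer.add(doc);
        }
        return skipped;
    }

    public static void main(String[] args) throws Exception {
        List<Map<String, String>> sent = new ArrayList<>();
        // Fake extractor: fails on ".bad", returns empty text on ".empty".
        Extractor fake = p -> {
            if (p.endsWith(".bad")) {
                throw new IllegalStateException("parser failure");
            }
            return p.endsWith(".empty") ? "" : "text of " + p;
        };
        List<String> skipped =
            indexAll(Arrays.asList("a.pdf", "b.bad", "c.empty"), fake, sent::add);
        System.out.println("indexed=" + sent.size() + " skipped=" + skipped);
    }
}
```

Returning the skipped paths makes the "content was always an empty string"
case visible to the caller, so those files can be logged or retried with
different parser settings.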

Thanks,
Shawn
