On 3/1/2017 6:59 PM, Phil Scadden wrote:
> Exceptions never triggered but metadata was essentially empty except
> for contentType, and content was always an empty string. I don’t know
> what parser was doing, but I gave up and with the extractHandler route
> instead which did at least build a full index.

With the extracting request handler, Tika is running inside Solr. The handler code is customized for Solr's needs, so it usually works. When Tika has one of its well-known issues, though, the entire JVM (which includes Solr) suffers as well.

I do not know how to write Tika code, but this blog post covers an example program that uses Tika with SolrJ, so the processing is outside Solr:

https://lucidworks.com/2012/02/14/indexing-with-solrj/

It also uses a database, but it should be relatively easy to remove that.

Thanks,
Shawn
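
For illustration only, here is a rough sketch of the "processing outside Solr" arrangement. It is not the code from the blog post, and it deliberately uses neither Tika nor SolrJ: extraction sits behind a hypothetical `Extractor` hook (where a Tika parser call would presumably go), and the document is sent to Solr's JSON update handler with the JDK's own HTTP client (Java 11 or later). The class name, the hook, the field names, and the update URL are all assumptions. The point it tries to show is that an extraction failure stays in the client JVM and Solr is never contacted for that file.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;

class ExternalIndexer {

    /** Hypothetical hook: a real program would wrap its Tika parsing here. */
    interface Extractor {
        Map<String, String> extract(String path) throws Exception;
    }

    /** Escapes a string for use inside a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a one-document JSON array body for Solr's update handler. */
    static String toUpdateJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("[{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(',');
            sb.append('"').append(jsonEscape(e.getKey())).append("\":\"")
              .append(jsonEscape(e.getValue())).append('"');
            first = false;
        }
        return sb.append("}]").toString();
    }

    /**
     * Extracts in this JVM, then posts the fields to Solr over HTTP.
     * Returns false if extraction or the HTTP request fails.
     */
    static boolean indexOne(String path, Extractor extractor, String updateUrl) {
        Map<String, String> fields;
        try {
            fields = extractor.extract(path);
        } catch (Throwable t) {
            // Deliberately broad: a parser blowing up (even with an Error)
            // should cost us this one file, not the Solr server.
            System.err.println("Extraction failed for " + path + ": " + t);
            return false;
        }
        HttpRequest request = HttpRequest.newBuilder(URI.create(updateUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toUpdateJson(fields)))
                .build();
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200;
        } catch (IOException e) {
            System.err.println("Solr update failed for " + path + ": " + e);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A call might look like `ExternalIndexer.indexOne("report.pdf", myExtractor, "http://localhost:8983/solr/mycore/update?commit=true")`, where `myExtractor` and the core name `mycore` are placeholders. A real indexer would more likely use SolrJ, as the blog post does, and would batch documents rather than commit on every request.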