Hi all, I have been having an issue with Solr's
ExtractingRequestHandler.  When indexing a PDF (for example), all of
the document metadata gets mixed into the "content" field along with
the body text.  See:
<https://stackoverflow.com/questions/47934257/importing-files-with-solr-cell-tika-is-mixing-metadata-fields-with-content>
for the gory details.

I'm guessing this is the same basic issue as
<https://issues.apache.org/jira/browse/SOLR-9178>, which is still
unresolved.  I thought I'd ping the list to see whether anyone has
a workaround or any more information on this.

Is there any way to get reasonable behavior using the
ExtractingRequestHandler, or should I just dump that approach and plan
to run Tika outside of Solr, and then send Solr the exact content I
want?
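
For concreteness, here is a rough sketch of what I mean by the second
option.  It is only an illustration, not something I have running.  It
assumes a standalone tika-server on its default port with the /rmeta/text
endpoint (which, as far as I know, returns the body text under the
"X-TIKA:content" key), and a Solr core that accepts JSON updates.  The core
name "mycore", the Solr field names "title" and "author", and all of the
helper names are hypothetical placeholders.

```python
import json
import urllib.request

# Assumed endpoints: a local tika-server and a hypothetical core "mycore".
TIKA_URL = "http://localhost:9998/rmeta/text"
SOLR_URL = "http://localhost:8983/solr/mycore/update?commit=true"

# Key under which tika-server's /rmeta output carries the extracted body.
CONTENT_KEY = "X-TIKA:content"

# Hypothetical mapping from Tika metadata keys to fields in my Solr schema.
FIELD_MAP = {"dc:title": "title", "dc:creator": "author"}


def extract_with_tika(path, url=TIKA_URL):
    """Send one file to tika-server and return its first metadata record."""
    with open(path, "rb") as fh:
        data = fh.read()
    req = urllib.request.Request(
        url, data=data, method="PUT", headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # /rmeta returns a JSON list; the first entry is the container document.
        return json.load(resp)[0]


def build_solr_doc(doc_id, record, field_map=FIELD_MAP):
    """Keep body text and metadata apart; copy only whitelisted metadata."""
    doc = {"id": doc_id, "content": (record.get(CONTENT_KEY) or "").strip()}
    for tika_key, solr_field in field_map.items():
        if tika_key in record:
            doc[solr_field] = record[tika_key]
    return doc


def post_to_solr(docs, url=SOLR_URL):
    """POST a list of documents to Solr's JSON update handler."""
    body = json.dumps(docs).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Intended use (requires both servers to be running):
#   record = extract_with_tika("report.pdf")
#   post_to_solr([build_solr_doc("report.pdf", record)])
```

The point of the sketch is that "content" would only ever receive the
extracted body text, and any metadata I care about would go into its own
explicitly mapped fields.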


Thanks,



This message optimized for indexing by NSA PRISM
