Hi all, I have been having an issue with Solr's ExtractingRequestHandler. When I index a PDF (for example), all of the document metadata ends up mixed into the "content" field along with the actual body text. See: <https://stackoverflow.com/questions/47934257/importing-files-with-solr-cell-tika-is-mixing-metadata-fields-with-content> for the gory details.
I'm guessing this is the same basic issue as <https://issues.apache.org/jira/browse/SOLR-9178>, which is still unresolved, but I thought I'd ping the list to see whether anyone has a workaround or any more information. Is there any way to get reasonable behavior out of the ExtractingRequestHandler, or should I just drop that approach, run Tika outside of Solr, and send Solr exactly the content I want?

Thanks,
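
P.S. To make the second option concrete, here is a rough sketch of what I mean by running Tika outside of Solr. It is untested against a real setup and makes several assumptions that are mine, not anything confirmed: a standalone Tika Server listening on localhost:9998, a hypothetical core named "mycore", and "title" and "author" as the only metadata fields I want to keep. The exact metadata key names Tika returns vary by file type, so the whitelist would need adjusting.

```python
import json
import urllib.request

# Assumed addresses; adjust to the actual setup.
TIKA_URL = "http://localhost:9998"  # standalone Tika Server
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=true"  # "mycore" is hypothetical


def tika_put(path, data, accept):
    """PUT raw file bytes to a Tika Server endpoint and return the body as text."""
    req = urllib.request.Request(
        TIKA_URL + path, data=data, method="PUT", headers={"Accept": accept}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def build_solr_doc(doc_id, text, metadata, keep=("title", "author")):
    """Put body text in 'content' and copy only whitelisted metadata fields.

    Metadata keys are matched case-insensitively; everything else is dropped,
    so no metadata can leak into the 'content' field.
    """
    doc = {"id": doc_id, "content": text.strip()}
    lowered = {k.lower(): v for k, v in metadata.items()}
    for name in keep:
        if name in lowered:
            doc[name] = lowered[name]
    return doc


def index_file(path):
    """Extract text and metadata with Tika, then send one clean document to Solr."""
    with open(path, "rb") as f:
        data = f.read()
    text = tika_put("/tika", data, "text/plain")  # body text only
    metadata = json.loads(tika_put("/meta", data, "application/json"))
    doc = build_solr_doc(path, text, metadata)
    body = json.dumps([doc]).encode("utf-8")
    req = urllib.request.Request(
        SOLR_UPDATE_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

The point of splitting it this way is that the text and the metadata come back from separate Tika calls, so I decide explicitly which fields reach Solr instead of relying on the handler's field mapping.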