[https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740587#comment-16740587]
Karl Wright commented on CONNECTORS-1563:
-----------------------------------------
The metadata extractor can go anywhere in your pipeline, after Tika extraction.
There is absolutely no point in having *two* Tika extractions though -- and
that's what you're trying to do with the setup you've got.
What I'd recommend is that you use only the ManifoldCF-side Tika extractor, and
inject content into Solr using the /update handler, not the /update/extract
handler. You would also need to uncheck the "Use the Extract Update Handler"
checkbox in the Solr connection configuration. It's all covered in the
ManifoldCF end user documentation.
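For context on the handler choice: Solr's /update handler indexes the fields exactly as they are sent and does no Tika parsing (Tika parsing happens in the /update/extract handler, i.e. Solr Cell), so the extracted text must already be present in the document. The sketch below only builds such a request with the Python standard library to show its shape. It assumes a local Solr at http://localhost:8983/solr, a hypothetical core named "mycore" and a hypothetical "content" field. ManifoldCF's Solr connector constructs its own requests, so this is an illustration, not what the connector sends.

```python
import json
import urllib.request


def build_update_request(solr_base, core, docs, commit=True):
    """Build (but do not send) a JSON POST to Solr's /update handler.

    /update performs no Tika extraction, so each document must already
    carry its text, here in a hypothetical "content" field.
    """
    url = f"{solr_base.rstrip('/')}/{core}/update"
    if commit:
        # Ask Solr to commit so the documents become searchable right away.
        url += "?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical core name and document; the text is assumed to have been
# extracted beforehand (e.g. by a Tika transformer in the pipeline).
req = build_update_request(
    "http://localhost:8983/solr",
    "mycore",
    [{"id": "doc-1", "content": "Text already extracted by Tika"}],
)
print(req.full_url)
print(req.get_method())
# Sending it would be: urllib.request.urlopen(req)
```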
> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream
> must have > 0 bytes
> -----------------------------------------------------------------------------------------------
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
> Issue Type: Task
> Components: Lucene/SOLR connector
> Reporter: Sneha
> Assignee: Karl Wright
> Priority: Major
> Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked the "Use the Extract Update Handler:" parameter, and now I am
> getting an error on Solr: null:org.apache.solr.common.SolrException:
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0
> bytes
> If I ignore the Tika exception, my documents get indexed but don't have a
> content field in Solr.
> I am using Solr 7.3.1 and ManifoldCF 2.8.1.
> I am using Solr Cell, and hence have not configured an external Tika
> extractor in the ManifoldCF pipeline.
> Please help me with this problem.
> Thanks in advance