Shawn,

Thank you for your reply.
The other archived message you mentioned was also posted by me.
I am new to Solr. When you say to do the processing outside of Solr in a 
program of my own, what exactly should I do?

I have a lot of text documents that I need to index. What should I apply to 
these documents before loading them into Solr?

Regards,
~Sri


-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Wednesday, February 08, 2017 9:46 AM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler - Unable to load Tika Config Processing Document # 1

On 2/6/2017 3:45 PM, Anatharaman, Srinatha (Contractor) wrote:
> I am having below error while trying to index using dataImporthandler
>
> Data-Config file is mentioned below. zookeeper is not able to read 
> "tikaConfig.xml" on below statement
>
>   processor="TikaEntityProcessor" tikaConfig="tikaConfig.xml"
>
> Please help me to resolve this issue
>
> ion: java.lang.RuntimeException: 
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable 
> to load Tika Config Processing Document # 1
<snip>
> Caused by: org.apache.solr.common.cloud.ZooKeeperException: 
> ZkSolrResourceLoader does not support getConfigDir() - likely, what you are 
> trying to do is not supported in ZooKeeper mode
>         at org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:149)
>         at org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(TikaEntityProcessor.java:91)
>         ... 11 more

This sounds to me like there's something making TikaEntityProcessor 
incompatible with running in SolrCloud mode.  The way that this processor loads 
its config appears to NOT work when the config comes from zookeeper, which it 
always will when you're running SolrCloud.

I don't know if this is expected or not, or whether it will be considered a bug.

It is *strongly* recommended to *not* use the Tika that's embedded within Solr, 
but instead to do the processing outside of Solr in a program of your own and 
index the results.  Tika is very touchy software that sometimes hangs or 
crashes as it processes rich-text documents.  If that happens to the embedded 
Tika, then Solr itself will also be affected.
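
As a rough illustration of what "a program of your own" could look like, here 
is a minimal sketch in Python. It assumes the documents are already plain-text 
.txt files, so no Tika step is shown; for rich formats you would first run Tika 
yourself and feed its extracted text into the same structure. The field name 
"content_txt", the use of the file name as "id", and the URL in the closing 
comment are assumptions for illustration, not something taken from your setup.

```python
import json
import urllib.request
from pathlib import Path


def build_docs(folder):
    """Turn every .txt file directly inside `folder` into a dict for Solr."""
    docs = []
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps one badly encoded file from aborting the run.
        text = path.read_text(encoding="utf-8", errors="replace")
        # "id" and "content_txt" are assumed field names; match your schema.
        docs.append({"id": path.name, "content_txt": text})
    return docs


def post_to_solr(docs, update_url):
    """POST the documents as a JSON array to Solr's update handler."""
    body = json.dumps(docs).encode("utf-8")
    request = urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Example call (hypothetical host, collection, and folder):
# post_to_solr(build_docs("/path/to/text/files"),
#              "http://localhost:8983/solr/mycollection/update?commit=true")
```

Because the extraction and file reading happen in a separate process, a hang or 
crash there only stops the indexing program and leaves Solr running.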

Doing Tika processing outside of Solr is more important with SolrCloud, because 
all replicas will need to independently index the data in cloud mode.  Here's 
an archive of a message from this list about pretty much the exact same problem:

https://www.mail-archive.com/solr-user@lucene.apache.org/msg127924.html

Note that this message was sent only a week ago.

Thanks,
Shawn

