On 2/6/2017 3:45 PM, Anatharaman, Srinatha (Contractor) wrote:
> I am having below error while trying to index using dataImporthandler
>
> Data-Config file is mentioned below. zookeeper is not able to read
> "tikaConfig.xml" on below statement
>
> processor="TikaEntityProcessor" tikaConfig="tikaConfig.xml"
>
> Please help me to resolve this issue
>
> ion: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load
> Tika Config Processing Document # 1
<snip>
> Caused by: org.apache.solr.common.cloud.ZooKeeperException:
> ZkSolrResourceLoader does not support getConfigDir() - likely, what you are
> trying to do is not supported in ZooKeeper mode
>     at org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:149)
>     at org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(TikaEntityProcessor.java:91)
>     ... 11 more

This sounds to me like there's something making TikaEntityProcessor
incompatible with running in SolrCloud mode. The way that this processor
loads its config appears to NOT work when the config comes from zookeeper,
which it always will when you're running SolrCloud. I don't know if this
is expected or not, or whether it will be considered a bug.

It is *strongly* recommended to *not* use the Tika that's embedded within
Solr, but instead to do the processing outside of Solr in a program of
your own and index the results. Tika is very touchy software that
sometimes hangs or crashes as it processes rich-text documents. If that
happens to the embedded Tika, then Solr itself will also be affected.

Doing Tika processing outside of Solr is more important with SolrCloud,
because all replicas will need to independently index the data in cloud
mode.

Here's an archive of a message from this list about pretty much the exact
same problem:

https://www.mail-archive.com/solr-user@lucene.apache.org/msg127924.html

Note that this message was sent only a week ago.

Thanks,
Shawn
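
For illustration, here is a minimal sketch of the "process outside of Solr and index the results" approach recommended above. It is not taken from the thread and makes several assumptions: a Tika app jar is available locally at the hypothetical path `tika-app.jar`, the collection name `mycollection` and the field names `id` and `content` are placeholders for whatever your schema defines, and Solr accepts JSON documents on its `/update` handler. The extractor runs in a child process with a timeout, so a hang or crash in Tika only costs that one document instead of affecting Solr.

```python
import json
import subprocess
import urllib.request

# Assumptions for illustration only: adjust to your own installation.
TIKA_JAR = "tika-app.jar"  # hypothetical path to a downloaded Tika app jar
SOLR_UPDATE_URL = (
    "http://localhost:8983/solr/mycollection/update?commit=true"  # hypothetical
)


def extract_text(path, timeout_s=60, command=None):
    """Extract plain text from a file in a separate process.

    Returns the extracted text, or None if the extractor times out,
    cannot be started, or exits with a non-zero status.
    """
    cmd = list(command or ["java", "-jar", TIKA_JAR, "--text"]) + [path]
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except (subprocess.TimeoutExpired, OSError):
        # Hung extractor (killed by subprocess.run) or missing executable.
        return None
    if result.returncode != 0:
        return None
    return result.stdout


def build_update_body(docs):
    """Serialize a list of documents as a JSON array for Solr's /update."""
    return json.dumps(docs).encode("utf-8")


def index_docs(docs, url=SOLR_UPDATE_URL):
    """POST documents to Solr; returns the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=build_update_body(docs),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


def index_file(path):
    """Extract one file and index it; skip it if extraction fails."""
    text = extract_text(path)
    if text is None:
        return False
    # "id" and "content" are assumed field names, not from the thread.
    index_docs([{"id": path, "content": text}])
    return True
```

Calling `index_file("report.pdf")` would extract the file and send one document to Solr, returning `False` without touching Solr if the extraction fails. In SolrCloud the update can be sent to any node, which forwards the already-extracted text to the replicas, so the expensive parsing happens once rather than on every replica.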