I suggest you check your Solr logs for more info as to the cause.

On 19 March 2015 at 12:58, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> Hi Erick,
>
> No, the PDF file is a testing file which only contains 1 sentence.
>
> I've managed to get it to work by removing startup="lazy" in
> the ExtractingRequestHandler and adding the following lines:
>   <str name="uprefix">ignored_</str>
>   <str name="captureAttr">true</str>
>   <str name="fmap.a">links</str>
>   <str name="fmap.div">ignored_</str>
>
> Does the presence of startup="lazy" affect the function of
> ExtractingRequestHandler, or is it one of the str name values?
>
> Regards,
> Edwin
>
>
> On 18 March 2015 at 23:19, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > Shot in the dark, but is the PDF file significantly larger than the
> > others? Perhaps you're simply exceeding the packet limits for the
> > servlet container?
> >
> > Best,
> > Erick
> >
> > On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo
> > <edwinye...@gmail.com> wrote:
> > > Hi everyone,
> > >
> > > I'm having some issues with indexing rich-text documents from the Solr
> > > Cloud. When I try to index a PDF or Word document, I get the following
> > > error:
> > >
> > >
> > > org.apache.solr.common.SolrException: Bad Request
> > >
> > >
> > >
> > > request:
> > > http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
> > >     at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> > >     at java.lang.Thread.run(Unknown Source)
> > >
> > >
> > > I'm able to index .xml and .csv files in Solr Cloud with the same
> > > configuration.
> > >
> > > I have set up Solr Cloud using the default ZooKeeper in Solr 5.0.0, and
> > > I have 2 shards with the following details:
> > > Shard1: 192.168.2.2:8983
> > > Shard2: 192.168.2.2:8984
> > >
> > > Prior to this, I was already able to index rich-text documents without
> > > Solr Cloud, and I'm using the same solrconfig.xml and schema.xml,
> > > so my ExtractingRequestHandler is already defined.
> > >
> > > Are there other settings required in order to index rich-text documents
> > > in Solr Cloud?
> > >
> > >
> > > Regards,
> > > Edwin
> > >
>

--
Damien Kamerman
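
For reference, the handler definition Edwin describes in the quoted message would look roughly like the sketch below. Only the four <str> entries and the absence of startup="lazy" come from his message. The handler name /update/extract, the class name, and the enclosing "defaults" list are assumptions based on the stock Solr example solrconfig.xml and may differ in his setup.

```xml
<!-- Sketch only: name, class and the "defaults" wrapper are assumed from the
     stock example solrconfig.xml, not taken from this thread. -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <!-- startup="lazy" intentionally omitted, as Edwin reported -->
  <lst name="defaults">
    <!-- Prefix fields that are not in the schema so they can be ignored -->
    <str name="uprefix">ignored_</str>
    <!-- Capture element attributes into their own fields -->
    <str name="captureAttr">true</str>
    <!-- Map content of <a> elements to the "links" field -->
    <str name="fmap.a">links</str>
    <!-- Map content of <div> elements to an ignored field -->
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>
```

Whether removing startup="lazy" or adding the field mappings is what actually resolved the Bad Request is not established in this thread; the Solr logs on the shard leader should show which field or parameter was rejected.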