Re: Unable to index rich-text documents in Solr Cloud

Zheng Lin Edwin Yeo Wed, 18 Mar 2015 20:45:07 -0700

These are the logs that I got from solr.log. I can't seem to figure out
what's wrong. Does anyone know?




ERROR - 2015-03-18 15:06:51.019;
org.apache.solr.update.StreamingSolrClients$1; error
org.apache.solr.common.SolrException: Bad Request



request:
http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
INFO  - 2015-03-18 15:06:51.019;
org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr
path=/update/extract params={literal.id=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf&resource.name=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf}
{add=[C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf]} 0 1252
INFO  - 2015-03-18 15:06:51.029;
org.apache.solr.update.DirectUpdateHandler2; start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO  - 2015-03-18 15:06:51.029;
org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes.
Skipping IW.commit.
INFO  - 2015-03-18 15:06:51.029; org.apache.solr.core.SolrCore;
SolrIndexSearcher has not changed - not re-opening:
org.apache.solr.search.SolrIndexSearcher
INFO  - 2015-03-18 15:06:51.039;
org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO  - 2015-03-18 15:06:51.039;
org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr
path=/update params={waitSearcher=true&distrib.from=http://192.168.2.2:8983/solr/logmill/&update.distrib=FROMLEADER&openSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false}
{commit=} 0 10
INFO  - 2015-03-18 15:06:51.039;
org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr
path=/update params={commit=true} {commit=} 0 10
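
To make the failing request in the ERROR entry easier to read, its query string can be unpacked with a few lines of Python. This is only a sketch for reading the log, not part of Solr; the helper name describe_forwarded_update is made up for illustration, and the URL is copied from the log above.

```python
from urllib.parse import urlsplit, parse_qs

# Request URL copied from the ERROR entry in solr.log.
FAILED_REQUEST = (
    "http://192.168.2.2:8984/solr/logmill/update"
    "?update.distrib=TOLEADER"
    "&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F"
    "&wt=javabin&version=2"
)


def describe_forwarded_update(url):
    """Hypothetical helper: split an update URL into its target and decoded parameters."""
    parts = urlsplit(url)
    # parse_qs percent-decodes values and returns lists; keep the first value of each.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "target": f"{parts.scheme}://{parts.netloc}{parts.path}",
        "params": params,
    }


info = describe_forwarded_update(FAILED_REQUEST)
print("sent to:  ", info["target"])
print("sent from:", info["params"]["distrib.from"])
print("stage:    ", info["params"]["update.distrib"])
```

Decoded this way, the request appears to be the node on port 8983 forwarding the update to the node on port 8984, so the Bad Request response would come from the 8984 node, and that node's log is probably the next place to look.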



Regards,
Edwin


On 19 March 2015 at 10:56, Damien Kamerman <dami...@gmail.com> wrote:

> I suggest you check your solr logs for more info as to the cause.
>
> On 19 March 2015 at 12:58, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> wrote:
>
> > Hi Erick,
> >
> > No, the PDF file is a testing file which only contains 1 sentence.
> >
> > I've managed to get it to work by removing startup="lazy" from
> > the ExtractingRequestHandler and adding the following lines:
> >       <str name="uprefix">ignored_</str>
> >       <str name="captureAttr">true</str>
> >       <str name="fmap.a">links</str>
> >       <str name="fmap.div">ignored_</str>
> >
> > Does the presence of startup="lazy" affect how the
> > ExtractingRequestHandler works, or was it one of the str values that
> > made the difference?
> >
> > Regards,
> > Edwin
> >
> >
> > On 18 March 2015 at 23:19, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > > Shot in the dark, but is the PDF file significantly larger than the
> > > others? Perhaps you're simply exceeding the packet limits for the
> > > servlet container?
> > >
> > > Best,
> > > Erick
> > >
> > > On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo
> > > <edwinye...@gmail.com> wrote:
> > > > Hi everyone,
> > > >
> > > > I'm having some issues with indexing rich-text documents in Solr
> > > > Cloud. When I try to index a PDF or Word document, I get the
> > > > following error:
> > > >
> > > >
> > > > org.apache.solr.common.SolrException: Bad Request
> > > >
> > > >
> > > >
> > > > request:
> > > > http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
> > > >         at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
> > > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> > > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> > > >         at java.lang.Thread.run(Unknown Source)
> > > >
> > > >
> > > > I'm able to index .xml and .csv files in Solr Cloud with the same
> > > > configuration.
> > > >
> > > > I have set up Solr Cloud using the default ZooKeeper in Solr 5.0.0, and
> > > > I have 2 shards with the following details:
> > > > Shard1: 192.168.2.2:8983
> > > > Shard2: 192.168.2.2:8984
> > > >
> > > > Prior to this, I was already able to index rich-text documents without
> > > > Solr Cloud, and I'm using the same solrconfig.xml and schema.xml,
> > > > so my ExtractingRequestHandler is already defined.
> > > >
> > > > Are there other settings required in order to index rich-text documents
> > > > in Solr Cloud?
> > > >
> > > >
> > > > Regards,
> > > > Edwin
> > >
> >
>
>
>
> --
> Damien Kamerman
>
