Hi,

The no-op run completed in 20 minutes. The only line we commented out was "solr.addBean(doc)". We tried SUSS as a drop-in replacement for CommonsHttpSolrServer, but its behavior was odd: we saw updates taking tens of thousands of seconds, and activity continued for a very long time after sending to Solr was complete. We thought it was because we are indexing POJOs as documents. BTW, SOLR-1565 and SOLR-2755 say that SUSS does not support binary payloads.
    CommonsHttpSolrServer solr = new CommonsHttpSolrServer(url);
    solr.setRequestWriter(new BinaryRequestWriter());
    ...
    // doc is a SolrJ-annotated POJO
    solr.addBean(doc);

Any thoughts on what may be taking so long? Before MapReduce we were indexing to localhost in 2-3 hours using the same code base.

On Tue, Sep 27, 2011 at 8:55 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:

> Hello,
>
> By the way, should you need help with Hadoop+Solr, please feel free to get
> in touch with us at Sematext (see below) - we happen to work with Hadoop and
> Solr on a daily basis and have successfully implemented parallel indexing
> into Solr with/from Hadoop.
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
> Lucene ecosystem search :: http://search-lucene.com/
>
> ------------------------------
> *From:* Otis Gospodnetic <otis_gospodne...@yahoo.com>
> *To:* "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> *Sent:* Tuesday, September 27, 2011 1:37 PM
> *Subject:* Re: SOLR Index Speed
>
> Hi,
>
> No need to use reply-all and CC me directly, I'm on the list :)
>
> It sounds like Solr is not the problem, but the Hadoop side. For example,
> what if you change your reducer to do a no-op instead of calling Solr?
> Does it go beyond 500-700 docs/minute?
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
> Lucene ecosystem search :: http://search-lucene.com/
>
> >________________________________
> >From: Lord Khan Han <khanuniver...@gmail.com>
> >To: solr-user@lucene.apache.org; Otis Gospodnetic <otis_gospodne...@yahoo.com>
> >Sent: Tuesday, September 27, 2011 4:42 AM
> >Subject: Re: SOLR Index Speed
> >
> >Our producer (a Hadoop mapper) prepares the docs for submitting, and the
> >reducer submits directly via SolrJ HTTP. We now have 32 reducers, but the
> >indexing speed is still 500-700 docs per minute. Submissions come from a
> >Hadoop cluster, so submit speed is not a problem.
> >I couldn't use the full resources of the Solr index machine.
> >
> >I gave Solr a 12 GB heap and the machine is not swapping.
> >
> >I couldn't figure out what the problem is, if there is one.
> >
> >PS: We are committing at the end of the submit.
> >
> >
> >On Tue, Sep 27, 2011 at 11:37 AM, Lord Khan Han <khanuniver...@gmail.com> wrote:
> >
> >> Sorry :) it is not 500 docs per sec. (That is what I wish, I think.) It is
> >> 500 docs per MINUTE.
> >>
> >>
> >> On Tue, Sep 27, 2011 at 7:14 AM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> >>
> >>> Hello,
> >>>
> >>> > PS: solr streamindex is not an option because we need to submit javabin...
> >>>
> >>> If you are referring to StreamingUpdateSolrServer, then the above
> >>> statement makes no sense and you should give SUSS a try.
> >>>
> >>> Are you sure your 16 reducers produce more than 500 docs/second?
> >>> I think somebody already suggested increasing the number of reducers to ~32.
> >>> What happens to your CPU load and indexing speed then?
> >>>
> >>> Otis
> >>> ----
> >>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >>> Lucene ecosystem search :: http://search-lucene.com/
> >>>
> >>> >________________________________
> >>> >From: Lord Khan Han <khanuniver...@gmail.com>
> >>> >To: solr-user@lucene.apache.org
> >>> >Sent: Monday, September 26, 2011 7:09 AM
> >>> >Subject: SOLR Index Speed
> >>> >
> >>> >Hi,
> >>> >
> >>> >We have 500K web documents and use Solr (trunk) to index them. We have a
> >>> >special analyzer which is a little heavy on CPU.
> >>> >Our machine config:
> >>> >
> >>> >32 x CPU
> >>> >32 GB RAM
> >>> >SAS HD
> >>> >
> >>> >We are sending documents with 16 reduce clients (from Hadoop) to the
> >>> >standalone Solr server. The problem is we couldn't get faster than 500
> >>> >docs per sec.
> >>> >500K documents took 7-8 hours to index :(
> >>> >
> >>> >While indexing, the Solr server CPU load is around 5-6 (32 max), which
> >>> >means about 20% of total CPU power. We have plenty of RAM...
> >>> >
> >>> >I turned off auto-commit and gave an 8198 RAM buffer... there is no
> >>> >I/O wait...
> >>> >
> >>> >How can I make it faster?
> >>> >
> >>> >PS: solr streamindex is not an option because we need to submit javabin...
> >>> >
> >>> >thanks..
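[Editor's note, a sketch not taken from the thread: the per-document solr.addBean(doc) call in the code above costs one HTTP round trip per POJO, which is one plausible reason the sender, not Solr, is the bottleneck. The snippet below shows the batching shape without any SolrJ dependency so it stands alone; the DocBatcher class and its parameters are illustrative names. In real SolrJ code the sender would be something like batch -> solr.addBeans(batch), with a single commit at the very end.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative batcher: collects documents and flushes them in groups, so the
// expensive send (one HTTP round trip per call in SolrJ) happens once per
// batch instead of once per document.
class DocBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sender;   // e.g. batch -> solr.addBeans(batch)
    private final List<T> buffer = new ArrayList<>();

    DocBatcher(int batchSize, Consumer<List<T>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is buffered; call once more after the last document.
    void flush() {
        if (!buffer.isEmpty()) {
            sender.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}

public class BatchDemo {
    public static void main(String[] args) {
        List<Integer> sendSizes = new ArrayList<>();
        DocBatcher<String> batcher = new DocBatcher<>(100, b -> sendSizes.add(b.size()));
        for (int i = 0; i < 250; i++) {
            batcher.add("doc-" + i);
        }
        batcher.flush(); // final partial batch; commit once here in real code
        System.out.println(sendSizes); // [100, 100, 50]
    }
}
```

With 16 or 32 reducers each batching a few hundred docs per request, the request count drops by two orders of magnitude, which is usually where per-document SolrJ senders lose their time.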