Any idea?

On Thu, Sep 29, 2011 at 1:53 PM, Lord Khan Han <khanuniver...@gmail.com> wrote:
> Hi,
>
> The no-op run completed in 20 minutes. The only commented-out line was
> "solr.addBean(doc)". We tried SUSS as a drop-in replacement for
> CommonsHttpSolrServer, but its behavior was strange: we saw updates taking
> tens of thousands of seconds, and they kept going for a very long time
> after sending to Solr was complete. We thought that was because we are
> indexing POJOs as documents. BTW, SOLR-1565 and SOLR-2755 say that SUSS
> does not support binary payloads.
>
>     CommonsHttpSolrServer solr = new CommonsHttpSolrServer(url);
>     solr.setRequestWriter(new BinaryRequestWriter());
>     ...
>     // doc is a solrj-annotated POJO
>     solr.addBean(doc);
>
> Any thoughts on what may be taking so long? Before MapReduce we were
> indexing in 2-3 hours to localhost using the same code base.
>
> On Tue, Sep 27, 2011 at 8:55 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
>
>> Hello,
>>
>> By the way, should you need help with Hadoop+Solr, please feel free to
>> get in touch with us at Sematext (see below) - we happen to work with
>> Hadoop and Solr on a daily basis and have successfully implemented
>> parallel indexing into Solr with/from Hadoop.
>>
>> Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>> ------------------------------
>> *From:* Otis Gospodnetic <otis_gospodne...@yahoo.com>
>> *To:* "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>> *Sent:* Tuesday, September 27, 2011 1:37 PM
>> *Subject:* Re: SOLR Index Speed
>>
>> Hi,
>>
>> No need to use reply-all and CC me directly, I'm on the list :)
>>
>> It sounds like Solr is not the problem, but the Hadoop side. For
>> example, what if you change your reducer to do a no-op instead of
>> calling Solr? Does it go beyond 500-700 docs/minute?
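(As background on the snippet above: SolrJ's addBean goes through a DocumentObjectBinder that reflects over @Field-annotated bean fields to build a SolrInputDocument, which the BinaryRequestWriter then serializes as javabin. The following is a dependency-free sketch of that mapping idea; MyField and WebDoc are illustrative stand-ins, not SolrJ's actual classes.)

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

public class BeanMapSketch {
    // Stand-in for SolrJ's org.apache.solr.client.solrj.beans.Field annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface MyField {
        String value() default ""; // optional Solr field name; empty means "use the Java field name"
    }

    // Illustrative POJO, analogous to a solrj-annotated document bean.
    static class WebDoc {
        @MyField("id") String id;
        @MyField("") String title;
        WebDoc(String id, String title) { this.id = id; this.title = title; }
    }

    // Roughly what DocumentObjectBinder does: collect annotated fields
    // into a fieldName -> value map via reflection.
    static Map<String, Object> toFieldMap(Object bean) {
        Map<String, Object> doc = new LinkedHashMap<>();
        try {
            for (Field f : bean.getClass().getDeclaredFields()) {
                MyField ann = f.getAnnotation(MyField.class);
                if (ann == null) continue;
                f.setAccessible(true);
                String name = ann.value().isEmpty() ? f.getName() : ann.value();
                doc.put(name, f.get(bean));
            }
        } catch (IllegalAccessException e) {
            throw new RuntimeException(e);
        }
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(toFieldMap(new WebDoc("doc-1", "hello")));
    }
}
```

The reflection work itself is cheap per document, which is one reason to suspect the per-request HTTP overhead rather than the bean binding when addBean throughput is low.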
>> Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>> ________________________________
>> From: Lord Khan Han <khanuniver...@gmail.com>
>> To: solr-user@lucene.apache.org; Otis Gospodnetic <otis_gospodne...@yahoo.com>
>> Sent: Tuesday, September 27, 2011 4:42 AM
>> Subject: Re: SOLR Index Speed
>>
>> Our producer (the Hadoop mapper) prepares the docs for submission, and
>> the reducers submit them directly via SolrJ HTTP. We are now at 32
>> reducers, but the indexing speed is still 500-700 docs per minute. The
>> submissions come from a Hadoop cluster, so submission speed is not the
>> problem. We can't make use of the full resources of the Solr index
>> machine.
>>
>> I gave Solr a 12 GB heap and the machine is not swapping.
>>
>> I can't figure out what the problem is, if there is one.
>>
>> PS: We are committing at the end of the submit.
>>
>> On Tue, Sep 27, 2011 at 11:37 AM, Lord Khan Han <khanuniver...@gmail.com> wrote:
>>
>>> Sorry :) It is not 500 docs per second (that is what I wish it were, I
>>> think). It is 500 docs per MINUTE.
>>>
>>> On Tue, Sep 27, 2011 at 7:14 AM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> > PS: solr streamindex is not an option because we need to submit javabin...
>>>>
>>>> If you are referring to StreamingUpdateSolrServer, then the above
>>>> statement makes no sense and you should give SUSS a try.
>>>>
>>>> Are you sure your 16 reducers produce more than 500 docs/second?
>>>> I think somebody already suggested increasing the number of reducers to ~32.
>>>> What happens to your CPU load and indexing speed then?
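(Alongside committing once at the end, as mentioned above, one common throughput lever is batching: SolrJ's SolrServer also exposes addBeans(Collection), so each reducer can send a few hundred documents per HTTP request instead of one addBean call per document. Below is a minimal sketch of just the batching logic; the actual solr calls are left as comments since they need a live server, and the batch size of 1000 is an arbitrary assumption.)

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    // Split docs into fixed-size batches; each batch would go out
    // in a single addBeans() call rather than one request per doc.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(new ArrayList<>(docs.subList(i, Math.min(i + batchSize, docs.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) docs.add(i);

        for (List<Integer> batch : partition(docs, 1000)) {
            // solr.addBeans(batch);  // one HTTP round trip per batch
            System.out.println("would send batch of " + batch.size());
        }
        // solr.commit();             // single commit at the end, as in the thread
    }
}
```

With per-document requests, 500 docs/minute is roughly one HTTP round trip every 120 ms; amortizing that over batches usually changes the picture substantially.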
>>>> Otis
>>>> ----
>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>> Lucene ecosystem search :: http://search-lucene.com/
>>>>
>>>> ________________________________
>>>> From: Lord Khan Han <khanuniver...@gmail.com>
>>>> To: solr-user@lucene.apache.org
>>>> Sent: Monday, September 26, 2011 7:09 AM
>>>> Subject: SOLR Index Speed
>>>>
>>>> Hi,
>>>>
>>>> We have 500K web documents and use Solr (trunk) to index them. We have
>>>> a custom analyzer which is a bit CPU-heavy.
>>>> Our machine config:
>>>>
>>>> 32 x CPU
>>>> 32 GB RAM
>>>> SAS HD
>>>>
>>>> We are sending documents with 16 reduce clients (from Hadoop) to the
>>>> standalone Solr server. The problem is we can't get faster than 500
>>>> docs/sec. 500K documents took 7-8 hours to index :(
>>>>
>>>> While indexing, the Solr server CPU load is around 5-6 (32 max), which
>>>> means about 20% of total CPU power. We have plenty of RAM...
>>>>
>>>> I turned off auto commit and set the RAM buffer to 8198.. there is no
>>>> I/O wait.
>>>>
>>>> How can I make it faster?
>>>>
>>>> PS: solr streamindex is not an option because we need to submit javabin...
>>>>
>>>> Thanks.
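(For reference, the "auto commit off, large RAM buffer" setup described in the original mail would look roughly like the following in solrconfig.xml. This is only a sketch: the exact enclosing elements vary between Solr versions, and very large ramBufferSizeMB values may be wasted or rejected, since Lucene caps the usable buffer at around 2048 MB.)

```
<!-- solrconfig.xml sketch (element placement varies across Solr 3.x/trunk) -->
<indexConfig>
  <!-- in-memory buffer before flushing segments; Lucene caps this near 2048 MB -->
  <ramBufferSizeMB>1024</ramBufferSizeMB>
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- no <autoCommit> block: commits happen only when the client calls commit() -->
</updateHandler>
```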