Hi,

The no-op run completed in 20 minutes. The only line we commented out was "solr.addBean(doc)". We tried SUSS as a drop-in replacement for CommonsHttpSolrServer, but its behavior was odd: we saw updates taking tens of thousands of seconds, and activity continued for a very long time after sending to Solr was complete. We thought it was because we are indexing POJOs as documents. BTW, SOLR-1565 and SOLR-2755 say that SUSS does not support binary payloads.
    CommonsHttpSolrServer solr = new CommonsHttpSolrServer(url);
    solr.setRequestWriter(new BinaryRequestWriter());
    ...
    // doc is a SolrJ-annotated POJO
    solr.addBean(doc);

Any thoughts on what may be taking so long? Before MapReduce we were indexing to localhost in 2-3 hours using the same code base.

On Tue, Sep 27, 2011 at 8:55 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:

> Hello,
>
> By the way, should you need help with Hadoop+Solr, please feel free to get
> in touch with us at Sematext (see below) - we happen to work with Hadoop and
> Solr on a daily basis and have successfully implemented parallel indexing
> into Solr with/from Hadoop.
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
> Lucene ecosystem search :: http://search-lucene.com/
>
> ------------------------------
> *From:* Otis Gospodnetic <otis_gospodne...@yahoo.com>
> *To:* "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> *Sent:* Tuesday, September 27, 2011 1:37 PM
> *Subject:* Re: SOLR Index Speed
>
> Hi,
>
> No need to use reply-all and CC me directly, I'm on the list :)
>
> It sounds like Solr is not the problem, but the Hadoop side. For example,
> what if you change your reducer to do a no-op instead of calling Solr?
> Does it go beyond 500-700 docs/minute?
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
> Lucene ecosystem search :: http://search-lucene.com/
>
> >________________________________
> >From: Lord Khan Han <khanuniver...@gmail.com>
> >To: solr-user@lucene.apache.org; Otis Gospodnetic <otis_gospodne...@yahoo.com>
> >Sent: Tuesday, September 27, 2011 4:42 AM
> >Subject: Re: SOLR Index Speed
> >
> >Our producer (a Hadoop mapper) prepares the docs for submitting, and the
> >reducer submits directly via SolrJ HTTP. We now have 32 reducers, but the
> >indexing speed is still 500-700 docs per minute. Submissions come from a
> >Hadoop cluster, so submit speed is not a problem.
> >I couldn't use the full resources of the Solr index machine.
> >
> >I gave Solr a 12 GB heap and the machine is not swapping.
> >
> >I couldn't figure out what the problem is, if there is one.
> >
> >PS: We are committing at the end of the submit.
> >
> >
> >On Tue, Sep 27, 2011 at 11:37 AM, Lord Khan Han <khanuniver...@gmail.com> wrote:
> >
> >> Sorry :) it is not 500 docs per sec. (That is what I wish, I think.) It is
> >> 500 docs per MINUTE.
> >>
> >>
> >> On Tue, Sep 27, 2011 at 7:14 AM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> >>
> >>> Hello,
> >>>
> >>> > PS: solr streamindex is not an option because we need to submit javabin...
> >>>
> >>> If you are referring to StreamingUpdateSolrServer, then the above
> >>> statement makes no sense and you should give SUSS a try.
> >>>
> >>> Are you sure your 16 reducers produce more than 500 docs/second?
> >>> I think somebody already suggested increasing the number of reducers to ~32.
> >>> What happens to your CPU load and indexing speed then?
> >>>
> >>> Otis
> >>> ----
> >>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >>> Lucene ecosystem search :: http://search-lucene.com/
> >>>
> >>> >________________________________
> >>> >From: Lord Khan Han <khanuniver...@gmail.com>
> >>> >To: solr-user@lucene.apache.org
> >>> >Sent: Monday, September 26, 2011 7:09 AM
> >>> >Subject: SOLR Index Speed
> >>> >
> >>> >Hi,
> >>> >
> >>> >We have 500K web documents and use Solr (trunk) to index them. We have a
> >>> >special analyzer which is a little heavy on CPU.
> >>> >Our machine config:
> >>> >
> >>> >32 x CPU
> >>> >32 GB RAM
> >>> >SAS HD
> >>> >
> >>> >We are sending documents with 16 reduce clients (from Hadoop) to the
> >>> >standalone Solr server. The problem is we couldn't get faster than 500
> >>> >docs per sec.
> >>> >500K documents took 7-8 hours to index :(
> >>> >
> >>> >While indexing, the Solr server CPU load is around 5-6 (32 max), which
> >>> >means about 20% of total CPU power. We have plenty of RAM...
> >>> >
> >>> >I turned off auto-commit and gave an 8198 RAM buffer... there is no
> >>> >I/O wait...
> >>> >
> >>> >How can I make it faster?
> >>> >
> >>> >PS: solr streamindex is not an option because we need to submit javabin...
> >>> >
> >>> >thanks..
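[Editor's note, a sketch not taken from the thread: the per-document solr.addBean(doc) call in the code above costs one HTTP round trip per POJO, which is one plausible reason the sender, not Solr, is the bottleneck. The snippet below shows the batching shape without any SolrJ dependency so it stands alone; the DocBatcher class and its parameters are illustrative names. In real SolrJ code the sender would be something like batch -> solr.addBeans(batch), with a single commit at the very end.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative batcher: collects documents and flushes them in groups, so the
// expensive send (one HTTP round trip per call in SolrJ) happens once per
// batch instead of once per document.
class DocBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sender;   // e.g. batch -> solr.addBeans(batch)
    private final List<T> buffer = new ArrayList<>();

    DocBatcher(int batchSize, Consumer<List<T>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is buffered; call once more after the last document.
    void flush() {
        if (!buffer.isEmpty()) {
            sender.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}

public class BatchDemo {
    public static void main(String[] args) {
        List<Integer> sendSizes = new ArrayList<>();
        DocBatcher<String> batcher = new DocBatcher<>(100, b -> sendSizes.add(b.size()));
        for (int i = 0; i < 250; i++) {
            batcher.add("doc-" + i);
        }
        batcher.flush(); // final partial batch; commit once here in real code
        System.out.println(sendSizes); // [100, 100, 50]
    }
}
```

With 16 or 32 reducers each batching a few hundred docs per request, the request count drops by two orders of magnitude, which is usually where per-document SolrJ senders lose their time.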