Hi,

No need to use reply-all and CC me directly, I'm on the list :)
It sounds like Solr is not the problem, but the Hadoop side. For example,
what if you change your reducer not to call Solr but do some no-op? Does
it go beyond 500-700 docs/minute?

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
Lucene ecosystem search :: http://search-lucene.com/


>________________________________
>From: Lord Khan Han <khanuniver...@gmail.com>
>To: solr-user@lucene.apache.org; Otis Gospodnetic <otis_gospodne...@yahoo.com>
>Sent: Tuesday, September 27, 2011 4:42 AM
>Subject: Re: SOLR Index Speed
>
>Our producer (hadoop mapper prepare the docs for submitting and the reducer
>directly submit from solrj http submit..) now 32 reducer but still the
>indexing speed 500 - 700 doc per minute. submission coming from a hadoop
>cluster so submit speed is not a problem. I couldn't use the full solr index
>machine resources.
>
>I gave 12 gig heap to solr and machine is not swapping.
>
>I couldn't figure out the problem if there is..
>
>PS: We are committing at the end of the submit.
>
>
>On Tue, Sep 27, 2011 at 11:37 AM, Lord Khan Han <khanuniver...@gmail.com> wrote:
>
>> Sorry :) it is not 500 doc per sec. ( It is what i wish I think) It is
>> 500 doc per MINUTE..
>>
>>
>>
>> On Tue, Sep 27, 2011 at 7:14 AM, Otis Gospodnetic <
>> otis_gospodne...@yahoo.com> wrote:
>>
>>> Hello,
>>>
>>> > PS: solr streamindex is not option because we need to submit javabin...
>>>
>>>
>>> If you are referring to StreamingUpdateSolrServer, then the above
>>> statement makes no sense and you should give SUSS a try.
>>>
>>> Are you sure your 16 reducers produce more than 500 docs/second?
>>> I think somebody already suggested increasing the number of reducers to
>>> ~32.
>>> What happens to your CPU load and indexing speed then?
>>>
>>>
>>> Otis
>>> ----
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>> Lucene ecosystem search :: http://search-lucene.com/
>>>
>>>
>>> >________________________________
>>> >From: Lord Khan Han <khanuniver...@gmail.com>
>>> >To: solr-user@lucene.apache.org
>>> >Sent: Monday, September 26, 2011 7:09 AM
>>> >Subject: SOLR Index Speed
>>> >
>>> >Hi,
>>> >
>>> >We have 500K web document and using solr (trunk) to index it. We have
>>> >special analyzer which little bit heavy cpu.
>>> >Our machine config:
>>> >
>>> >32 x cpu
>>> >32 gig ram
>>> >SAS HD
>>> >
>>> >We are sending document with 16 reduce client (from hadoop) to the stand
>>> >alone solr server. the problem is we couldn't get speedier than the 500 doc /
>>> >per sec. 500K document took 7-8 hours to index :(
>>> >
>>> >While indexing, the solr server cpu load is around : 5-6 (32 max) it
>>> >means 20% of the cpu total power. We have plenty ram ...
>>> >
>>> >I turned off auto commit and give 8198 rambuffer .. there is no io wait ..
>>> >
>>> >How can I make it faster ?
>>> >
>>> >PS: solr streamindex is not option because we need to submit javabin...
>>> >
>>> >thanks..
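
A minimal sketch of the no-op test suggested at the top of this message, in plain Java with no Hadoop or SolrJ dependency. Everything named here (`ThroughputProbe`, `docsPerMinute`, the synthetic `doc-N` strings) is hypothetical and only for illustration. The idea is that the per-document work is passed in as a sink, so the same loop can be timed once with a sink that does nothing and once with the real submit call. If the no-op rate is also stuck near 500-700 docs/minute, the bottleneck is probably upstream of Solr.

```java
import java.util.function.Consumer;

// Hypothetical timing harness; not part of Hadoop or SolrJ.
class ThroughputProbe {

    // Feeds `count` synthetic documents to `sink` and returns the
    // observed rate in documents per minute.
    static double docsPerMinute(int count, Consumer<String> sink) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            sink.accept("doc-" + i); // stand-in for a prepared document
        }
        // Guard against a zero elapsed time on very fast runs.
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        return count * 60.0e9 / elapsedNanos;
    }

    public static void main(String[] args) {
        // No-op sink: measures the producer side alone.
        Consumer<String> noOp = doc -> { };
        System.out.printf("no-op sink: %.0f docs/minute%n",
                docsPerMinute(100_000, noOp));
    }
}
```

In an actual job the loop would be the reducer's own iteration over its input, and the second measurement would use a sink that performs the real submit, so that the two rates can be compared directly.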