Hi,

No need to use reply-all and CC me directly, I'm on the list :)

It sounds like the problem is not Solr but the Hadoop side. For example, what
if you change your reducer to do a no-op instead of calling Solr? Does it go
beyond 500-700 docs/minute?
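Here is a rough sketch of that experiment. It assumes you can put the reducer's per-document work behind an interface. `ReducerThroughputCheck`, `DocumentSink` and `NoOpSink` are hypothetical names, not Hadoop or SolrJ APIs. Plain threads in one JVM stand in for the 32 reducers. The sink drops every document, and the harness reports docs/minute. If the real job, with a no-op in place of the Solr call, is also stuck around 500-700 docs/minute, the bottleneck is upstream of Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class ReducerThroughputCheck {

    /** Hypothetical stand-in for the per-document work a reducer does (e.g. a SolrJ add). */
    interface DocumentSink {
        void accept(String doc) throws Exception;
    }

    /** No-op sink: counts and discards each document, so only the producer side is timed. */
    static final class NoOpSink implements DocumentSink {
        final AtomicLong count = new AtomicLong();

        @Override
        public void accept(String doc) {
            count.incrementAndGet();
        }
    }

    /**
     * Runs `workers` threads, each pushing `docsPerWorker` documents into `sink`,
     * and returns the observed throughput in documents per minute.
     */
    static double measureDocsPerMinute(DocumentSink sink, int workers, int docsPerWorker)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<?>> results = new ArrayList<>();
        long start = System.nanoTime();
        try {
            for (int w = 0; w < workers; w++) {
                final int workerId = w;
                results.add(pool.submit(() -> {
                    for (int i = 0; i < docsPerWorker; i++) {
                        // Placeholder for building a real document in the mapper/reducer.
                        String doc = "doc-" + workerId + "-" + i;
                        sink.accept(doc);
                    }
                    return null;
                }));
            }
            for (Future<?> f : results) {
                f.get(); // rethrows (wrapped in ExecutionException) any sink failure
            }
        } finally {
            pool.shutdown();
        }
        long elapsedNanos = Math.max(System.nanoTime() - start, 1L);
        double minutes = elapsedNanos / 60e9;
        return (double) workers * docsPerWorker / minutes;
    }

    public static void main(String[] args) throws Exception {
        NoOpSink sink = new NoOpSink();
        // 32 worker threads, mirroring the 32 reducers mentioned in this thread.
        double rate = measureDocsPerMinute(sink, 32, 10_000);
        System.out.printf("no-op sink: %d docs, ~%.0f docs/minute%n", sink.count.get(), rate);
    }
}
```

In the real job, make the same swap inside your Reducer. Then compare the job's wall-clock time with and without the Solr call.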

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
Lucene ecosystem search :: http://search-lucene.com/



>________________________________
>From: Lord Khan Han <khanuniver...@gmail.com>
>To: solr-user@lucene.apache.org; Otis Gospodnetic <otis_gospodne...@yahoo.com>
>Sent: Tuesday, September 27, 2011 4:42 AM
>Subject: Re: SOLR Index Speed
>
>Our producer is Hadoop: the mappers prepare the documents for submission and
>the reducers submit them directly via SolrJ over HTTP. We now have 32
>reducers, but the indexing speed is still 500-700 docs per minute. Submission
>comes from a Hadoop cluster, so submit speed is not a problem. We can't make
>full use of the Solr index machine's resources.
>
>I gave Solr a 12 GB heap and the machine is not swapping.
>
>I can't figure out what the problem is, if there is one.
>
>PS: We only commit once, at the end of the submission.
>
>
>On Tue, Sep 27, 2011 at 11:37 AM, Lord Khan Han <khanuniver...@gmail.com> wrote:
>
>> Sorry :) It is not 500 docs per second (that's what I wish, I think). It is
>> 500 docs per MINUTE.
>>
>>
>>
>> On Tue, Sep 27, 2011 at 7:14 AM, Otis Gospodnetic <
>> otis_gospodne...@yahoo.com> wrote:
>>
>>> Hello,
>>>
>>> > PS: Solr streamindex is not an option because we need to submit javabin.
>>>
>>>
>>> If you are referring to StreamingUpdateSolrServer, then the above
>>> statement makes no sense and you should give SUSS a try.
>>>
>>> Are you sure your 16 reducers produce more than 500 docs/second?
>>> I think somebody already suggested increasing the number of reducers to
>>> ~32.
>>> What happens to your CPU load and indexing speed then?
>>>
>>>
>>> Otis
>>>
>>>
>>> >________________________________
>>> >From: Lord Khan Han <khanuniver...@gmail.com>
>>> >To: solr-user@lucene.apache.org
>>> >Sent: Monday, September 26, 2011 7:09 AM
>>> >Subject: SOLR Index Speed
>>> >
>>> >Hi,
>>> >
>>> >We have 500K web documents and are using Solr (trunk) to index them. We have
>>> >a special analyzer which is a little CPU-heavy.
>>> >Our machine config:
>>> >
>>> >32 CPUs
>>> >32 GB RAM
>>> >SAS hard disk
>>> >
>>> >We are sending documents from 16 reducer clients (from Hadoop) to the
>>> >standalone Solr server. The problem is that we can't get faster than 500
>>> >docs per sec. 500K documents take 7-8 hours to index :(
>>> >
>>> >While indexing, the Solr server's CPU load is around 5-6 (32 max), which
>>> >means roughly 20% of total CPU power. We have plenty of RAM.
>>> >
>>> >I turned off auto commit and set the RAM buffer to 8198. There is no I/O
>>> >wait.
>>> >
>>> >How can I make it faster?
>>> >
>>> >PS: Solr streamindex is not an option because we need to submit javabin.
>>> >
>>> >Thanks.
>>> >
>>> >
>>> >
>>>
>>
>>
>
>
>
