Hi Otis,

How did you manage that? I have an 8-core machine with 8 GB of RAM and an 11 GB index for 14M docs, with 50,000 updates every 30 minutes, but replication kills everything: my segments are merged too often, so the full index gets replicated and the caches are lost, and so on. I have no idea what to do now; some help would be brilliant. BTW, I'm using Solr 1.4.
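The merge-every-commit problem described above can usually be tamed on the master by merging less often and committing on a timer rather than per batch, so slaves pull fewer, larger snapshots. A sketch of the relevant solrconfig.xml knobs, assuming the stock Solr 1.4 element names (the values are illustrative, not tuned for this hardware):

```xml
<!-- solrconfig.xml (Solr 1.4) - illustrative values only -->
<indexDefaults>
  <!-- Higher mergeFactor = fewer, larger merges (at the cost of more
       segments/open files and somewhat slower searches on the master) -->
  <mergeFactor>25</mergeFactor>
  <!-- Buffer more documents in RAM before flushing a new segment -->
  <ramBufferSizeMB>256</ramBufferSizeMB>
</indexDefaults>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Commit on a timer instead of per update batch -->
  <autoCommit>
    <maxTime>900000</maxTime> <!-- 15 minutes, in milliseconds -->
  </autoCommit>
</updateHandler>
```

Fewer commits means fewer new segments per replication cycle, which should reduce how often a merge forces the slaves to copy the whole index.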
Thanks,

Otis Gospodnetic wrote:
>
> Mike is right about the occasional slow-down, which appears as a pause and
> is due to large Lucene index segment merging. This should go away with
> newer versions of Lucene where this is happening in the background.
>
> That said, we just indexed about 20MM documents on a single 8-core machine
> with 8 GB of RAM, resulting in nearly 20 GB index. The whole process took
> a little less than 10 hours - that's over 550 docs/second. The vanilla
> approach before some of our changes apparently required several days to
> index the same amount of data.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Mike Klaas <mike.kl...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Monday, November 19, 2007 5:50:19 PM
> Subject: Re: Any tips for indexing large amounts of data?
>
> There should be some slowdown in larger indices as occasionally large
> segment merge operations must occur. However, this shouldn't really
> affect overall speed too much.
>
> You haven't really given us enough data to tell you anything useful.
> I would recommend trying to do the indexing via a webapp to eliminate
> all your code as a possible factor. Then, look for signs to what is
> happening when indexing slows. For instance, is Solr high in cpu, is
> the computer thrashing, etc?
>
> -Mike
>
> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
>
>> Hi,
>>
>> Thanks for answering this question a while back. I have made some
>> of the suggestions you mentioned, i.e. not committing until I've
>> finished indexing. What I am seeing though, is as the index gets
>> larger (around 1 GB), indexing is taking a lot longer. In fact it
>> slows down to a crawl. Have you got any pointers as to what I might
>> be doing wrong?
>>
>> Also, I was looking at using MultiCore Solr. Could this help in
>> some way?
>>
>> Thank you
>> Brendan
>>
>> On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
>>
>>> : I would think you would see better performance by allowing auto commit
>>> : to handle the commit size instead of reopening the connection all the
>>> : time.
>>>
>>> if your goal is "fast" indexing, don't use autoCommit at all ... just
>>> index everything, and don't commit until you are completely done.
>>>
>>> autoCommitting will slow your indexing down (the benefit being that more
>>> results will be visible to searchers as you proceed)
>>>
>>> -Hoss

--
View this message in context: http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html
Sent from the Solr - User mailing list archive at Nabble.com.
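Hoss's advice in the quoted thread (index everything, commit exactly once at the end) can be sketched against Solr's XML update interface. A minimal Python sketch; the server URL, field names, and batch size are illustrative assumptions, not from the thread:

```python
# Batch documents into Solr <add> messages and commit only once at the end,
# per the "don't use autoCommit for bulk indexing" advice above.
from urllib.request import Request, urlopen
from xml.sax.saxutils import escape

SOLR_UPDATE_URL = "http://localhost:8983/solr/update"  # assumed endpoint


def build_add_xml(docs):
    """Build a Solr <add> update message for a batch of docs
    (each doc is a dict of field name -> value)."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in sorted(doc.items()):
            parts.append('<field name="%s">%s</field>'
                         % (escape(name), escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def post_xml(xml):
    """POST one XML update message to Solr."""
    req = Request(SOLR_UPDATE_URL, xml.encode("utf-8"),
                  {"Content-Type": "text/xml; charset=utf-8"})
    return urlopen(req).read()


def index_all(doc_iter, batch_size=1000):
    """Stream docs to Solr in batches; the single commit happens at the end."""
    batch = []
    for doc in doc_iter:
        batch.append(doc)
        if len(batch) >= batch_size:
            post_xml(build_add_xml(batch))
            batch = []
    if batch:
        post_xml(build_add_xml(batch))
    post_xml("<commit/>")  # the one and only commit
```

Batching keeps each HTTP request a reasonable size while still deferring the expensive commit (and the searcher/cache warm-up it triggers) until all documents are in.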