OK, but how do people handle frequent updates on a large database with a lot of
queries hitting it? Do they turn off the slave during the warmup?
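(To make the question concrete: by "warmup" I mean the newSearcher warming Solr
does after a commit or replication. A minimal solrconfig.xml sketch of what I
have in mind; the query, sort field, and cache sizes are illustrative
placeholders, not a recommendation:)

    <query>
      <!-- Carry entries over from the old searcher's filter cache -->
      <filterCache class="solr.LRUCache" size="16384" initialSize="4096"
                   autowarmCount="1024"/>

      <!-- Fire a few representative queries at each new searcher so sort
           fields and caches are loaded before it serves live traffic.
           "created" is a placeholder field name. -->
      <listener event="newSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
          <lst><str name="q">*:*</str><str name="sort">created desc</str></lst>
        </arr>
      </listener>

      <!-- Block until warming finishes rather than serving a cold searcher -->
      <useColdSearcher>false</useColdSearcher>
    </query>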
Noble Paul നോബിള് नोब्ळ् wrote:
>
> On Thu, Apr 9, 2009 at 8:51 PM, sunnyfr <johanna...@gmail.com> wrote:
>>
>> Hi Otis,
>> How did you manage that? I have an 8-core machine with 8GB of RAM and an
>> 11GB index for 14M docs, with 50,000 updates every 30 minutes, but my
>> replication kills everything. My segments are merged too often, so I get
>> full index replication and the caches are lost, and... I have no idea
>> what I can do now. Some help would be brilliant.
>> By the way, I'm using Solr 1.4.
>>
>
> sunnyfr, whether the replication is full or delta, the caches are
> lost completely.
>
> You can think of partitioning the index into separate Solrs,
> updating one partition at a time, and performing distributed search.
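(Interjecting inline with a sketch of what Noble describes, as I understand it:
each partition is its own Solr core or instance, and one handler fans queries
out across all of them. Host and core names below are made-up placeholders:)

    <!-- solrconfig.xml on the aggregating box: requests to /select-all are
         distributed across the partitions listed in "shards" and the
         results merged. Only the partition currently being updated loses
         its caches; the others keep serving warm. -->
    <requestHandler name="/select-all" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="shards">solr1:8983/solr/part1,solr2:8983/solr/part2,solr3:8983/solr/part3</str>
      </lst>
    </requestHandler>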
>> Thanks,
>>
>> Otis Gospodnetic wrote:
>>>
>>> Mike is right about the occasional slowdown, which appears as a pause
>>> and is due to large Lucene index segment merging. This should go away
>>> with newer versions of Lucene, where merging happens in the background.
>>>
>>> That said, we just indexed about 20MM documents on a single 8-core
>>> machine with 8GB of RAM, resulting in a nearly 20GB index. The whole
>>> process took a little less than 10 hours - that's over 550 docs/second.
>>> The vanilla approach, before some of our changes, apparently required
>>> several days to index the same amount of data.
>>>
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>
>>> ----- Original Message ----
>>> From: Mike Klaas <mike.kl...@gmail.com>
>>> To: solr-user@lucene.apache.org
>>> Sent: Monday, November 19, 2007 5:50:19 PM
>>> Subject: Re: Any tips for indexing large amounts of data?
>>>
>>> There should be some slowdown in larger indices, as occasionally large
>>> segment merge operations must occur. However, this shouldn't really
>>> affect overall speed too much.
>>>
>>> You haven't really given us enough data to tell you anything useful.
>>> I would recommend trying to do the indexing via a webapp to eliminate
>>> all your code as a possible factor. Then, look for signs of what is
>>> happening when indexing slows. For instance, is Solr high in CPU? Is
>>> the computer thrashing?
>>>
>>> -Mike
>>>
>>> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks for answering this question a while back. I have made some of
>>>> the suggestions you mentioned, i.e. not committing until I've finished
>>>> indexing. What I am seeing, though, is that as the index gets larger
>>>> (around 1GB), indexing takes a lot longer. In fact, it slows down to
>>>> a crawl. Have you got any pointers as to what I might be doing wrong?
>>>>
>>>> Also, I was looking at using MultiCore Solr. Could this help in
>>>> some way?
>>>>
>>>> Thank you,
>>>> Brendan
>>>>
>>>> On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
>>>>
>>>>> : I would think you would see better performance by allowing auto
>>>>> : commit to handle the commit size instead of reopening the
>>>>> : connection all the time.
>>>>>
>>>>> If your goal is "fast" indexing, don't use autoCommit at all... just
>>>>> index everything, and don't commit until you are completely done.
>>>>>
>>>>> autoCommitting will slow your indexing down (the benefit being that
>>>>> more results will be visible to searchers as you proceed).
>>>>>
>>>>> -Hoss
>>>>
>>>
>>
>
> --
> --Noble Paul
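P.S. For anyone finding this thread in the archives: Hoss's autoCommit advice
maps to this block in solrconfig.xml. For bulk loads you leave it out (or
commented) and have the client send a single explicit <commit/> to /update at
the end. The thresholds shown are placeholders, not recommendations:

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- autoCommit fires a commit whenever either threshold is hit;
           omitting the element entirely means nothing becomes visible
           until the client commits explicitly, which indexes fastest -->
      <autoCommit>
        <maxDocs>10000</maxDocs>  <!-- commit after this many queued docs -->
        <maxTime>60000</maxTime>  <!-- ...or after this many milliseconds -->
      </autoCommit>
    </updateHandler>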