On Thu, Apr 9, 2009 at 8:51 PM, sunnyfr <johanna...@gmail.com> wrote:
>
> Hi Otis,
> How did you manage that? I have an 8-core machine with 8GB of RAM and an
> 11GB index of 14M docs, with 50,000 updates every 30 minutes, but
> replication kills everything. My segments are merged so often that every
> replication copies the full index and the caches are lost, and I have no
> idea what to do now. Some help would be brilliant.
> Btw, I'm using Solr 1.4.
>

sunnyfr, whether the replication is full or delta, the caches are lost
completely. You could consider partitioning the index into separate Solr
instances, updating one partition at a time, and performing a distributed
search across all of them; a sketch follows.
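As a rough, untested sketch (hostnames, ports, and the query are made up),
a distributed query across two partitions just adds a "shards" parameter
listing all of them:

  import urllib.request
  from urllib.parse import urlencode

  # Hypothetical setup: two Solr partitions, solr1 and solr2, each updated
  # and warmed independently, so only one partition's caches are cold at a
  # time.
  params = urlencode({
      "q": "category:books",                        # made-up query
      "shards": "solr1:8983/solr,solr2:8983/solr",  # partitions to search
      "wt": "json",
  })
  with urllib.request.urlopen("http://solr1:8983/solr/select?" + params) as r:
      print(r.read().decode("utf-8"))

Any one of the partitions (or a separate frontend Solr) can receive the
query; it fans the request out to every host listed in "shards" and merges
the results.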
> Thanks,
>
>
> Otis Gospodnetic wrote:
>>
>> Mike is right about the occasional slow-down, which appears as a pause
>> and is due to large Lucene index segment merging. This should go away
>> with newer versions of Lucene, where merging happens in the background.
>>
>> That said, we just indexed about 20MM documents on a single 8-core
>> machine with 8 GB of RAM, resulting in a nearly 20 GB index. The whole
>> process took a little less than 10 hours - that's over 550 docs/second.
>> The vanilla approach, before some of our changes, apparently required
>> several days to index the same amount of data.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> ----- Original Message ----
>> From: Mike Klaas <mike.kl...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Monday, November 19, 2007 5:50:19 PM
>> Subject: Re: Any tips for indexing large amounts of data?
>>
>> There should be some slowdown in larger indices as occasionally large
>> segment merge operations must occur. However, this shouldn't really
>> affect overall speed too much.
>>
>> You haven't really given us enough data to tell you anything useful.
>> I would recommend trying to do the indexing via a webapp to eliminate
>> all your code as a possible factor. Then, look for signs of what is
>> happening when indexing slows. For instance, is Solr high in CPU, is
>> the computer thrashing, etc.?
>>
>> -Mike
>>
>> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
>>
>>> Hi,
>>>
>>> Thanks for answering this question a while back. I have made some
>>> of the changes you suggested, i.e. not committing until I've
>>> finished indexing. What I am seeing, though, is that as the index
>>> gets larger (around 1GB), indexing takes a lot longer. In fact it
>>> slows down to a crawl. Have you got any pointers as to what I might
>>> be doing wrong?
>>>
>>> Also, I was looking at using MultiCore Solr. Could this help in
>>> some way?
>>>
>>> Thank you
>>> Brendan
>>>
>>> On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
>>>
>>>>
>>>> : I would think you would see better performance by allowing auto
>>>> : commit to handle the commit size instead of reopening the
>>>> : connection all the time.
>>>>
>>>> if your goal is "fast" indexing, don't use autoCommit at all ...
>>>> just index everything, and don't commit until you are completely done.
>>>>
>>>> autoCommitting will slow your indexing down (the benefit being
>>>> that more results will be visible to searchers as you proceed)
>>>>
>>>> -Hoss
>>>>
>>>
>>
>

--
--Noble Paul
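PS: Hoss's advice above about skipping autoCommit applies here too. A
minimal, untested sketch of commit-once indexing against Solr's XML update
handler (the URL, field names, and sample data are made up; real code
should also XML-escape field values):

  import urllib.request

  # Hypothetical URL and schema; adjust to your setup.
  SOLR_UPDATE = "http://localhost:8983/solr/update"

  def post_xml(xml):
      # Solr's XML update handler expects a text/xml POST body.
      req = urllib.request.Request(
          SOLR_UPDATE,
          data=xml.encode("utf-8"),
          headers={"Content-Type": "text/xml; charset=utf-8"},
      )
      return urllib.request.urlopen(req).read()

  # Send documents in batches, with no commits in between.
  batches = [
      [{"id": "1", "title": "first doc"}, {"id": "2", "title": "second doc"}],
      [{"id": "3", "title": "third doc"}],
  ]
  for batch in batches:
      docs = "".join(
          '<doc><field name="id">%s</field>'
          '<field name="title">%s</field></doc>' % (d["id"], d["title"])
          for d in batch
      )
      post_xml("<add>%s</add>" % docs)

  # Commit exactly once, at the very end.
  post_xml("<commit/>")

If you need updates visible to searchers sooner, commit at coarse
intervals rather than per batch.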