On Thu, Apr 9, 2009 at 8:51 PM, sunnyfr <johanna...@gmail.com> wrote:
>
> Hi Otis,
> How did you manage that? I have an 8-core machine with 8GB of RAM and an
> 11GB index of 14M docs, with 50,000 updates every 30 minutes, but
> replication kills everything. My segments are merged so often that every
> replication copies the full index and the caches are lost, and I have no
> idea what to do now. Some help would be brilliant.
> Btw, I'm using Solr 1.4.
>

sunnyfr, whether the replication is full or delta, the caches are lost
completely. You could consider partitioning the index into separate Solr
instances, updating one partition at a time, and performing a distributed
search across all of them; a sketch follows.
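As a rough, untested sketch (hostnames, ports, and the query are made up),
a distributed query across two partitions just adds a "shards" parameter
listing all of them:

  import urllib.request
  from urllib.parse import urlencode

  # Hypothetical setup: two Solr partitions, solr1 and solr2, each updated
  # and warmed independently, so only one partition's caches are cold at a
  # time.
  params = urlencode({
      "q": "category:books",                        # made-up query
      "shards": "solr1:8983/solr,solr2:8983/solr",  # partitions to search
      "wt": "json",
  })
  with urllib.request.urlopen("http://solr1:8983/solr/select?" + params) as r:
      print(r.read().decode("utf-8"))

Any one of the partitions (or a separate frontend Solr) can receive the
query; it fans the request out to every host listed in "shards" and merges
the results.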
> Thanks,
>
>
> Otis Gospodnetic wrote:
>>
>> Mike is right about the occasional slow-down, which appears as a pause
>> and is due to large Lucene index segment merging. This should go away
>> with newer versions of Lucene, where merging happens in the background.
>>
>> That said, we just indexed about 20MM documents on a single 8-core
>> machine with 8 GB of RAM, resulting in a nearly 20 GB index. The whole
>> process took a little less than 10 hours - that's over 550 docs/second.
>> The vanilla approach, before some of our changes, apparently required
>> several days to index the same amount of data.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> ----- Original Message ----
>> From: Mike Klaas <mike.kl...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Monday, November 19, 2007 5:50:19 PM
>> Subject: Re: Any tips for indexing large amounts of data?
>>
>> There should be some slowdown in larger indices as occasionally large
>> segment merge operations must occur. However, this shouldn't really
>> affect overall speed too much.
>>
>> You haven't really given us enough data to tell you anything useful.
>> I would recommend trying to do the indexing via a webapp to eliminate
>> all your code as a possible factor. Then, look for signs of what is
>> happening when indexing slows. For instance, is Solr high in CPU, is
>> the computer thrashing, etc.?
>>
>> -Mike
>>
>> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
>>
>>> Hi,
>>>
>>> Thanks for answering this question a while back. I have made some
>>> of the changes you suggested, i.e. not committing until I've
>>> finished indexing. What I am seeing, though, is that as the index
>>> gets larger (around 1GB), indexing takes a lot longer. In fact it
>>> slows down to a crawl. Have you got any pointers as to what I might
>>> be doing wrong?
>>>
>>> Also, I was looking at using MultiCore Solr. Could this help in
>>> some way?
>>>
>>> Thank you
>>> Brendan
>>>
>>> On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
>>>
>>>>
>>>> : I would think you would see better performance by allowing auto
>>>> : commit to handle the commit size instead of reopening the
>>>> : connection all the time.
>>>>
>>>> if your goal is "fast" indexing, don't use autoCommit at all ...
>>>> just index everything, and don't commit until you are completely done.
>>>>
>>>> autoCommitting will slow your indexing down (the benefit being
>>>> that more results will be visible to searchers as you proceed)
>>>>
>>>> -Hoss
>>>>
>>>
>>
>

--
--Noble Paul
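PS: Hoss's advice above about skipping autoCommit applies here too. A
minimal, untested sketch of commit-once indexing against Solr's XML update
handler (the URL, field names, and sample data are made up; real code
should also XML-escape field values):

  import urllib.request

  # Hypothetical URL and schema; adjust to your setup.
  SOLR_UPDATE = "http://localhost:8983/solr/update"

  def post_xml(xml):
      # Solr's XML update handler expects a text/xml POST body.
      req = urllib.request.Request(
          SOLR_UPDATE,
          data=xml.encode("utf-8"),
          headers={"Content-Type": "text/xml; charset=utf-8"},
      )
      return urllib.request.urlopen(req).read()

  # Send documents in batches, with no commits in between.
  batches = [
      [{"id": "1", "title": "first doc"}, {"id": "2", "title": "second doc"}],
      [{"id": "3", "title": "third doc"}],
  ]
  for batch in batches:
      docs = "".join(
          '<doc><field name="id">%s</field>'
          '<field name="title">%s</field></doc>' % (d["id"], d["title"])
          for d in batch
      )
      post_xml("<add>%s</add>" % docs)

  # Commit exactly once, at the very end.
  post_xml("<commit/>")

If you need updates visible to searchers sooner, commit at coarse
intervals rather than per batch.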