Brendan - yes, it's 64-bit Linux, and the JVM got a 5.5 GB heap, though it could have worked with less.
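In case it helps anyone searching the archives later: a heap that size is just the standard JVM flags. A minimal sketch, assuming Solr is running under the bundled Jetty launcher (the launcher and sizes here are illustrative, not our exact setup):

    # give the JVM a fixed 5.5 GB heap (start.jar is the example Jetty launcher)
    java -Xms5500m -Xmx5500m -jar start.jar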
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Brendan Grainger <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, November 21, 2007 1:24:05 PM
Subject: Re: Any tips for indexing large amounts of data?

Hi Otis,

Thanks for this. Are you using a flavor of Linux, and is it 64-bit? How
much heap are you giving your JVM?

Thanks again
Brendan

On Nov 21, 2007, at 2:03 AM, Otis Gospodnetic wrote:

> Mike is right about the occasional slow-down, which appears as a
> pause and is due to large Lucene index segment merging. This
> should go away with newer versions of Lucene, where merging
> happens in the background.
>
> That said, we just indexed about 20MM documents on a single 8-core
> machine with 8 GB of RAM, resulting in a nearly 20 GB index. The
> whole process took a little less than 10 hours - that's over 550
> docs/second. The vanilla approach, before some of our changes,
> apparently required several days to index the same amount of data.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Mike Klaas <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, November 19, 2007 5:50:19 PM
> Subject: Re: Any tips for indexing large amounts of data?
>
> There should be some slowdown in larger indices, as occasionally large
> segment merge operations must occur. However, this shouldn't really
> affect overall speed too much.
>
> You haven't really given us enough data to tell you anything useful.
> I would recommend trying to do the indexing via a webapp to eliminate
> all your code as a possible factor. Then, look for signs of what is
> happening when indexing slows. For instance, is Solr high in CPU? Is
> the machine thrashing?
>
> -Mike
>
> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
>
>> Hi,
>>
>> Thanks for answering this question a while back. I have made some
>> of the changes you suggested, i.e. not committing until I've
>> finished indexing. What I am seeing, though, is that as the index gets
>> larger (around 1 GB), indexing takes a lot longer. In fact, it
>> slows to a crawl. Do you have any pointers as to what I might
>> be doing wrong?
>>
>> Also, I was looking at using MultiCore Solr. Could this help in
>> some way?
>>
>> Thank you
>> Brendan
>>
>> On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
>>
>>> : I would think you would see better performance by allowing auto
>>> : commit to handle the commit size instead of reopening the
>>> : connection all the time.
>>>
>>> If your goal is "fast" indexing, don't use autoCommit at all ... just
>>> index everything, and don't commit until you are completely done.
>>>
>>> autoCommitting will slow your indexing down (the benefit being
>>> that more results will be visible to searchers as you proceed).
>>>
>>> -Hoss
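To make Hoss's "commit once at the end" advice above concrete, here is a minimal sketch against Solr's XML update handler; the URL and field name are illustrative, and it assumes the <autoCommit> block in solrconfig.xml stays commented out:

    # post each batch without committing; added docs are not searchable yet
    curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
      --data-binary '<add><doc><field name="id">doc1</field></doc></add>'

    # ... post the remaining batches the same way ...

    # issue a single commit only after all documents are in
    curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
      --data-binary '<commit/>'

Deferring the commit this way skips the repeated searcher reopens and flush-driven segment churn that autoCommit causes mid-load, at the cost of nothing being visible to searchers until the final commit.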