That's great. At what index size do you think we should start looking at partitioning the index file?
Eswar

On Nov 21, 2007 12:57 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> Just tried a search for "web" on this index - 1.1 seconds. This matches
> about 1MM of about 20MM docs. Redo the search, and it's 1 ms (cached).
> This is without any load nor serious benchmarking, clearly.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Eswar K <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, November 21, 2007 2:11:07 AM
> Subject: Re: Any tips for indexing large amounts of data?
>
> Hi otis,
>
> I understand that is slightly off track question, but I am just curious
> to know the performance of Search on a 20 GB index file. What has been
> your observation?
>
> Regards,
> Eswar
>
> On Nov 21, 2007 12:33 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>
> > Mike is right about the occasional slow-down, which appears as a
> > pause and is due to large Lucene index segment merging. This should
> > go away with newer versions of Lucene where this is happening in the
> > background.
> >
> > That said, we just indexed about 20MM documents on a single 8-core
> > machine with 8 GB of RAM, resulting in nearly 20 GB index. The whole
> > process took a little less than 10 hours - that's over 550
> > docs/second. The vanilla approach before some of our changes
> > apparently required several days to index the same amount of data.
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > ----- Original Message ----
> > From: Mike Klaas <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Monday, November 19, 2007 5:50:19 PM
> > Subject: Re: Any tips for indexing large amounts of data?
> >
> > There should be some slowdown in larger indices as occasionally large
> > segment merge operations must occur. However, this shouldn't really
> > affect overall speed too much.
> >
> > You haven't really given us enough data to tell you anything useful.
> > I would recommend trying to do the indexing via a webapp to eliminate
> > all your code as a possible factor. Then, look for signs to what is
> > happening when indexing slows. For instance, is Solr high in cpu, is
> > the computer thrashing, etc?
> >
> > -Mike
> >
> > On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
> >
> > > Hi,
> > >
> > > Thanks for answering this question a while back. I have made some
> > > of the suggestions you mentioned. ie not committing until I've
> > > finished indexing. What I am seeing though, is as the index get
> > > larger (around 1Gb), indexing is taking a lot longer. In fact it
> > > slows down to a crawl. Have you got any pointers as to what I might
> > > be doing wrong?
> > >
> > > Also, I was looking at using MultiCore solr. Could this help in
> > > some way?
> > >
> > > Thank you
> > > Brendan
> > >
> > > On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
> > >
> > >> : I would think you would see better performance by allowing auto
> > >> : commit to handle the commit size instead of reopening the
> > >> : connection all the time.
> > >>
> > >> if your goal is "fast" indexing, don't use autoCommit at all ...
> > >> just index everything, and don't commit until you are completely
> > >> done.
> > >>
> > >> autoCommitting will slow your indexing down (the benefit being
> > >> that more results will be visible to searchers as you proceed)
> > >>
> > >> -Hoss
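
For anyone finding this thread later, Hoss's advice quoted above (index everything, commit only once at the very end) could be sketched roughly as follows. This is only an illustration under assumptions, not anyone's actual indexing code: `build_add_xml`, `index_all`, the `send` callback and the batch size of 1000 are hypothetical names and values chosen for the example. The `<add>` and `<commit/>` messages are the standard Solr XML update messages; how they get POSTed to the update handler is left to the caller.

```python
from typing import Callable, Iterable, List
from xml.sax.saxutils import escape, quoteattr


def build_add_xml(docs: Iterable[dict]) -> str:
    """Render documents as one Solr XML <add> message."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # Escape field names and values so the message stays well-formed.
            parts.append(
                f"<field name={quoteattr(name)}>{escape(str(value))}</field>"
            )
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def index_all(
    docs: Iterable[dict],
    send: Callable[[str], None],
    batch_size: int = 1000,  # assumed value, tune for your data
) -> int:
    """Send docs in batches and commit once, after the last batch.

    `send` is a hypothetical callback that POSTs one XML message to the
    Solr update handler; it is injected so the sketch runs without a server.
    Returns the number of messages sent, including the final commit.
    """
    sent = 0
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            send(build_add_xml(batch))
            sent += 1
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        send(build_add_xml(batch))
        sent += 1
    # Single commit at the end: no intermediate commits, no autoCommit.
    send("<commit/>")
    return sent + 1
```

In real use, `send` would wrap an HTTP POST to your Solr update URL. Keeping it as a parameter also makes it easy to time the batches and see where indexing starts to slow down.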