I also have a Solr index with around 100 million docs. I optimize once a week, and the optimize takes around 1 hour 30 minutes.
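In case it helps anyone time this themselves, here is a minimal SolrJ sketch of how such a periodic optimize could be issued (the URL is a placeholder, and I'm assuming the SolrJ 3.x class names; this isn't necessarily how our job runs):

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class OptimizeIndex {
        public static void main(String[] args) throws Exception {
            // Placeholder URL -- point this at your own Solr instance.
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
            // Merge the index down to a single segment. By default this
            // waits for the flush and for the new searcher, so the call
            // blocks until the optimize has actually finished.
            server.optimize();
        }
    }

Because optimize() blocks until the new searcher is open, timestamping around that one call gives the real wall-clock optimize duration.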
On 19 June 2011 20:02, Santiago Bazerque <sbazer...@gmail.com> wrote:

> Hello Erick, thanks for your answer!
>
> Yes, our over-optimization is mainly due to paranoia over these strange
> commit times. The long optimize time persisted in all the subsequent
> commits, and this is consistent with what we are seeing in other
> production indexes that have the same problem. Once the anomaly shows up,
> the index never commits quickly again.
>
> I combed through the last 50k documents that were added before the first
> slow commit. I found one with a larger than usual number of fields (I
> didn't write down the number, but it was a few thousand).
>
> I deleted it, and the following optimize was normal again (110 seconds).
> So I'm pretty sure a document with lots of fields is the cause of the
> slowdown.
>
> If that would be useful, I can do some further testing to confirm this
> hypothesis and send the document to the list.
>
> Thanks again for your answer.
>
> Best,
> Santiago
>
> On Sun, Jun 19, 2011 at 10:21 AM, Erick Erickson
> <erickerick...@gmail.com> wrote:
> >
> > First, there's absolutely no reason to optimize this often, if at all.
> > Older versions of Lucene would search faster on an optimized index,
> > but this is no longer necessary. Optimize will reclaim the space used
> > by deleted documents, but it is generally recommended to be performed
> > fairly rarely, often at off-peak hours.
> >
> > Note that optimize will re-write your entire index into a single new
> > segment, so following your pattern it'll take longer and longer each
> > time.
> >
> > But the speed change happening at 500,000 documents is suspiciously
> > close to the default mergeFactor of 10 x 50,000. Do subsequent
> > optimizes (i.e. on the 750,000th document) still take that long? But
> > this doesn't make sense, because if you're optimizing instead of
> > committing, each optimize should reduce your index to 1 segment and
> > you'll never hit a merge.
> >
> > So I'm a little confused. If you're really optimizing every 50K docs,
> > what I'd expect to see is successively longer times, and at the end of
> > each optimize I'd expect there to be only one segment in your index.
> >
> > Are you sure you're not just seeing successively longer times on each
> > optimize, and just noticing it after 10?
> >
> > Best,
> > Erick
> >
> > On Sun, Jun 19, 2011 at 6:04 AM, Santiago Bazerque
> > <sbazer...@gmail.com> wrote:
> > >
> > > Hello!
> > >
> > > Here is a puzzling experiment:
> > >
> > > I build an index of about 1.2MM documents using Solr 3.1. The index
> > > has a large number of dynamic fields (about 15,000). Each document
> > > has about 100 fields.
> > >
> > > I add the documents in batches of 20, and every 50,000 documents I
> > > optimize the index.
> > >
> > > The first 10 optimizes (up to exactly 500k documents) take less than
> > > a minute and a half.
> > >
> > > But the 11th and all subsequent commits take north of 10 minutes.
> > > The commit logs look identical (in the INFOSTREAM.txt file), but
> > > what used to be
> > >
> > > Jun 19, 2011 4:03:59 AM IW 13 [Sun Jun 19 04:03:59 EDT 2011; Lucene Merge Thread #0]: merge: total 500000 docs
> > > Jun 19, 2011 4:04:37 AM IW 13 [Sun Jun 19 04:04:37 EDT 2011; Lucene Merge Thread #0]: merge store matchedCount=2 vs 2
> > >
> > > now eats a lot of time:
> > >
> > > Jun 19, 2011 4:37:06 AM IW 14 [Sun Jun 19 04:37:06 EDT 2011; Lucene Merge Thread #0]: merge: total 550000 docs
> > > Jun 19, 2011 4:46:42 AM IW 14 [Sun Jun 19 04:46:42 EDT 2011; Lucene Merge Thread #0]: merge store matchedCount=2 vs 2
> > >
> > > What could be happening between those two lines that takes 10
> > > minutes at full CPU (and that used to take so much less time with
> > > 50k fewer docs)?
> > >
> > > Thanks in advance,
> > > Santiago

--
Thanks and Regards
Mohammad Shariq