Thanks for your answers, Erick & Mohammad! I'll get back to the list if I have more specific info about this issue; so far the index is performing normally again.
Best,
Santiago

On Mon, Jun 20, 2011 at 9:29 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> Hmmm, that is odd. Anyone else want to chime in here?
>
> But optimizing isn't going to help with the strange commit
> times; it'll only make them worse. It's not doing you much if
> any good, so I'd think about not optimizing....
>
> About the commit times in general: depending upon when the
> merge happens, lots of work can go on under the covers.
>
> Here's a detailed look at merging:
> http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/
>
> But the short form is that, depending upon the number of
> segments and the merge policy, you may periodically hit
> a commit that copies, perhaps, #all# the current segments
> into a single segment, which will create a large pause.
>
> But it's always possible that something's wonky with documents
> that have a very large number of fields.
>
> There's some interesting work being done on trunk to flatten
> out this curve, but that's not going to do you much good
> in the 3.x code line...
>
> Best
> Erick
>
> On Sun, Jun 19, 2011 at 10:32 AM, Santiago Bazerque <sbazer...@gmail.com> wrote:
> > Hello Erick, thanks for your answer!
> >
> > Yes, our over-optimization is mainly due to paranoia over these strange
> > commit times. The long optimize time persisted in all the subsequent
> > commits, and this is consistent with what we are seeing in other
> > production indexes that have the same problem. Once the anomaly shows
> > up, it never commits quickly again.
> >
> > I combed through the last 50k documents that were added before the
> > first slow commit. I found one with a larger than usual number of
> > fields (I didn't write down the number, but it was a few thousand).
> >
> > I deleted it, and the following optimize was normal again (110 seconds).
> > So I'm pretty sure a document with lots of fields is the cause of the
> > slowdown.
> >
> > If that would be useful, I can do some further testing to confirm this
> > hypothesis and send the document to the list.
> >
> > Thanks again for your answer.
> >
> > Best,
> > Santiago
> >
> > On Sun, Jun 19, 2011 at 10:21 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> First, there's absolutely no reason to optimize this often, if at all.
> >> Older versions of Lucene would search faster on an optimized index,
> >> but this is no longer necessary. Optimize will reclaim the space held
> >> by deleted documents, but it's generally recommended to be performed
> >> fairly rarely, often at off-peak hours.
> >>
> >> Note that optimize will re-write your entire index into a single new
> >> segment, so following your pattern it'll take longer and longer each
> >> time.
> >>
> >> But the speed change happening at 500,000 documents is suspiciously
> >> close to the default mergeFactor of 10 X 50,000. Do subsequent
> >> optimizes (i.e. on the 750,000th document) still take that long? But
> >> this doesn't make sense, because if you're optimizing instead of
> >> committing, each optimize should reduce your index to 1 segment and
> >> you'll never hit a merge.
> >>
> >> So I'm a little confused. If you're really optimizing every 50K docs,
> >> what I'd expect to see is successively longer times, and at the end of
> >> each optimize I'd expect there to be only one segment in your index.
> >>
> >> Are you sure you're not just seeing successively longer times on each
> >> optimize and just noticing it after 10?
> >>
> >> Best
> >> Erick
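For reference, the mergeFactor arithmetic Erick points at can be made concrete. This is a minimal sketch against the Lucene 3.x API, not code from either poster; the index path and analyzer choice are placeholders:

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.LogByteSizeMergePolicy;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class MergeFactorSketch {
        public static void main(String[] args) throws Exception {
            // With mergeFactor = 10 (the default), ten same-level segments
            // are merged into one larger segment. Ten ~50k-doc segments thus
            // cascade into a single ~500k-doc merge, which matches the point
            // where the commit times jumped in this thread.
            LogByteSizeMergePolicy mergePolicy = new LogByteSizeMergePolicy();
            mergePolicy.setMergeFactor(10);

            IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_31,
                    new StandardAnalyzer(Version.LUCENE_31));
            conf.setMergePolicy(mergePolicy);

            // "/path/to/index" is a placeholder, not a path from the thread.
            IndexWriter writer = new IndexWriter(
                    FSDirectory.open(new File("/path/to/index")), conf);
            // ... add documents here; merges fire as segments accumulate ...
            writer.close();
        }
    }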
> >> On Sun, Jun 19, 2011 at 6:04 AM, Santiago Bazerque <sbazer...@gmail.com> wrote:
> >> > Hello!
> >> >
> >> > Here is a puzzling experiment:
> >> >
> >> > I build an index of about 1.2MM documents using SOLR 3.1. The index
> >> > has a large number of dynamic fields (about 15,000). Each document
> >> > has about 100 fields.
> >> >
> >> > I add the documents in batches of 20, and every 50,000 documents I
> >> > optimize the index.
> >> >
> >> > The first 10 optimizes (up to exactly 500k documents) take less than
> >> > a minute and a half.
> >> >
> >> > But the 11th and all subsequent commits take north of 10 minutes. The
> >> > commit logs look identical (in the INFOSTREAM.txt file), but what
> >> > used to be
> >> >
> >> > Jun 19, 2011 4:03:59 AM IW 13 [Sun Jun 19 04:03:59 EDT 2011; Lucene Merge Thread #0]: merge: total 500000 docs
> >> > Jun 19, 2011 4:04:37 AM IW 13 [Sun Jun 19 04:04:37 EDT 2011; Lucene Merge Thread #0]: merge store matchedCount=2 vs 2
> >> >
> >> > now eats a lot of time:
> >> >
> >> > Jun 19, 2011 4:37:06 AM IW 14 [Sun Jun 19 04:37:06 EDT 2011; Lucene Merge Thread #0]: merge: total 550000 docs
> >> > Jun 19, 2011 4:46:42 AM IW 14 [Sun Jun 19 04:46:42 EDT 2011; Lucene Merge Thread #0]: merge store matchedCount=2 vs 2
> >> >
> >> > What could be happening between those two lines that takes 10 minutes
> >> > at full CPU (when, with 50k fewer docs, it used to take so much less)?
> >> >
> >> > Thanks in advance,
> >> >
> >> > Santiago
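For anyone who wants to reproduce the pattern Santiago describes (batches of 20, optimize every 50,000 documents, ~100 fields per document drawn from ~15,000 dynamic field names), a minimal sketch with the 3.x SolrJ client follows. The URL, the id scheme, and the "attr_*_s" dynamic-field naming are placeholders, not details from the original report:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder URL; point this at the real Solr instance.
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < 1200000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", Integer.toString(i));
                // ~100 fields per document, drawn from a pool of ~15,000
                // dynamic field names (the names here are hypothetical).
                for (int f = 0; f < 100; f++) {
                    doc.addField("attr_" + ((i * 100 + f) % 15000) + "_s", "value");
                }
                batch.add(doc);

                if (batch.size() == 20) {       // add in batches of 20
                    server.add(batch);
                    batch.clear();
                }
                if ((i + 1) % 50000 == 0) {     // optimize every 50,000 docs
                    server.optimize();
                }
            }
            if (!batch.isEmpty()) {             // flush any leftover partial batch
                server.add(batch);
            }
            server.commit();
        }
    }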