Hello Erick, thanks for your answer!

Yes, our over-optimization is mainly due to paranoia over these strange
commit times. The long optimize time persisted across all subsequent
commits, which is consistent with what we are seeing in other production
indexes with the same problem: once the anomaly shows up, the index never
commits quickly again.

I combed through the last 50k documents that were added before the first
slow commit, and found one with a larger-than-usual number of fields (I
didn't write down the exact number, but it was a few thousand).

I deleted it, and the following optimize was back to normal (110 seconds).
So I'm pretty sure a document with an unusually large number of fields is
the cause of the slowdown.
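
In case anyone wants to run a similar check, here is a minimal SolrJ
sketch of the scan (the "indexed_at" sort field and the 1000-field
threshold are placeholders; adjust for your schema):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrDocument;

    // Scan the most recently added documents and report any with an
    // unusually large number of fields.
    public class FieldCountScan {
      public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("*:*");
        q.setSortField("indexed_at", SolrQuery.ORDER.desc); // placeholder field
        q.setRows(50000);                                   // last 50k docs
        for (SolrDocument doc : solr.query(q).getResults()) {
          int n = doc.getFieldNames().size();
          if (n > 1000) // flag outliers
            System.out.println(doc.getFieldValue("id") + ": " + n + " fields");
        }
      }
    }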

If it would be useful, I can do some further testing to confirm this
hypothesis and send the offending document to the list.

Thanks again for your answer.

Best,
Santiago

On Sun, Jun 19, 2011 at 10:21 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> First, there's absolutely no reason to optimize this often, if at all.
> Older versions of Lucene searched faster on an optimized index, but this
> is no longer the case. Optimize reclaims the space held by deleted
> documents, but it's generally recommended only rarely, often at off-peak
> hours.
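>
> For instance, the whole optimize step can be a tiny stand-alone SolrJ job
> fired from cron at night (a sketch; the URL is an example):
>
>     import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
>
>     // Sketch: issue one optimize during off-peak hours instead of
>     // optimizing after every batch.
>     public class NightlyOptimize {
>       public static void main(String[] args) throws Exception {
>         CommonsHttpSolrServer solr =
>             new CommonsHttpSolrServer("http://localhost:8983/solr");
>         solr.optimize(); // rewrites the whole index into one segment
>       }
>     }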
>
> Note that optimize re-writes your entire index into a single new segment,
> so following your pattern it'll take longer and longer each time.
>
> But the speed change happening at 500,000 documents is suspiciously close
> to the default mergeFactor of 10 x 50,000. Do subsequent optimizes (i.e.
> on the 750,000th document) still take that long? Then again, this doesn't
> quite make sense: if you're optimizing instead of committing, each
> optimize should reduce your index to one segment and you should never hit
> a merge.
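>
> For reference, mergeFactor is set in solrconfig.xml; a minimal snippet
> showing the default:
>
>     <indexDefaults>
>       <mergeFactor>10</mergeFactor>
>     </indexDefaults>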
>
> So I'm a little confused. If you're really optimizing every 50K docs, what
> I'd expect to see is successively longer times, and at the end of each
> optimize I'd expect there to be only one segment in your index.
>
> Are you sure you're not simply seeing successively longer times on each
> optimize, and only noticing it after the 10th?
>
> Best
> Erick
>
> On Sun, Jun 19, 2011 at 6:04 AM, Santiago Bazerque <sbazer...@gmail.com>
> wrote:
> > Hello!
> >
> > Here is a puzzling experiment:
> >
> > I build an index of about 1.2MM documents using Solr 3.1. The index has
> > a large number of dynamic fields (about 15,000). Each document has
> > about 100 fields.
> >
> > I add the documents in batches of 20, and every 50,000 documents I
> > optimize the index.
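> >
> > In code, the loop is roughly the following (buildDocument() stands in
> > for our actual document construction):
> >
> >     import java.util.ArrayList;
> >     import java.util.List;
> >     import org.apache.solr.client.solrj.SolrServer;
> >     import org.apache.solr.common.SolrInputDocument;
> >
> >     // Sketch of the indexing loop.
> >     void indexAll(SolrServer solr, int totalDocs) throws Exception {
> >       List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
> >       for (int i = 0; i < totalDocs; i++) {
> >         batch.add(buildDocument(i));      // hypothetical helper
> >         if (batch.size() == 20) {         // batches of 20
> >           solr.add(batch);
> >           batch.clear();
> >         }
> >         if ((i + 1) % 50000 == 0)
> >           solr.optimize();                // every 50,000 documents
> >       }
> >     }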
> >
> > The first 10 optimizes (up to exactly 500k documents) take less than a
> > minute and a half.
> >
> > But the 11th and all subsequent commits take north of 10 minutes. The
> > commit logs look identical (in the INFOSTREAM.txt file), but what used
> > to be
> >
> >   Jun 19, 2011 4:03:59 AM IW 13 [Sun Jun 19 04:03:59 EDT 2011; Lucene Merge Thread #0]: merge: total 500000 docs
> >
> >   Jun 19, 2011 4:04:37 AM IW 13 [Sun Jun 19 04:04:37 EDT 2011; Lucene Merge Thread #0]: merge store matchedCount=2 vs 2
> >
> >
> > now eats a lot of time:
> >
> >
> >   Jun 19, 2011 4:37:06 AM IW 14 [Sun Jun 19 04:37:06 EDT 2011; Lucene Merge Thread #0]: merge: total 550000 docs
> >
> >   Jun 19, 2011 4:46:42 AM IW 14 [Sun Jun 19 04:46:42 EDT 2011; Lucene Merge Thread #0]: merge store matchedCount=2 vs 2
> >
> >
> > What could be happening between those two lines that takes 10 minutes
> > at full CPU? (And why did it take so much less with 50k fewer docs?)
> >
> >
> > Thanks in advance,
> >
> > Santiago
> >
>
