I also have a Solr index, with around 100M docs.
I optimize once a week, and the optimize takes around 1 hour 30 minutes.


On 19 June 2011 20:02, Santiago Bazerque <sbazer...@gmail.com> wrote:

> Hello Erick, thanks for your answer!
>
> Yes, our over-optimization is mainly due to paranoia over these strange
> commit times. The long optimize time persisted in all the subsequent
> commits, and this is consistent with what we are seeing in other production
> indexes that have the same problem. Once the anomaly shows up, it never
> commits quickly again.
>
> I combed through the last 50k documents that were added before the first
> slow commit. I found one with a larger than usual number of fields (I
> didn't write down the number, but it was a few thousand).
>
> I deleted it, and the following optimize was normal again (110 seconds). So
> I'm pretty sure a document with lots of fields is the cause of the
> slowdown.
>
> If that would be useful, I can do some further testing to confirm this
> hypothesis and send the document to the list.
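>
> For anyone who wants to reproduce the check, a quick SolrJ scan along
> these lines will flag field-heavy documents (a rough sketch: the URL, the
> page size, and the 1000-field threshold are placeholders, and it only
> sees stored fields):
>
>   import org.apache.solr.client.solrj.SolrQuery;
>   import org.apache.solr.client.solrj.SolrServer;
>   import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
>   import org.apache.solr.client.solrj.response.QueryResponse;
>   import org.apache.solr.common.SolrDocument;
>
>   public class FieldCountScan {
>     public static void main(String[] args) throws Exception {
>       SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
>       int page = 1000;  // placeholder page size
>       for (int start = 0; ; start += page) {
>         SolrQuery q = new SolrQuery("*:*").setStart(start).setRows(page);
>         q.set("fl", "*");  // fetch all stored fields so they can be counted
>         QueryResponse rsp = server.query(q);
>         if (rsp.getResults().isEmpty()) break;
>         for (SolrDocument doc : rsp.getResults()) {
>           int n = doc.getFieldNames().size();
>           if (n > 1000) {  // flag documents with an unusual number of fields
>             System.out.println(doc.getFieldValue("id") + ": " + n + " fields");
>           }
>         }
>       }
>     }
>   }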
>
> Thanks again for your answer.
>
> Best,
> Santiago
>
> On Sun, Jun 19, 2011 at 10:21 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > First, there's absolutely no reason to optimize this often, if at all.
> > Older versions of Lucene would search faster on an optimized index, but
> > this is no longer the case. Optimize will reclaim the space held by
> > deleted documents, but it's generally recommended to run it fairly
> > rarely, often at off-peak hours.
> >
> > Note that optimize will re-write your entire index into a single new
> > segment, so following your pattern it'll take longer and longer each
> > time.
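> >
> > If you do decide to optimize, it's easy to script it off-peak from
> > SolrJ. A rough sketch (the URL is a placeholder; note the three-argument
> > variant, which merges down to N segments instead of all the way to one):
> >
> >   import org.apache.solr.client.solrj.SolrServer;
> >   import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
> >
> >   public class OffPeakOptimize {
> >     public static void main(String[] args) throws Exception {
> >       // placeholder URL: point this at your own core
> >       SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
> >       // waitFlush=true, waitSearcher=true: block until the merge completes
> >       server.optimize(true, true);
> >       // cheaper alternative: merge down to at most 5 segments instead of 1
> >       // server.optimize(true, true, 5);
> >     }
> >   }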
> >
> > But the speed change happening at 500,000 documents is suspiciously
> > close to the default mergeFactor of 10 x 50,000. Do subsequent
> > optimizes (e.g. on the 750,000th document) still take that long? But
> > this doesn't make sense, because if you're optimizing instead of
> > committing, each optimize should reduce your index to 1 segment and
> > you'll never hit a merge.
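> >
> > (For reference, mergeFactor is the <mergeFactor> setting in
> > solrconfig.xml; it's just a knob on Lucene's log merge policy. A sketch
> > of the equivalent Lucene 3.1 code, with a made-up index path:)
> >
> >   import java.io.File;
> >
> >   import org.apache.lucene.analysis.standard.StandardAnalyzer;
> >   import org.apache.lucene.index.IndexWriter;
> >   import org.apache.lucene.index.IndexWriterConfig;
> >   import org.apache.lucene.index.LogByteSizeMergePolicy;
> >   import org.apache.lucene.store.FSDirectory;
> >   import org.apache.lucene.util.Version;
> >
> >   public class MergeFactorDemo {
> >     public static void main(String[] args) throws Exception {
> >       LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
> >       // mergeFactor=10: every 10th segment triggers a merge, so
> >       // 10 segments x 50,000 docs each = 500,000 docs
> >       mp.setMergeFactor(10);
> >       IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_31,
> >           new StandardAnalyzer(Version.LUCENE_31));
> >       cfg.setMergePolicy(mp);
> >       IndexWriter writer = new IndexWriter(
> >           FSDirectory.open(new File("/tmp/example-index")), cfg);
> >       writer.close();
> >     }
> >   }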
> >
> > So I'm a little confused. If you're really optimizing every 50K docs,
> > what I'd expect to see is successively longer times, and at the end of
> > each optimize I'd expect there to be only one segment in your index.
> >
> > Are you sure you're not just seeing successively longer times on each
> > optimize, and just noticed it after the 10th?
> >
> > Best
> > Erick
> >
> > On Sun, Jun 19, 2011 at 6:04 AM, Santiago Bazerque <sbazer...@gmail.com>
> > wrote:
> > > Hello!
> > >
> > > Here is a puzzling experiment:
> > >
> > > I build an index of about 1.2MM documents using Solr 3.1. The index
> > > has a large number of dynamic fields (about 15,000). Each document
> > > has about 100 fields.
> > >
> > > I add the documents in batches of 20, and every 50,000 documents I
> > > optimize the index.
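> > >
> > > Boiled down, the shape of the indexing loop is something like this (a
> > > simplified SolrJ sketch; the URL, the id scheme, and the dynamic-field
> > > name are illustrative):
> > >
> > >   import java.util.ArrayList;
> > >   import java.util.List;
> > >
> > >   import org.apache.solr.client.solrj.SolrServer;
> > >   import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
> > >   import org.apache.solr.common.SolrInputDocument;
> > >
> > >   public class BatchIndexer {
> > >     public static void main(String[] args) throws Exception {
> > >       SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
> > >       List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
> > >       for (int i = 0; i < 1200000; i++) {
> > >         SolrInputDocument doc = new SolrInputDocument();
> > >         doc.addField("id", Integer.toString(i));
> > >         doc.addField("attr_example_s", "value");  // one of ~100 dynamic fields
> > >         batch.add(doc);
> > >         if (batch.size() == 20) {        // send in batches of 20
> > >           server.add(batch);
> > >           batch.clear();
> > >         }
> > >         if ((i + 1) % 50000 == 0) {      // optimize every 50,000 docs
> > >           server.optimize();
> > >         }
> > >       }
> > >       server.commit();
> > >     }
> > >   }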
> > >
> > > The first 10 optimizes (up to exactly 500k documents) take less than a
> > > minute and a half.
> > >
> > > But the 11th and all subsequent commits take north of 10 minutes.
> > > The commit logs look identical (in the INFOSTREAM.txt file), but what
> > > used to be
> > >
> > >   Jun 19, 2011 4:03:59 AM IW 13 [Sun Jun 19 04:03:59 EDT 2011; Lucene Merge Thread #0]: merge: total 500000 docs
> > >
> > >   Jun 19, 2011 4:04:37 AM IW 13 [Sun Jun 19 04:04:37 EDT 2011; Lucene Merge Thread #0]: merge store matchedCount=2 vs 2
> > >
> > >
> > > now eats a lot of time:
> > >
> > >
> > >   Jun 19, 2011 4:37:06 AM IW 14 [Sun Jun 19 04:37:06 EDT 2011; Lucene Merge Thread #0]: merge: total 550000 docs
> > >
> > >   Jun 19, 2011 4:46:42 AM IW 14 [Sun Jun 19 04:46:42 EDT 2011; Lucene Merge Thread #0]: merge store matchedCount=2 vs 2
> > >
> > >
> > > What could be happening between those two lines that takes 10 minutes
> > > at full CPU? (And why did it take so much less time with 50k fewer
> > > docs?)
> > >
> > >
> > > Thanks in advance,
> > >
> > > Santiago
> > >
> >
>



-- 
Thanks and Regards
Mohammad Shariq
