Hi Tim - thanks for the answer.

For your assumption: my documents are about 50kb each in the index, but after two weeks of updates with nothing merged away, about 40% of the docs in my index are deleted ("unused") yet still present, and that has an impact on query performance.

1) My incentive for optimizing rather than relying on merging was to take advantage of the "dead" hours of the engine - the local night hours with a low qps rate. That way I would control the hours during which these operations occur, and the merging and query threads wouldn't have to compete for the same resources - correct me if I'm mistaken. (A sketch of the kind of scheduled optimize I mean is in the P.S. below.)
2) Using the expungeDeletes attribute might be an interesting option, since the segments containing deleted docs should be only a few - they were all created in the same time range (a month back). In my case, though, I have a few deleted docs even in new segments, for various reasons. If I use this suggested commit and all my segments contain deleted docs, it would amount to an optimize, wouldn't it? Is there a way to restrict expungeDeletes to segments with more than N deleted docs, so that I could avoid this pseudo-optimize? (The P.S. below sketches how I would issue the commit.)

Manu

On Sat, Mar 2, 2013 at 8:54 PM, Timothy Potter <thelabd...@gmail.com> wrote:
> Hi Manuel,
>
> If you search "optimize" on this mailing list, you'll see that one of
> the common suggestions is to avoid optimizing and fine-tune segment
> merging instead. So to begin, take a look at your solrconfig.xml and
> find out what your merge policy and mergeFactor are set to (note: they
> may be commented out, which implies segment merging is still enabled
> with the default settings). You can experiment with changing the
> mergeFactor.
>
> Based on your description of adding and removing a few thousand
> documents each day, I'm going to assume your documents are very large,
> otherwise I can't see how you'd ever notice an impact on query
> performance. Is my assumption about the document size correct?
>
> One thing you can try is to use the expungeDeletes attribute set to
> "true" when you commit, i.e. <commit expungeDeletes="true"/>. This
> triggers Solr to merge any segments with deletes.
>
> Lastly, I'm not sure about your specific questions related to
> optimization, but I think it's worth trying the suggestions above and
> avoiding optimization altogether. I'm pretty sure the answer to #1 is
> no, and for #2, it optimizes independently.
>
> Cheers,
> Tim
>
>
> On Sat, Mar 2, 2013 at 10:24 AM, Manuel Le Normand
> <manuel.lenorm...@gmail.com> wrote:
> > My use-case is a quasi-monthly changing index. Every day I index a
> > few thousand docs and erase a similar number of older documents,
> > while a few documents last in the index forever (about 20% of my
> > index). After a few experiments, I found that leaving the older
> > documents in the index (mostly in the *.tim file) significantly
> > slows down my avg qTime, and I came to the conclusion that I need to
> > optimize the index once every few days to get rid of the older
> > documents.
> >
> > Optimization requires about twice the index storage. As I have many
> > shards and one replica for each, and the optimization occurs
> > simultaneously on all of them, I need twice my initial index size in
> > storage, while half of it is used very infrequently (optimization
> > takes about an hour).
> >
> > 1) Is there a possibility of using a storage pool for all shards, so
> > every shard uses the spare storage in series, forcing the
> > optimizations to run one after another? In this case, all the
> > storage I'd use would be (total index storage + shard storage)
> > instead of twice the total index storage.
> >
> > 2) When I run optimization for a replicated core, does it copy from
> > its leader or does it optimize independently?
> >
> > Thanks,
> > Manu
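P.S. For concreteness, the scheduled optimize I have in mind in 1) would look roughly like the sketch below - "localhost:8983" and the core name "collection1" are placeholders for the real deployment, not values from this thread:

    # Post an explicit <optimize/> to the update handler:
    curl 'http://localhost:8983/solr/collection1/update' \
         -H 'Content-Type: text/xml' \
         --data-binary '<optimize/>'

    # Crontab entry firing the same request at 03:00 local time,
    # inside the low-qps window:
    0 3 * * * curl -s 'http://localhost:8983/solr/collection1/update' -H 'Content-Type: text/xml' --data-binary '<optimize/>' >/dev/null

Driving it from cron like this is what would keep the heavy merge I/O inside the dead hours instead of competing with query traffic.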
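And the expungeDeletes commit from 2) can be sent the same way - again a sketch with the same placeholder host and core name:

    # Commit that asks Solr to merge away segments holding deletes,
    # without rewriting the whole index:
    curl 'http://localhost:8983/solr/collection1/update' \
         -H 'Content-Type: text/xml' \
         --data-binary '<commit expungeDeletes="true"/>'

The open question above is whether this can be bounded to segments with more than N deletes, so it doesn't degenerate into a full optimize.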