John,

It would be great if Lucene's benchmark were used so that everyone
could run the test in their own environment and verify the results.
It's not clear what settings or code were used to generate the
results, so it's difficult to draw any reliable conclusions.

The steep spike is stronger evidence that the IO cache is being
cleared during large merges, which degrades search performance.
See:
http://www.lucidimagination.com/search/?q=madvise
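
To make the cache-eviction argument concrete, here is a toy model, not Lucene code. It assumes the page cache evicts in plain LRU order (real kernels are more elaborate; Linux, for example, keeps active and inactive lists) and builds that cache on java.util.LinkedHashMap. The class name PageCacheModel and the cacheMergeReads flag are hypothetical. The flag only stands in for the effect an madvise/posix_fadvise hint on merge files would aim for, since the standard JDK has no direct madvise call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model of an OS page cache with plain LRU eviction (an assumption;
 * real kernels use more elaborate policies). Hypothetical, not Lucene code.
 */
public class PageCacheModel {
    private final LinkedHashMap<Long, Boolean> cache;

    public PageCacheModel(final int capacityPages) {
        // accessOrder = true makes iteration order least-recently-used first
        this.cache = new LinkedHashMap<Long, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                return size() > capacityPages; // evict the LRU page when full
            }
        };
    }

    /** Reads a page and returns true on a cache hit. */
    public boolean read(long page, boolean cacheIt) {
        if (cache.get(page) != null) {
            return true; // hit: get() also refreshes the page's recency
        }
        if (cacheIt) {
            cache.put(page, Boolean.TRUE); // miss: page is now cached
        }
        return false; // miss: page had to come from disk
    }

    /**
     * Warms the cache with the search working set, streams a large merge
     * through it, then replays the searches and returns the number of hits.
     */
    public static int searchHitsAfterMerge(int capacity, int searchPages,
                                           int mergePages, boolean cacheMergeReads) {
        PageCacheModel model = new PageCacheModel(capacity);
        for (long p = 0; p < searchPages; p++) {
            model.read(p, true); // queries warm the cache
        }
        for (long p = 0; p < mergePages; p++) {
            model.read(1_000_000 + p, cacheMergeReads); // merge reads other pages
        }
        int hits = 0;
        for (long p = 0; p < searchPages; p++) {
            if (model.read(p, true)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("merge reads cached:   " + searchHitsAfterMerge(100, 50, 200, true));
        System.out.println("merge reads uncached: " + searchHitsAfterMerge(100, 50, 200, false));
    }
}
```

With a 100-page cache, a 50-page search working set and a 200-page merge, the first line prints 0 hits and the second prints 50. Once the merge is cached, it pushes every search page out. When the merge reads bypass the cache, the search pages survive.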

Merging is IO intensive and less CPU intensive. If the
ConcurrentMergeScheduler is used, which defaults to 3 threads,
then the CPU could be maxed out. On a single spinning magnetic
disk, using a single merge thread seems more logical. Queries
are usually the inverse: CPU intensive rather than IO intensive
when the index is in the IO cache. During or after the merge of
a large segment, queries start hitting disk, and the results
clearly show that. The queries suddenly take longer because they
seek on disk at the very moment IO activity is at its peak from
merging large segments. Using madvise would keep the pages of the
index being searched from being pushed out of the IO cache during
a merge, so query performance would continue unabated.
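
The single-merge-thread suggestion can be illustrated with java.util.concurrent alone. BoundedMergeRunner is a hypothetical sketch, not Lucene's ConcurrentMergeScheduler. Each simulated merge just sleeps in place of IO-bound work, and the method reports the peak number of merges that ran at once. In Lucene itself, the comparable knobs are ConcurrentMergeScheduler's setMaxThreadCount or swapping in SerialMergeScheduler. Check the javadocs for the version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical scheduler: runs merge tasks on at most maxThreads threads. */
public class BoundedMergeRunner {

    /** Runs simulated merges and returns the peak number running at once. */
    public static int runMerges(int maxThreads, int merges, long mergeMillis)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> results = new ArrayList<>();
        for (int i = 0; i < merges; i++) {
            results.add(pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // record concurrency
                try {
                    Thread.sleep(mergeMillis); // stand-in for IO-bound merge work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            }));
        }
        for (Future<?> f : results) {
            f.get(); // wait for every merge to finish
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("peak with 1 thread:  " + runMerges(1, 6, 20));
        System.out.println("peak with 3 threads: " + runMerges(3, 6, 20));
    }
}
```

With one thread the peak is always 1, so merges never compete with each other for the disk head. With three threads, the peak is at most 3.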

As we move to a sharded model of indexes, large merges will
naturally not occur. Shards will reach a specified size and new
documents will be sent to new shards.
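
Here is a minimal sketch of the size-capped routing just described. It assumes the cap is a document count (it could equally be bytes on disk) and that documents are only appended. SizeCappedShardRouter and maxDocsPerShard are hypothetical names and are not part of Lucene or Solr.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical router: fills the current shard up to maxDocsPerShard and then
 * opens a new shard, so no single index grows without bound.
 */
public class SizeCappedShardRouter {
    private final int maxDocsPerShard;
    private final List<Integer> shardDocCounts = new ArrayList<>();

    public SizeCappedShardRouter(int maxDocsPerShard) {
        if (maxDocsPerShard <= 0) {
            throw new IllegalArgumentException("maxDocsPerShard must be > 0");
        }
        this.maxDocsPerShard = maxDocsPerShard;
        shardDocCounts.add(0); // start with one empty shard
    }

    /** Returns the index of the shard that should receive the next document. */
    public int route() {
        int current = shardDocCounts.size() - 1;
        if (shardDocCounts.get(current) >= maxDocsPerShard) {
            shardDocCounts.add(0); // current shard is full: open a new one
            current++;
        }
        shardDocCounts.set(current, shardDocCounts.get(current) + 1);
        return current;
    }

    public int shardCount() {
        return shardDocCounts.size();
    }

    public static void main(String[] args) {
        SizeCappedShardRouter router = new SizeCappedShardRouter(3);
        StringBuilder sb = new StringBuilder();
        for (int doc = 0; doc < 7; doc++) {
            sb.append(router.route()).append(' ');
        }
        System.out.println(sb.toString().trim()); // prints "0 0 0 1 1 1 2"
    }
}
```

Because a full shard never grows past the cap, the largest merge any one shard can need is bounded by that cap. Updates or deletes that hit an older shard would still merge within that shard, but never beyond its capped size.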

-J

On Sun, Sep 20, 2009 at 11:12 PM, John Wang <john.w...@gmail.com> wrote:
> The current default Lucene MergePolicy does not handle frequent updates
> well.
>
> We have done some performance analysis with that and a custom merge policy:
>
> http://code.google.com/p/zoie/wiki/ZoieMergePolicy
>
> -John
>
> On Mon, Sep 21, 2009 at 1:08 PM, Jason Rutherglen <
> jason.rutherg...@gmail.com> wrote:
>
>> I opened SOLR-1447 for this
>>
>> 2009/9/18 Noble Paul നോബിള്‍  नोब्ळ् <noble.p...@corp.aol.com>:
>> > We can use a simple reflection-based implementation to simplify
>> > reading a large number of parameters.
>> >
>> > What I wish to emphasize is that Solr should be agnostic of XML
>> > altogether. It should only be aware of specific objects and
>> > interfaces. If users wish to plug in something else in some other way,
>> > that should be fine.
>> >
>> >
>> > People have invested a lot of effort in learning the current
>> > solrconfig.xml. Let us not make them throw that away.
>> >
>> > On Sat, Sep 19, 2009 at 1:59 AM, Jason Rutherglen
>> > <jason.rutherg...@gmail.com> wrote:
>> >> Over the weekend I may write a patch to allow simple reflection-based
>> >> injection from within solrconfig.
>> >>
>> >> On Fri, Sep 18, 2009 at 8:10 AM, Yonik Seeley
>> >> <yo...@lucidimagination.com> wrote:
>> >>> On Thu, Sep 17, 2009 at 4:30 PM, Shalin Shekhar Mangar
>> >>> <shalinman...@gmail.com> wrote:
>> >>>>> I was wondering if there is a way I can modify calibrateSizeByDeletes
>> >>>>> just by configuration?
>> >>>>>
>> >>>>
>> >>>> Alas, no. The only option that I see for you is to sub-class
>> >>>> LogByteSizeMergePolicy and set calibrateSizeByDeletes to true in the
>> >>>> constructor. However, please open a Jira issue so we don't forget
>> >>>> about it.
>> >>>
>> >>> It's the continuing stuff like this that makes me feel like we should
>> >>> be Spring (or equivalent) based someday... I'm just not sure how we're
>> >>> going to get there.
>> >>>
>> >>> -Yonik
>> >>> http://www.lucidimagination.com
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > -----------------------------------------------------
>> > Noble Paul | Principal Engineer| AOL | http://aol.com
>> >
>>
>
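
For the reflection-based injection Noble and Jason discuss above (Jason opened SOLR-1447 for it), one possible sketch maps each configured name to a JavaBeans-style setter and converts the string value to that setter's parameter type. ReflectiveConfigurer and ExampleMergePolicy are hypothetical stand-ins, not Solr's or Lucene's API. With a helper along these lines, calibrateSizeByDeletes could be set from configuration instead of by subclassing.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: applies string params to an object's setters by reflection. */
public class ReflectiveConfigurer {

    /** Calls setXxx(value) for each "xxx" key, converting the string value. */
    public static void configure(Object target, Map<String, String> params) throws Exception {
        for (Map.Entry<String, String> e : params.entrySet()) {
            String key = e.getKey();
            String setter = "set" + Character.toUpperCase(key.charAt(0)) + key.substring(1);
            Method m = findSetter(target.getClass(), setter);
            if (m == null) {
                throw new IllegalArgumentException(
                        "No setter " + setter + " on " + target.getClass().getName());
            }
            m.invoke(target, convert(e.getValue(), m.getParameterTypes()[0]));
        }
    }

    /** Returns the first public one-argument method with the given name, or null. */
    private static Method findSetter(Class<?> cls, String name) {
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == 1) {
                return m;
            }
        }
        return null;
    }

    /** Converts the config string to the setter's parameter type. */
    private static Object convert(String value, Class<?> type) {
        if (type == String.class) return value;
        if (type == boolean.class || type == Boolean.class) return Boolean.parseBoolean(value);
        if (type == int.class || type == Integer.class) return Integer.parseInt(value);
        if (type == long.class || type == Long.class) return Long.parseLong(value);
        if (type == double.class || type == Double.class) return Double.parseDouble(value);
        throw new IllegalArgumentException("Unsupported parameter type " + type.getName());
    }

    /** Stand-in for a merge policy; NOT Lucene's LogByteSizeMergePolicy. */
    public static class ExampleMergePolicy {
        private boolean calibrateSizeByDeletes;
        private int mergeFactor = 10;

        public void setCalibrateSizeByDeletes(boolean b) { calibrateSizeByDeletes = b; }
        public boolean getCalibrateSizeByDeletes() { return calibrateSizeByDeletes; }
        public void setMergeFactor(int f) { mergeFactor = f; }
        public int getMergeFactor() { return mergeFactor; }
    }

    public static void main(String[] args) throws Exception {
        ExampleMergePolicy policy = new ExampleMergePolicy();
        Map<String, String> params = new HashMap<>();
        params.put("calibrateSizeByDeletes", "true"); // values as read from a config file
        params.put("mergeFactor", "20");
        configure(policy, params);
        System.out.println(policy.getCalibrateSizeByDeletes() + " " + policy.getMergeFactor());
    }
}
```

Running main prints `true 20`. Unknown keys fail fast with an IllegalArgumentException rather than being silently ignored, which seems the safer default for a config file.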
