On 11/4/2010 3:27 AM, Tommaso Teofili wrote:
    - Is mergeFactor a one time configuration setting that is considered only
    when creating the index for the first time or can it be adjusted later even
    with some docs inside the index? e.g. I have mF set to 10, then I realize I want
    quicker searches and I set it to 2 so that at the next optimize/commit I
    will have no more than 2 segments. My understanding is that one can adjust
    mF over time, is it right?

The mergeFactor is applied any time documents are added to the index, not just when the index is first built. You can change it later; after you reload the core or restart Solr, the new value applies to all indexing from that point forward.

With a mergeFactor of 10, it is quite possible to have 21 or more segments on disk at the same time, at least temporarily. I know this applies during a continuous large insert; I'm not sure whether it also applies if you do several small inserts separately. These segments are:

* The small segment that is being built right now.
* The previous 10 small segments.
* The merged segment being created from those above.
* The previous 9 merged segments.
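
The 21-segment count above can be reproduced with a small toy model. This is a sketch under stated assumptions, not Lucene's actual merge code: the class `MergeFactorSketch`, the method `peakSegments`, and its `flushDuringMerge` flag are hypothetical. The model assumes each flush writes one small (level-0) segment, that mergeFactor segments on the same level are merged into one segment on the next level up, and that a merge's source segments stay on disk until its output is finished. `flushDuringMerge` stands in for the continuous insert that keeps writing a new small segment while a merge runs.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of a log-style merge policy (hypothetical; not Lucene's LogMergePolicy).
 * Each flush writes one level-0 segment. Whenever mergeFactor segments share a level,
 * they are merged into a single segment one level up. While a merge runs, its source
 * segments and its output both exist on disk.
 */
public class MergeFactorSketch {

    /** Largest number of segments on disk at any moment over the given number of flushes. */
    static int peakSegments(int mergeFactor, int flushes, boolean flushDuringMerge) {
        List<Integer> perLevel = new ArrayList<>(); // perLevel.get(k) = segments at level k
        int total = 0;
        int peak = 0;
        for (int f = 0; f < flushes; f++) {
            increment(perLevel, 0); // a new small segment is flushed
            total++;
            peak = Math.max(peak, total);
            // Cascade: merge any level that has reached mergeFactor segments.
            for (int level = 0; level < perLevel.size() && perLevel.get(level) == mergeFactor; level++) {
                // Sources + merge output, plus the segment still being flushed if modeled.
                int duringMerge = total + 1 + (flushDuringMerge ? 1 : 0);
                peak = Math.max(peak, duringMerge);
                perLevel.set(level, 0);
                increment(perLevel, level + 1);
                total = total - mergeFactor + 1; // sources deleted, output kept
            }
        }
        return peak;
    }

    private static void increment(List<Integer> perLevel, int level) {
        while (perLevel.size() <= level) {
            perLevel.add(0);
        }
        perLevel.set(level, perLevel.get(level) + 1);
    }

    public static void main(String[] args) {
        // 100 flushes with mergeFactor=10 reach the first level-2 merge.
        System.out.println(peakSegments(10, 100, true));  // prints 21
        System.out.println(peakSegments(10, 100, false)); // prints 20
    }
}
```

The peak occurs just before the small segments are merged while nine merged segments are already waiting: 1 + 10 + 1 + 9 = 21, or 2 × mergeFactor + 1 in general for this two-level case. The model ignores merge timing entirely, so it says nothing about the pausing behavior described next.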

If it takes a really long time to merge the last 10 small segments and then to merge the 10 larger segments into an even larger one, your continuous insert can create still more small segments in the meantime. If it takes long enough that 10 more new small segments accumulate, the large merge pauses while the new small merge completes. I saw this happen recently when I tried building a single shard from our entire database. It took a really long time, partly because of that super-merge and the optimize that followed, and it used 85GB of disk space.

I'm not sure what happens if this continues beyond a single super-merge like the one I described.

    - In a replicated environment does it make sense to define different
    mergeFactors on master and slave? I'd say no since it influences the number
    of segments created, that being a concern of whoever actually indexes documents
    (the master), not of whoever receives (segments of) the index, but please correct me
    if I am wrong.

Because it only applies while documents are being indexed, it has no effect on a slave, which, as you said, just copies the data from the master.

Shawn
