On 11/4/2010 3:27 AM, Tommaso Teofili wrote:
- Is mergeFactor a one time configuration setting that is considered only
when creating the index for the first time or can it be adjusted later even
with some docs inside the index? e.g. I have mF to 10 then I realize I want
quicker searches and I set it to 2 so that at the next optimize/commit I
will have no more than 2 segments. My understanding is that one can adjust
mF over time, is it right?
The mergeFactor is applied anytime documents are added to the index, not
just when it is built for the first time. You can adjust it later, then
reload the core or restart Solr for the change to take effect. It will
apply to any additional indexing from that point forward.
With a mergeFactor of 10, having 21 segments (or more) temporarily on
the disk at the same time is reasonably possible. I know this applies
if you are doing a continuous large insert; I'm not sure whether it also
applies if you are doing several small inserts separately. These
segments are:
* The small segment that is being built right now.
* The previous 10 small segments.
* The merged segment being created from those above.
* The previous 9 merged segments.
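
As a rough sanity check on that count, here is a minimal Python sketch
that just tallies the four items in the list above for a given
mergeFactor. The function name and labels are made up for illustration;
this is only the arithmetic from the list, not a model of how Lucene
actually schedules merges.

```python
def peak_segment_count(merge_factor: int) -> dict:
    """Tally the segments from the list above for a given mergeFactor.

    Hypothetical helper for illustration only; it encodes the four
    bullet points, not Lucene's real merge policy.
    """
    counts = {
        "small segment being built": 1,
        "previous small segments": merge_factor,
        "merged segment being created": 1,
        "previous merged segments": merge_factor - 1,
    }
    # Sum the four entries before storing the total alongside them.
    counts["total"] = sum(counts.values())
    return counts


if __name__ == "__main__":
    for label, n in peak_segment_count(10).items():
        print(f"{label}: {n}")
```

For a mergeFactor of 10 the total comes to 1 + 10 + 1 + 9 = 21, which
matches the figure above; in general the tally is 2 * mergeFactor + 1.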
If it takes a really long time to merge the last 10 small segments and
then merge the 10 large segments into an even larger segment, you can
end up with even more small segments from your continuous insert. If it
takes long enough that you actually get 10 more new small segments, the
large merge will pause while the small merge completes. I saw this
happen recently when I decided to see what would happen if I built a
single shard from our entire database. It took a really long time,
partly because of that super-merge and the optimize that happened later,
and took up 85GB of disk space.
I'm not really sure what happens if this continues beyond a single
super-merge like the one I mentioned.
- In a replicated environment does it make sense to define different
mergeFactors on master and slave? I'd say no since it influences the number
of segments created, that being a concern of who actually index documents
(the master) not of who receives (segments of) index, but please correct me
if I am wrong.
Because it only applies when indexes are being built, it has no meaning
on a slave, which, as you said, just copies the data from the master.
Shawn