On 6/20/2011 3:18 PM, Michael McCandless wrote:
> With segmentsPerTier at 35 you will easily cross 70 segs in the index...
> If you want optimize to run in a single merge, I would lower
> segmentsPerTier and mergeAtOnce (maybe back to the 10 default), and set
> your maxMergeAtOnceExplicit to 70 or higher...
> Lower mergeAtOnce means merges run more frequently but for shorter
> time, and, your searching should be faster (than 35/35) since there
> are fewer segments to visit.
Thanks again for more detailed information. There is method to my
madness, which I will now try to explain.
With a value of 10, the reindex involves enough merges that there are
many second-level merges and a third-level merge. I was running into
situations on my development platform (with its slow disks) where there
were three merges happening at the same time, which caused all indexing
activity to cease for several minutes. This in turn would cause JDBC to
time out and drop the connection to the database, which caused DIH to
fail and roll back the entire import about two hours (two thirds) in.
With a mergeFactor of 35, there are no second-level merges and no
third-level merges. I can do a complete reindex successfully even on a
system with slow disks.
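To make the arithmetic behind this concrete, here is a rough sketch in Python. It assumes a simplified level-based model in which every mergeFactor segments at one level are merged into a single segment at the next level. That is an approximation, not what Lucene's merge policies actually do in detail, and the figure of 1000 flushed segments is a made-up number for illustration, not a measurement from my index.

```python
def merges_per_level(flushed_segments, merge_factor):
    """Count merges at each level in a simplified level-based merge model.

    Assumption: every `merge_factor` segments at one level are merged into
    one segment at the next level. Real merge policies are more nuanced.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    counts = []
    segments = flushed_segments
    while segments >= merge_factor:
        merges = segments // merge_factor
        counts.append(merges)
        segments = merges  # each merge yields one segment at the next level
    return counts


# Hypothetical reindex that flushes 1000 segments in total.
print(merges_per_level(1000, 10))  # first-, second-, and third-level merges
print(merges_per_level(1000, 35))  # first-level merges only
```

Under those assumptions, a factor of 10 produces merges at three levels, while a factor of 35 never gathers enough first-level results to trigger a second-level merge.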
In production, one shard (out of six) is optimized every day to
eliminate deleted documents. When I have to reindex everything, I will
typically go through and manually optimize each shard in turn after it's
done. This is the point where I discovered this two-pass problem.
I don't want to do a full-import with optimize=true, because all six
large shards build at the same time in a Xen environment. The I/O storm
that results from three optimizes happening on each host at the same
time and then replicating to similar Xen hosts is very bad.
I have now set maxMergeAtOnceExplicit to 105. I think that is probably
enough, given that I currently do not experience any second-level
merges. When my index gets big enough, I will increase the ram buffer.
By then I will probably have more memory, so the first-level merges can
still happen entirely from I/O cache.
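A rough way to see why the explicit limit matters for optimize is sketched below in Python. It assumes each optimize pass can merge at most maxMergeAtOnceExplicit segments into one and that passes repeat until a single segment remains. This is a simplification of the real merge selection, and the segment count of 71 is a hypothetical value chosen only because it is just over 70.

```python
import math


def optimize_passes(segment_count, max_merge_at_once_explicit):
    """Estimate how many merge passes an optimize needs.

    Assumption: each pass merges groups of up to
    `max_merge_at_once_explicit` segments into one segment apiece.
    """
    if max_merge_at_once_explicit < 2:
        raise ValueError("max_merge_at_once_explicit must be at least 2")
    passes = 0
    while segment_count > 1:
        segment_count = math.ceil(segment_count / max_merge_at_once_explicit)
        passes += 1
    return passes


# Hypothetical shard with 71 segments before optimize.
print(optimize_passes(71, 30))   # limit lower than the segment count
print(optimize_passes(71, 105))  # limit higher than the segment count
```

In this model, any limit at or above the segment count gives a single pass, which is the reason for picking a value comfortably above the number of segments I expect to see.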
Shawn