On 6/20/2011 3:18 PM, Michael McCandless wrote:
With segmentsPerTier at 35 you will easily cross 70 segs in the index...
If you want optimize to run in a single merge, I would lower
segmentsPerTier and mergeAtOnce (maybe back to the 10 default), and set
your maxMergeAtOnceExplicit to 70 or higher...

Lower mergeAtOnce means merges run more frequently but for shorter
time, and, your searching should be faster (than 35/35) since there
are fewer segments to visit.

Thanks again for more detailed information. There is method to my madness, which I will now try to explain.

With a value of 10, the reindex involves enough merges that there are many second-level merges and a third-level merge. I was running into situations on my development platform (with its slow disks) where three merges were happening at the same time, which caused all indexing activity to cease for several minutes. This in turn would cause JDBC to time out and drop the connection to the database, which caused DIH to fail and roll back the entire import about two hours (two thirds) in.

With a mergeFactor of 35, there are no second-level merges and no third-level merges. I can do a complete reindex successfully even on a system with slow disks.
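
To make the merge-level arithmetic concrete, here is a rough sketch in Python. It assumes a simplified log-style model in which every mergeFactor flushed segments trigger one first-level merge, every mergeFactor first-level results trigger one second-level merge, and so on. This is not how Lucene actually selects merges, and the flush count of 1100 is a hypothetical number chosen for illustration, not a measurement from my index.

```python
def merges_per_level(flushes, merge_factor, max_level=3):
    """Count merges at each level under a simplified log-style model.

    A level-k merge is assumed to happen once per merge_factor**k
    flushed segments. This is an illustration, not Lucene's algorithm.
    """
    counts = {}
    for level in range(1, max_level + 1):
        counts[level] = flushes // merge_factor ** level
    return counts


FLUSHES = 1100  # hypothetical number of flushed segments in one reindex

for merge_factor in (10, 35):
    print(merge_factor, merges_per_level(FLUSHES, merge_factor))
```

Under these assumptions, a factor of 10 produces second-level and third-level merges, while a factor of 35 produces first-level merges only, because 35 squared (1225) is larger than the assumed flush count.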

In production, one shard (out of six) is optimized every day to eliminate deleted documents. When I have to reindex everything, I will typically go through and manually optimize each shard in turn after it's done. This is the point where I discovered this two-pass problem.

I don't want to do a full-import with optimize=true, because all six large shards build at the same time in a Xen environment. The I/O storm that results from three optimizes happening on each host at the same time and then replicating to similar Xen hosts is very bad.

I have now set maxMergeAtOnceExplicit to 105. I think that is probably enough, given that I currently do not experience any second-level merges. When my index gets big enough, I will increase the RAM buffer. By then I will probably have more memory, so the first-level merges can still happen entirely from I/O cache.
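
The reasoning behind that setting can be sketched as follows. This is a simplified model that assumes each optimize pass merges groups of up to maxMergeAtOnceExplicit segments into one segment each and repeats until a single segment remains. Lucene's real selection logic is more involved, and the segment count of 71 is a hypothetical example of "more than 70 segments".

```python
import math


def optimize_passes(segments, max_merge_at_once_explicit):
    """Estimate how many passes an optimize needs in a simplified model.

    Each pass is assumed to merge groups of up to
    max_merge_at_once_explicit segments into one segment apiece.
    """
    if max_merge_at_once_explicit < 2:
        raise ValueError("max_merge_at_once_explicit must be at least 2")
    passes = 0
    while segments > 1:
        segments = math.ceil(segments / max_merge_at_once_explicit)
        passes += 1
    return passes


SEGMENTS = 71  # hypothetical segment count just above 70

print(optimize_passes(SEGMENTS, 30))   # limit of 30, the TieredMergePolicy default
print(optimize_passes(SEGMENTS, 105))  # the value chosen above
```

In this model the optimize needs more than one pass whenever the segment count exceeds the limit, which is the two-pass behavior described earlier. A limit at or above the segment count brings it down to a single merge.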

Shawn
