On 10/25/2013 2:29 PM, michael.boom wrote:
As for why I am optimizing: I do lots of deletes by id and by query, and
after a while about 30% of maxDocs are deletedDocs. On a 50G index that
means about 15G of space, which I am trying to free by doing the
optimization.

"it's usually better NOT to optimize...."
Could you provide some more details on this?
Thank you!

Improvements in Lucene have made performance better on multi-segment indexes than it was in the past. There is still a small performance gain when optimizing multiple segments down to one, but it's not as much as it once was.

Optimizing is the only real way to shrink the index when there are large numbers of deleted documents, so in your case, doing an optimize is not a bad thing. It might be the kind of thing you manually trigger when you notice that there are a lot of deleted documents.
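
As a rough sketch of the "manually trigger when you notice a lot of deleted documents" idea, the check below works from the maxDoc and numDocs counts. The function names and the 25% threshold are made up for illustration, and the space estimate assumes deleted documents take up about as much room as live ones, which is only an approximation.

```python
def deleted_fraction(max_doc: int, num_docs: int) -> float:
    """Fraction of maxDoc that is made up of deleted documents."""
    if max_doc <= 0:
        return 0.0
    return (max_doc - num_docs) / max_doc


def should_optimize(max_doc: int, num_docs: int, threshold: float = 0.25) -> bool:
    """True when the deleted share reaches the (hypothetical) threshold."""
    return deleted_fraction(max_doc, num_docs) >= threshold


def estimated_reclaimable_gb(index_size_gb: float, max_doc: int, num_docs: int) -> float:
    """Rough space estimate: assumes deleted docs are average-sized."""
    return index_size_gb * deleted_fraction(max_doc, num_docs)


if __name__ == "__main__":
    # Illustrative counts matching the 30% / 50G case described above.
    max_doc, num_docs = 10_000_000, 7_000_000
    print(should_optimize(max_doc, num_docs))
    print(round(estimated_reclaimable_gb(50.0, max_doc, num_docs), 1))
```

The two counts are the same maxDocs and deletedDocs figures you are already looking at, so something like this could run from a cron job and only alert you rather than kick off the optimize by itself.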

All of the arguments against optimization really boil down to one, and it's a really good one: optimization rewrites your entire index. It has to read the whole thing, look at each document, and write the non-deleted documents back out. This takes some of your CPU resources, but usually not a lot on modern hardware. The really bad part is that it generates a HUGE amount of I/O, and unless you have enough extra RAM to hold your index *twice*, it will generally make the OS disk cache far less efficient while it's running. This major I/O burden will generally make queries very slow until the optimize finishes.
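
To put rough numbers on that I/O, here is a back-of-the-envelope sketch. It assumes the optimize reads the whole index once and writes back only the non-deleted share, and that deleted documents are about average size. Real merges are more complicated, so treat the result as an order-of-magnitude figure. The function name is hypothetical.

```python
def optimize_io_estimate_gb(index_size_gb: float, deleted_fraction: float):
    """Return (read_gb, write_gb, total_gb) for a full index rewrite.

    Assumption: one full read of the index, plus a write of the
    surviving (non-deleted) portion.
    """
    if not 0.0 <= deleted_fraction <= 1.0:
        raise ValueError("deleted_fraction must be between 0 and 1")
    read_gb = index_size_gb
    write_gb = index_size_gb * (1.0 - deleted_fraction)
    return read_gb, write_gb, read_gb + write_gb


if __name__ == "__main__":
    read_gb, write_gb, total_gb = optimize_io_estimate_gb(50.0, 0.30)
    print(f"read ~{read_gb:.0f}G, write ~{write_gb:.0f}G, total ~{total_gb:.0f}G")
```

For a 50G index with 30% deletes, that works out to roughly 50G read and 35G written, all of it competing with your queries for the disk cache.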

Some might make the argument that optimizing requires a lot of disk space, but regular merges during indexing can result in the same behavior, so it's always recommended that you have enough space for 2-3 times your actual index size.
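
A quick way to check that headroom, sketched with only the Python standard library. The helper names and the idea of passing a 2x or 3x factor are my own illustration. The factor is total space relative to index size, so 2.0 means free space at least equal to the current index.

```python
import os
import shutil


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of all files under path (0 if it does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A segment file can vanish mid-walk if a merge completes.
                pass
    return total


def has_headroom(index_bytes: int, free_bytes: int, factor: float = 2.0) -> bool:
    """True if index plus free space covers factor times the index size."""
    return free_bytes >= index_bytes * (factor - 1.0)


def index_dir_has_headroom(index_dir: str, factor: float = 2.0) -> bool:
    """Check a real directory; index_dir is whatever path holds the index."""
    free_bytes = shutil.disk_usage(index_dir).free
    return has_headroom(directory_size_bytes(index_dir), free_bytes, factor)
```

With a 50G index, factor=2.0 asks for at least 50G free and factor=3.0 asks for at least 100G free.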

If optimizes happen really fast because your index is not very big, or you have a period of time during the day or night where your index is mostly idle, then it can make a lot of sense to do regular optimizes for performance reasons or to shrink the index when there are a lot of deletes.
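
If you go the scheduled route, the only slightly tricky part is an idle window that crosses midnight. A minimal sketch, with a hypothetical function name and window times chosen purely as an example:

```python
from datetime import time


def in_idle_window(now: time, start: time, end: time) -> bool:
    """True if now falls in [start, end), handling windows past midnight."""
    if start <= end:
        return start <= now < end
    # Window wraps around midnight, e.g. 23:00 to 04:00.
    return now >= start or now < end


if __name__ == "__main__":
    # Example window: 23:00 to 04:00 (an assumption, pick your own quiet hours).
    print(in_idle_window(time(1, 30), time(23, 0), time(4, 0)))
    print(in_idle_window(time(12, 0), time(23, 0), time(4, 0)))
```

A job could combine this with a deleted-documents check and only start the optimize when both conditions hold.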

Thanks,
Shawn
