On 10/25/2013 2:29 PM, michael.boom wrote:
> As for why I am optimizing, well, I do lots of delete by id and by query and
> after a while about 30% of maxDocs are deletedDocs. On a 50G index that
> means about 15G of space which I am trying to free by doing the
> optimization.
>
> "it's usually better NOT to optimize...."
> Could you provide some more details on this?
> Thank you!

Improvements in Lucene have made performance better on multi-segment
indexes than it was in the past. There is still a small performance
gain when optimizing multiple segments down to one, but it's not as much
as it once was.

Optimizing is the only real way to shrink the index when there are large
numbers of deleted documents, so in your case, doing an optimize is not
a bad thing. It might be the kind of thing you manually trigger when
you notice that there are a lot of deleted documents.
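
A rough sketch of that manual check, in Python. It computes the deleted fraction from maxDoc and numDocs (both shown in the Solr admin UI) and builds the update-handler URL that triggers an optimize. The 30% threshold, the function names, and the example core URL are assumptions for illustration, not recommended values.

```python
def deleted_fraction(max_doc, num_docs):
    """Fraction of maxDoc made up of deleted documents (maxDoc - numDocs)."""
    if max_doc <= 0:
        return 0.0
    return (max_doc - num_docs) / max_doc


def should_optimize(max_doc, num_docs, threshold=0.30):
    """True when the deleted fraction reaches the (assumed) threshold."""
    return deleted_fraction(max_doc, num_docs) >= threshold


def optimize_url(core_url):
    """URL that asks the core's update handler to run an optimize."""
    return core_url.rstrip("/") + "/update?optimize=true"
```

For a hypothetical core at http://localhost:8983/solr/collection1, requesting the URL returned by optimize_url (with curl, for example) starts the optimize.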

All of the arguments against optimization really boil down to one, and
it's a really good one. Optimization rewrites your entire index. This
means it has to read the whole thing, look at each document, and write
non-deleted documents back out. This takes some of your CPU resources,
but it's not usually a lot on modern hardware. The part that's really
bad is that it generates a HUGE amount of I/O, and unless you have
enough extra RAM to hold your index *twice*, it will generally make
the OS disk cache far less efficient while it's happening. This
major I/O burden will generally make queries very slow while the
optimize is happening.
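
To put rough numbers on that I/O, here is a back-of-the-envelope sketch in Python. It assumes deleted documents take up space in proportion to their count, which is only an approximation, and the function name is made up for this example.

```python
def optimize_io_estimate(index_bytes, deleted_fraction):
    """Rough (bytes_read, bytes_written) for a full optimize.

    The whole index is read; only the non-deleted share is written back.
    Assumes deleted docs use space in proportion to their count.
    """
    read_bytes = index_bytes
    written_bytes = round(index_bytes * (1.0 - deleted_fraction))
    return read_bytes, written_bytes
```

With the numbers from your message (a 50G index with 30% deleted), that works out to about 50G read and about 35G written.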

Some might make the argument that optimizing requires a lot of disk
space, but regular merges during indexing can result in the same
behavior, so it's always recommended that you have enough space for 2-3
times your actual index size.
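
A minimal sketch of that disk-space rule in Python. The factor of 2 (or 3) comes from the guideline above; the function names and the idea of checking the volume with shutil.disk_usage are assumptions for this example.

```python
import shutil


def free_space_needed(index_bytes, factor=2.0):
    """Extra free bytes needed so the disk can hold `factor` times the index."""
    return max(0, int(index_bytes * (factor - 1.0)))


def has_headroom(index_bytes, free_bytes, factor=2.0):
    """True if the given free space covers the extra room needed."""
    return free_bytes >= free_space_needed(index_bytes, factor)


def disk_has_headroom(index_dir, index_bytes, factor=2.0):
    """Same check against the real free space on the volume holding index_dir."""
    return has_headroom(index_bytes, shutil.disk_usage(index_dir).free, factor)
```

For a 50G index this asks for 50G free at a factor of 2, or 100G free at a factor of 3.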

If optimizes happen really fast because your index is not very big, or
you have a period of time during the day or night where your index is
mostly idle, then it can make a lot of sense to do regular optimizes for
performance reasons or to shrink the index when there are a lot of deletes.
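
If you script a regular optimize, a guard like the following Python sketch can keep it inside the idle period. The window times are whatever fits your traffic; the function name is made up for this example.

```python
from datetime import time


def in_idle_window(now, start, end):
    """True if `now` is in [start, end); the window may wrap past midnight."""
    if start <= end:
        return start <= now < end
    # Window such as 22:00 to 03:00 spans midnight.
    return now >= start or now < end
```

A cron job or similar scheduler could call this with datetime.now().time() and skip the optimize when it returns False.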

Thanks,
Shawn