[ disclaimer: this worked for me, ymmv ... ]

I just battled this. It turns out that incrementally optimizing using the
maxSegments attribute was the most efficient solution for me, in
particular when you are actually running out of disk space.

#!/bin/bash

# n-segments I started with
high=400
# n-segments I want to optimize down to
low=300

for i in $(seq $high -10 $low); do
  # your optimize call with maxSegments=$i
  sleep 2
done
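
For reference, the "optimize call" in that loop can just be a hit on the
stock update handler with optimize=true and a maxSegments parameter.
A rough sketch of the loop body (host, port, core name and the
waitSearcher setting are placeholders, adjust for your setup):

  # hypothetical loop body -- host/core/waitSearcher are assumptions
  curl -s "http://localhost:8983/solr/mycore/update?optimize=true&maxSegments=$i&waitSearcher=true"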

I was able to shrink my 3TB+ index by about 300GB by optimizing
from 400 segments down to 300 (10 at a time). It optimized away the .del
files for those segments that had one and, the best part, because you are
only rewriting 10 segments per pass, the disk space footprint stays
tolerable ... at least compared to a commit with @expungeDeletes=true or,
of course, an optimize without @maxSegments, which basically rewrites the
entire index.
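
If you want to keep an eye on progress between passes, a rough sketch
along these lines can help -- the Luke handler fields, core name and
data directory below are assumptions for your own layout:

  # segment count and deleted docs via the Luke handler (if enabled)
  curl -s "http://localhost:8983/solr/mycore/admin/luke?numTerms=0&wt=json&indent=true" \
    | grep -E 'segmentCount|deletedDocs'

  # raw disk footprint of the index directory
  du -sh /var/solr/data/mycore/data/index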

NOTE: it wreaks havoc on the system, so expect search slowdowns, and it
is best not to index while this is going on either.

David


On Sun, 2015-01-11 at 06:46 -0700, ig01 wrote:
> Hi,
> 
> It's not an option for us, all the documents in our index have same deletion
> probability.
> Is there any other solution to perform an optimization in order to reduce
> index size?
> 
> Thanks in advance.
> 
> 
> 


