The docs on reclaimDeletesWeight say:

"Controls how aggressively merges that reclaim more deletions are favored. Higher values favor selecting merges that reclaim deletions."

I can't imagine you would notice anything after only a few commits. I have many shards that size or larger, and what I occasionally do is run an optimize in a loop, decrementing maxSegments on each pass, e.g.,

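for maxSegments in $( seq 40 -1 20 ); do
  # SOLR_URL and COLLECTION are placeholders for your own Solr base URL and collection name
  curl "$SOLR_URL/$COLLECTION/update?optimize=true&maxSegments=$maxSegments"
done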

It's definitely a poor man's hack and is clearly not the most efficient way of optimizing, but it does remove deletes without requiring double or triple the disk space that a full optimize requires. I can usually reclaim 100-300GB of disk space in a collection that is currently ~2TB -- not inconsequential.

Seeing you only have 1.6M documents, perhaps an index rebuild isn't out of the question? I did just that on a test collection with 100M documents. Starting with 0 deleted docs, reclaimDeletesWeight=5.0, and roughly 1-3% document turnover per week (updates) over the last 3 months, my deleted percentage is staying below 10%.
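
For reference, the deleted percentage here is deleted docs divided by max docs, where max docs is live docs plus deleted docs. A minimal sketch using the Num Docs and Max Doc figures from the quoted core statistics; the helper name deleted_pct is made up for illustration and is not part of Solr:

```shell
# Percentage of deleted docs: (maxDoc - numDocs) / maxDoc * 100.
# deleted_pct is a hypothetical helper name, not a Solr command.
deleted_pct() {
  num_docs=$1
  max_doc=$2
  awk -v n="$num_docs" -v m="$max_doc" \
    'BEGIN { printf "%.2f\n", (m - n) / m * 100 }'
}

# Num Docs and Max Doc from the quoted core statistics
deleted_pct 1675255 2353476
```

With those figures this prints 28.82, i.e. roughly the 28.8% reported in the quoted message.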

If that's not an option, keeping reclaimDeletesWeight at 5.0 and using expungeDeletes=true on commit will get that percentage down over time.
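
A commit with expungeDeletes can be sent to the update handler as request parameters. A rough sketch, assuming Solr is reachable at http://localhost:8983/solr and the core is called mycore (both are placeholders, substitute your own):

```shell
# SOLR_URL and CORE are placeholders, not values taken from this thread.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
CORE="${CORE:-mycore}"

# Build the update URL for a commit that also asks Solr to expunge deletes.
expunge_url() {
  printf '%s/%s/update?commit=true&expungeDeletes=true\n' "$SOLR_URL" "$CORE"
}

# Show the request that would be sent.
expunge_url

# Uncomment to actually issue the commit:
# curl "$(expunge_url)"
```

The curl call is left commented out so the URL can be checked first; the merges this triggers can take a while on a large index.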

//


On 04/01/2016 04:49 AM, Jostein Elvaker Haande wrote:
On 30 March 2016 at 17:46, Erick Erickson <erickerick...@gmail.com> wrote:
through a clever bit of reflection, you can set the
reclaimDeletesWeight variable from solrconfig by including something
like
<double name="reclaimDeletesWeight">5</double> (going from memory
here, you'll get an error on startup if I've messed it up.....)

I added the following to my solrconfig a couple of days ago:

     <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
       <int name="maxMergeAtOnce">8</int>
       <int name="segmentsPerTier">8</int>
       <double name="reclaimDeletesWeight">5.0</double>
     </mergePolicy>

There have been several commits and the core is current according to
the Solr admin, however I'm still seeing a lot of deleted docs. These are
my current core statistics.

Last Modified: 4 minutes ago
Num Docs: 1 675 255
Max Doc: 2 353 476
Heap Memory Usage: 208 464 267
Deleted Docs: 678 221
Version: 1 870 539
Segment Count: 39

Index size is close to 149GB.

So at the moment, I'm seeing a deleted docs to max docs ratio of
28.81%. With 'reclaimDeletesWeight' set to 5, it doesn't seem to be
merging away any deleted docs.

Anything obvious I'm missing?
