The docs on reclaimDeletesWeight say:
"Controls how aggressively merges that reclaim more deletions are
favored. Higher values favor selecting merges that reclaim deletions."
I can't imagine you would notice anything after only a few commits. I
have many shards that size or larger and what I do occasionally is to
loop an optimize, setting maxSegments with decremented values, e.g.,
for maxSegments in $( seq 40 -1 20 ); do
    # optimize maxSegments=$maxSegments goes here
    :   # no-op so the loop body is valid shell until the optimize call is filled in
done
It's definitely a poor-man's hack and is clearly not the most efficient
way of optimizing, but it does remove deletes without requiring double
or triple the disk space that a full optimize requires. I can usually
reclaim 100-300GB of disk space in a collection that is currently ~2TB
-- not inconsequential.
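
For what it's worth, here is a slightly fuller sketch of that loop. It assumes the stock /update handler with the optimize=true and maxSegments request parameters. The host, port and collection name in SOLR_URL are placeholders, and the function names and the DRY_RUN switch are just my own convention for printing the requests instead of sending them.

```shell
#!/bin/sh
# Sketch: stepwise optimize, lowering maxSegments one step at a time.
# SOLR_URL (host, port, collection name) is a placeholder assumption.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/mycollection}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the requests, 0 = send them

optimize_url() {
    # $1 = target maxSegments
    printf '%s/update?optimize=true&maxSegments=%s\n' "$SOLR_URL" "$1"
}

stepwise_optimize() {
    # $1 = starting maxSegments, $2 = final maxSegments (start >= end)
    for maxSegments in $(seq "$1" -1 "$2"); do
        url=$(optimize_url "$maxSegments")
        if [ "$DRY_RUN" = "1" ]; then
            echo "$url"
        else
            curl -sS "$url" || return 1
        fi
    done
}
```

Calling "stepwise_optimize 40 20" prints the 21 requests in order; set DRY_RUN=0 once the URL points at a real collection.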
Seeing you only have 1.6M documents, perhaps an index rebuild isn't out
of the question? I did just that on a test collection with 100M
documents. Starting from 0 deleted docs, with reclaimDeletesWeight=5.0 and
probably about 1-3% document turnover per week (updates) over the last 3
months, my deleted percentage is staying below 10%.
If that's not an option, keeping reclaimDeletesWeight at 5.0 and using
expungeDeletes=true on commit will get that percentage down over time.
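
A minimal sketch of such a commit, assuming the standard /update handler and its commit=true and expungeDeletes=true request parameters. SOLR_URL (host, port, collection name) is a placeholder and the function name is only illustrative.

```shell
#!/bin/sh
# Sketch: commit with expungeDeletes=true via the /update handler.
# SOLR_URL (host, port, collection name) is a placeholder assumption.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/mycollection}"

expunge_commit_url() {
    printf '%s/update?commit=true&expungeDeletes=true\n' "$SOLR_URL"
}

# Print the request; to actually send it: curl -sS "$(expunge_commit_url)"
expunge_commit_url
```

It only prints the request URL as written, so nothing is sent until you run the curl line yourself.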
//
On 04/01/2016 04:49 AM, Jostein Elvaker Haande wrote:
On 30 March 2016 at 17:46, Erick Erickson <erickerick...@gmail.com> wrote:
through a clever bit of reflection, you can set the
reclaimDeletesWeight variable from solrconfig by including something
like
<double name="reclaimDeletesWeight">5</double> (going from memory
here, you'll get an error on startup if I've messed it up.....)
I added the following to my solrconfig a couple of days ago:
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">8</int>
  <int name="segmentsPerTier">8</int>
  <double name="reclaimDeletesWeight">5.0</double>
</mergePolicy>
There have been several commits and the core is current according to
SOLR admin, however I'm still seeing a lot of deleted docs. These are
my current core statistics.
Last Modified: 4 minutes ago
Num Docs: 1 675 255
Max Doc: 2 353 476
Heap Memory Usage: 208 464 267
Deleted Docs: 678 221
Version: 1 870 539
Segment Count: 39
Index size is close to 149GB.
So at the moment, I'm seeing a ratio of deleted docs to max docs of
28.81%. With 'reclaimDeletesWeight' set to 5, it doesn't seem to be
purging any deleted docs.
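
As a quick check on those figures, here is a small sketch of the arithmetic, taking the percentage as Deleted Docs divided by Max Doc. The function name deleted_pct is made up for illustration. Rounded to two places the result is 28.82; the 28.81 quoted is the same value truncated.

```shell
#!/bin/sh
# Sketch: deleted-docs percentage from the core statistics.
deleted_pct() {
    # $1 = deleted docs, $2 = max doc
    LC_ALL=C awk -v d="$1" -v m="$2" 'BEGIN { printf "%.2f\n", 100 * d / m }'
}

# Num Docs + Deleted Docs should add up to Max Doc
echo $((1675255 + 678221))      # prints 2353476
deleted_pct 678221 2353476      # prints 28.82
```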
Anything obvious I'm missing?