cheng66551 opened a new pull request, #14163:
URL: https://github.com/apache/lucene/pull/14163

   In Elasticsearch 7.6.0, `/_cat/segments` showed that the docs.deleted count of 
many segments kept increasing, but over time 
**these deleted documents were never automatically merged away**. The segment 
information is as follows: 
   
   ```text
   segment generation docs.count docs.deleted  size size.memory committed searchable version compound
   _1bn4gh   80020817    2434329     85866860 8.9gb     6845726 true      true       8.4.0   false
   _1bqg6j   80175979     258975     18754886 1.8gb     1708132 true      true       8.4.0   false
   _1brsd1   80238421     340857     17805014 1.8gb     1807134 true      true       8.4.0   false
   _1bt573   80301711     444912     17747931 1.8gb     1831663 true      true       8.4.0   false
   _1buf8x   80361393     590820     18290815 1.9gb     1762322 true      true       8.4.0   false
   _1btbuk   80310332    2666453     12507939 1.9gb     2543630 true      true       8.4.0   false
   _1bzdsz   80592803     242465     17280902 1.8gb     1565934 true      true       8.4.0   false
   _1c3msi   80791074     330315     17941295 1.8gb     1623871 true      true       8.4.0   false
   _1c75vi   80955774     425781     17177269 1.8gb     1645538 true      true       8.4.0   false
   _1c9xyl   81085485     542056     18550711 1.8gb     1692414 true      true       8.4.0   false
   ......
   ```
   So I triggered a force merge through `_forcemerge?only_expunge_deletes=true`, 
but it had no effect. A similar phenomenon is described in issue #13226.
   
   **I suspect that TieredMergePolicy did not select these segments, so no 
merge was triggered.** 
   Therefore, I wrote this `forceMergeBySegmentNames` method, which bypasses 
the selection logic of TieredMergePolicy and merges exactly the segments 
specified by name. When verified in a production environment, it achieved very 
good results.
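
   To make the size of the problem concrete, here is a minimal, self-contained sketch that computes the share of deleted documents per segment from three of the rows above and lists the segment names one might pass to a name-based merge. It assumes that docs.count is the number of live documents, so the total is docs.count + docs.deleted. The class `SegmentDeleteRatio` and the helper `namesAbove` are hypothetical names for illustration only; this is not the `forceMergeBySegmentNames` implementation from this PR and it does not call any Lucene API.

   ```java
   import java.util.List;
   import java.util.stream.Collectors;

   public class SegmentDeleteRatio {

       /** One row of the _cat/segments output, reduced to the columns used here. */
       static final class Segment {
           final String name;
           final long docsCount;    // assumed: live (non-deleted) documents
           final long docsDeleted;  // documents marked as deleted

           Segment(String name, long docsCount, long docsDeleted) {
               this.name = name;
               this.docsCount = docsCount;
               this.docsDeleted = docsDeleted;
           }

           /** Percentage of the segment's documents that are marked deleted. */
           double deletedPct() {
               long total = docsCount + docsDeleted;
               return total == 0 ? 0.0 : 100.0 * docsDeleted / total;
           }
       }

       /** Hypothetical helper: names of segments whose deleted share exceeds thresholdPct. */
       static List<String> namesAbove(List<Segment> segments, double thresholdPct) {
           return segments.stream()
                   .filter(s -> s.deletedPct() > thresholdPct)
                   .map(s -> s.name)
                   .collect(Collectors.toList());
       }

       public static void main(String[] args) {
           // Values copied from the _cat/segments listing above.
           List<Segment> segments = List.of(
                   new Segment("_1bn4gh", 2434329L, 85866860L),
                   new Segment("_1bqg6j", 258975L, 18754886L),
                   new Segment("_1btbuk", 2666453L, 12507939L));

           for (Segment s : segments) {
               System.out.printf("%s %.1f%% deleted%n", s.name, s.deletedPct());
           }
           // Candidate names for a name-based merge, using an arbitrary 90% cutoff.
           System.out.println(namesAbove(segments, 90.0));
       }
   }
   ```

   Under that assumption, each of these three segments consists mostly of deleted documents, which is why merging them by name reclaims so much space.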


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

