cgejian opened a new issue, #14254:
URL: https://github.com/apache/lucene/issues/14254

   **Background**: On Elasticsearch (ES) 7.6.0, an external client is 
continuously running update_by_query against an index.
   
   **Phenomenon**: While this is running, /_cat/segments shows that the 
docs.count and docs.deleted of many existing segments in the index keep 
changing.
   
   For example, the segment information is as follows:
   ```
   segment generation docs.count docs.deleted     size size.memory committed searchable version compound
    _mqg         29464    2683624       802282    2.5gb     2119594 true      true       8.4.0   false
    _oxd         32305    1250591       632447    1.3gb     1138511 true      true       8.4.0   false
    _oo5         31973    1434271       472773    1.3gb     1158523 true      true       8.4.0   false
    _v6k         40412    1209509       320023    1.1gb      891519 true      true       8.4.0   false
    _1w5          2453     539240       284964  629.9mb      584760 true      true       8.4.0   false
    _v0q         40202     982360       266487  929.5mb      733621 true      true       8.4.0   false
    _20b          2603    1367294       225623    1.1gb     1019214 true      true       8.4.0   false
    _bu7         15343    1144383       210547 1007.5mb      846393 true      true       8.4.0   false
    _733          9183    1875493       166523    1.5gb     1323250 true      true       8.4.0   false
   ```
   After a few seconds, the segment information is as follows:
   
   ```
   segment generation docs.count docs.deleted     size size.memory committed searchable version compound
    _mqg         29464    2683135       802771    2.5gb     2119594 true      true       8.4.0   false
    _oxd         32305    1250591       632447    1.3gb     1138511 true      true       8.4.0   false
    _oo5         31973    1434271       472773    1.3gb     1158523 true      true       8.4.0   false
    _v6k         40412    1208615       320917    1.1gb      891519 true      true       8.4.0   false
    _1w5          2453     537834       286370  629.9mb      584760 true      true       8.4.0   false
    _v0q         40202     973870       274977  929.5mb      733621 true      true       8.4.0   false
    _20b          2603    1361957       230960    1.1gb     1019214 true      true       8.4.0   false
    _bu7         15343    1144383       210547 1007.5mb      846393 true      true       8.4.0   false
    _733          9183    1870996       171020    1.5gb     1323250 true      true       8.4.0   false
   ```
   
   The docs.count and docs.deleted of segments such as _mqg, _v6k, _1w5, etc. 
have changed.
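
As a quick arithmetic check on the two snapshots above, the sketch below adds docs.count and docs.deleted for three of the changed segments before and after. The class and method names are made up for illustration; only the numbers come from the /_cat/segments output. If the totals match, the documents were presumably marked deleted inside the existing segment rather than the segment being rewritten, which would fit update_by_query replacing documents.

```java
public class SegmentTotalsCheck {
  static final String[] NAMES = {"_mqg", "_v6k", "_1w5"};

  // Each row: {docs.count before, docs.deleted before, docs.count after, docs.deleted after},
  // copied from the two /_cat/segments snapshots.
  static final long[][] ROWS = {
    {2683624, 802282, 2683135, 802771},
    {1209509, 320023, 1208615, 320917},
    {539240, 284964, 537834, 286370},
  };

  /**
   * Returns how many documents became deleted between the snapshots,
   * or -1 if docs.count + docs.deleted did not stay constant.
   */
  static long newlyDeleted(long[] row) {
    long totalBefore = row[0] + row[1];
    long totalAfter = row[2] + row[3];
    if (totalBefore != totalAfter) {
      return -1;
    }
    return row[3] - row[1];
  }

  public static void main(String[] args) {
    for (int i = 0; i < ROWS.length; i++) {
      long total = ROWS[i][0] + ROWS[i][1];
      System.out.println(
          NAMES[i] + " total=" + total + " newlyDeleted=" + newlyDeleted(ROWS[i]));
    }
  }
}
```

For all three segments the total is unchanged and the drop in docs.count equals the rise in docs.deleted.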
   
   **Question**: The above suggests that **the docs.deleted associated with 
a SegmentCommitInfo produced by a flush can change over time**.
   numDeletesToMerge() in CachingMergeContext works as follows: if the cache 
already holds an entry for the given SegmentCommitInfo, it returns the cached 
number of deleted documents; otherwise it asks the wrapped MergeContext for the 
number and stores it in the cache.
   The code is as follows (#12339):
   ```java
   /**
    * a wrapper of IndexWriter MergeContext. Try to cache the {@link
    * #numDeletesToMerge(SegmentCommitInfo)} result in merge phase, to avoid duplicate calculation
    */
   class CachingMergeContext implements MergePolicy.MergeContext {
     final MergePolicy.MergeContext mergeContext;
     final HashMap<SegmentCommitInfo, Integer> cachedNumDeletesToMerge = new HashMap<>();
   
     CachingMergeContext(MergePolicy.MergeContext mergeContext) {
       this.mergeContext = mergeContext;
     }
   
     @Override
     public final int numDeletesToMerge(SegmentCommitInfo info) throws IOException {
       Integer numDeletesToMerge = cachedNumDeletesToMerge.get(info);
       if (numDeletesToMerge != null) {
         return numDeletesToMerge;
       }
       numDeletesToMerge = mergeContext.numDeletesToMerge(info);
       cachedNumDeletesToMerge.put(info, numDeletesToMerge);
       return numDeletesToMerge;
     }
   
     @Override
     public final int numDeletedDocs(SegmentCommitInfo info) {
       return mergeContext.numDeletedDocs(info);
     }
   
     @Override
     public final InfoStream getInfoStream() {
       return mergeContext.getInfoStream();
     }
   
     @Override
     public final Set<SegmentCommitInfo> getMergingSegments() {
       return mergeContext.getMergingSegments();
     }
   }
   ```
   While docs.deleted keeps changing, **can 
CachingMergeContext.numDeletesToMerge() return a stale, and therefore 
incorrect, number of deleted documents?**
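
To make the concern concrete, here is a minimal standalone sketch of the caching pattern in question. It does not use any Lucene classes: `Segment` and `CachingCounter` are hypothetical stand-ins, and the two delete counts are the _mqg values from the snapshots above. It only shows that a cache which outlives a change returns the old value; whether this can happen in practice depends on how long a CachingMergeContext instance is kept alive, which this sketch does not model.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleDeleteCountSketch {
  /** Hypothetical stand-in for a segment whose delete count grows over time. */
  static final class Segment {
    final String name;
    int deletedDocs;

    Segment(String name, int deletedDocs) {
      this.name = name;
      this.deletedDocs = deletedDocs;
    }
  }

  /** Remembers the first count seen per segment, like the wrapper quoted above. */
  static final class CachingCounter {
    private final Map<Segment, Integer> cache = new HashMap<>();

    int numDeletes(Segment segment) {
      Integer cached = cache.get(segment);
      if (cached != null) {
        return cached;
      }
      int current = segment.deletedDocs;
      cache.put(segment, current);
      return current;
    }
  }

  public static void main(String[] args) {
    Segment mqg = new Segment("_mqg", 802282);
    CachingCounter longLived = new CachingCounter();

    int first = longLived.numDeletes(mqg);   // reads and caches the current count
    mqg.deletedDocs = 802771;                // more documents get deleted
    int second = longLived.numDeletes(mqg);  // served from the cache

    // A counter created after the change sees the new value.
    int fresh = new CachingCounter().numDeletes(mqg);

    System.out.println("first=" + first + " second=" + second + " fresh=" + fresh);
  }
}
```

In this toy model the long-lived counter keeps returning the first value, while a counter created after the change sees the new one. So the answer seems to hinge on whether a cache instance is reused after deletes have been applied to the segment.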

