cgejian opened a new issue, #14254: URL: https://github.com/apache/lucene/issues/14254
**Background**: On Elasticsearch 7.6.0, an external client continuously runs update_by_query against an index.

**Phenomenon**: While this is happening, /_cat/segments shows that docs.count and docs.deleted keep changing for many existing segments in the index. For example, the segment information at one point is:

```
segment generation docs.count docs.deleted size     size.memory committed searchable version compound
_mqg    29464      2683624    802282       2.5gb    2119594     true      true       8.4.0   false
_oxd    32305      1250591    632447       1.3gb    1138511     true      true       8.4.0   false
_oo5    31973      1434271    472773       1.3gb    1158523     true      true       8.4.0   false
_v6k    40412      1209509    320023       1.1gb    891519      true      true       8.4.0   false
_1w5    2453       539240     284964       629.9mb  584760      true      true       8.4.0   false
_v0q    40202      982360     266487       929.5mb  733621      true      true       8.4.0   false
_20b    2603       1367294    225623       1.1gb    1019214     true      true       8.4.0   false
_bu7    15343      1144383    210547       1007.5mb 846393      true      true       8.4.0   false
_733    9183       1875493    166523       1.5gb    1323250     true      true       8.4.0   false
```

After a few seconds, the segment information is:

```
segment generation docs.count docs.deleted size     size.memory committed searchable version compound
_mqg    29464      2683135    802771       2.5gb    2119594     true      true       8.4.0   false
_oxd    32305      1250591    632447       1.3gb    1138511     true      true       8.4.0   false
_oo5    31973      1434271    472773       1.3gb    1158523     true      true       8.4.0   false
_v6k    40412      1208615    320917       1.1gb    891519      true      true       8.4.0   false
_1w5    2453       537834     286370       629.9mb  584760      true      true       8.4.0   false
_v0q    40202      973870     274977       929.5mb  733621      true      true       8.4.0   false
_20b    2603       1361957    230960       1.1gb    1019214     true      true       8.4.0   false
_bu7    15343      1144383    210547       1007.5mb 846393      true      true       8.4.0   false
_733    9183       1870996    171020       1.5gb    1323250     true      true       8.4.0   false
```

The docs.count and docs.deleted of segments such as _mqg, _v6k, _1w5, etc. have changed. In each changed segment, docs.count drops by exactly the amount docs.deleted rises (489 documents for _mqg), while the generation, size and size.memory stay the same.

**Question**: Based on the above phenomenon, **the docs.deleted in a SegmentCommitInfo generated by flush may change** after the segment has been written.
numDeletesToMerge() in CachingMergeContext works as follows: if the cache already holds an entry for the given SegmentCommitInfo, the number of deleted documents is returned directly from the cache; otherwise it is obtained from the wrapped MergeContext and then stored in the cache. The code is as follows (#12339):

```java
/**
 * a wrapper of IndexWriter MergeContext. Try to cache the {@link
 * #numDeletesToMerge(SegmentCommitInfo)} result in merge phase, to avoid duplicate calculation
 */
class CachingMergeContext implements MergePolicy.MergeContext {
  final MergePolicy.MergeContext mergeContext;
  final HashMap<SegmentCommitInfo, Integer> cachedNumDeletesToMerge = new HashMap<>();

  CachingMergeContext(MergePolicy.MergeContext mergeContext) {
    this.mergeContext = mergeContext;
  }

  @Override
  public final int numDeletesToMerge(SegmentCommitInfo info) throws IOException {
    Integer numDeletesToMerge = cachedNumDeletesToMerge.get(info);
    if (numDeletesToMerge != null) {
      return numDeletesToMerge;
    }
    numDeletesToMerge = mergeContext.numDeletesToMerge(info);
    cachedNumDeletesToMerge.put(info, numDeletesToMerge);
    return numDeletesToMerge;
  }

  @Override
  public final int numDeletedDocs(SegmentCommitInfo info) {
    return mergeContext.numDeletedDocs(info);
  }

  @Override
  public final InfoStream getInfoStream() {
    return mergeContext.getInfoStream();
  }

  @Override
  public final Set<SegmentCommitInfo> getMergingSegments() {
    return mergeContext.getMergingSegments();
  }
}
```

When docs.deleted is constantly changing, **could the number of deleted documents returned by CachingMergeContext.numDeletesToMerge() be incorrect?**
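To make the concern concrete, here is a minimal, self-contained sketch of the same get-then-put caching pattern. It does not use any Lucene classes: `StaleCacheSketch`, `MemoizingCounter` and the `live` map are hypothetical names introduced only for illustration, plain strings stand in for SegmentCommitInfo, and the two delete counts are the _mqg values from the listings above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Sketch only: strings stand in for SegmentCommitInfo; no Lucene classes are used. */
class StaleCacheSketch {

  /** Hypothetical memoizer using the same get-then-put pattern as the quoted class. */
  static final class MemoizingCounter {
    private final ToIntFunction<String> delegate;
    private final Map<String, Integer> cache = new HashMap<>();

    MemoizingCounter(ToIntFunction<String> delegate) {
      this.delegate = delegate;
    }

    int numDeletes(String segment) {
      Integer cached = cache.get(segment);
      if (cached != null) {
        return cached; // served from the cache; the delegate is not consulted again
      }
      int fresh = delegate.applyAsInt(segment);
      cache.put(segment, fresh);
      return fresh;
    }
  }

  public static void main(String[] args) {
    // Live delete counts, mutated below to imitate ongoing update_by_query traffic.
    Map<String, Integer> live = new HashMap<>();
    live.put("_mqg", 802282);

    MemoizingCounter first = new MemoizingCounter(s -> live.get(s));
    System.out.println(first.numDeletes("_mqg")); // 802282, computed and cached

    live.put("_mqg", 802771); // the underlying count moves on

    System.out.println(first.numDeletes("_mqg")); // 802282, the cached (now stale) value
    // A freshly created cache starts empty and therefore sees the new count.
    System.out.println(new MemoizingCounter(s -> live.get(s)).numDeletes("_mqg")); // 802771
  }
}
```

The sketch only shows that a cached value stops tracking the underlying count for as long as the same cache instance is reused. It says nothing about how long Lucene keeps a CachingMergeContext instance alive, which is what determines whether such staleness matters in practice.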