easyice opened a new issue, #12983: URL: https://github.com/apache/lucene/issues/12983
### Description

I'm using Elasticsearch in an advertising analysis system, where some indices receive heavy update traffic. In general, we would disable soft deletes because of their performance cost. However, this feature must be enabled in CCR scenarios. Sometimes the `SoftDeletesRetentionMergePolicy#numDeletesToMerge` method takes quite a long time to execute, which blocks many write threads, like this:

<details>
<summary>Merging thread</summary>

```
"elasticsearch[fdbd:xxx::30_9201][write][T#123]" #498 daemon prio=5 os_prio=0 cpu=62471737.46ms elapsed=5866392.54s tid=0x00007fb330149000 nid=0x10f555 runnable [0x00007fafe83c0000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.lucene.codecs.lucene80.IndexedDISI.advance(IndexedDISI.java:384)
	at org.apache.lucene.codecs.lucene80.IndexedDISI.nextDoc(IndexedDISI.java:459)
	at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$SparseNumericDocValues.nextDoc(Lucene80DocValuesProducer.java:496)
	at org.apache.lucene.search.ConjunctionDISI.nextDoc(ConjunctionDISI.java:245)
	at org.apache.lucene.index.SoftDeletesRetentionMergePolicy.numDeletesToMerge(SoftDeletesRetentionMergePolicy.java:145)
	at org.apache.lucene.index.FilterMergePolicy.numDeletesToMerge(FilterMergePolicy.java:104)
	at org.elasticsearch.index.engine.CachedSoftDeletesCountMergePolicy.lambda$numDeletesToMerge$0(CachedSoftDeletesCountMergePolicy.java:87)
	at org.elasticsearch.index.engine.CachedSoftDeletesCountMergePolicy$$Lambda$4159/0x00007fb0e2a5a960.load(Unknown Source)
	at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:433)
	at org.elasticsearch.index.engine.CachedSoftDeletesCountMergePolicy.numDeletesToMerge(CachedSoftDeletesCountMergePolicy.java:87)
	at org.apache.lucene.index.FilterMergePolicy.numDeletesToMerge(FilterMergePolicy.java:104)
	at org.apache.lucene.index.FilterMergePolicy.numDeletesToMerge(FilterMergePolicy.java:104)
	at org.apache.lucene.index.FilterMergePolicy.numDeletesToMerge(FilterMergePolicy.java:104)
	at org.apache.lucene.index.PendingDeletes.numDeletesToMerge(PendingDeletes.java:235)
	at org.apache.lucene.index.PendingSoftDeletes.numDeletesToMerge(PendingSoftDeletes.java:177)
	at org.apache.lucene.index.ReadersAndUpdates.numDeletesToMerge(ReadersAndUpdates.java:235)
	- locked <0x00007fc04eda8b20> (a org.apache.lucene.index.ReadersAndUpdates)
	at org.apache.lucene.index.IndexWriter.numDeletesToMerge(IndexWriter.java:5225)
	at org.apache.lucene.index.MergePolicy.size(MergePolicy.java:559)
	at org.apache.lucene.index.TieredMergePolicy.getSortedBySegmentSize(TieredMergePolicy.java:294)
	at org.apache.lucene.index.TieredMergePolicy.findMerges(TieredMergePolicy.java:323)
	at org.apache.lucene.index.FilterMergePolicy.findMerges(FilterMergePolicy.java:46)
	at org.apache.lucene.index.OneMergeWrappingMergePolicy.findMerges(OneMergeWrappingMergePolicy.java:47)
	at org.apache.lucene.index.OneMergeWrappingMergePolicy.findMerges(OneMergeWrappingMergePolicy.java:47)
	at org.apache.lucene.index.FilterMergePolicy.findMerges(FilterMergePolicy.java:46)
	at org.apache.lucene.index.OneMergeWrappingMergePolicy.findMerges(OneMergeWrappingMergePolicy.java:47)
	at org.apache.lucene.index.FilterMergePolicy.findMerges(FilterMergePolicy.java:46)
	at org.apache.lucene.index.FilterMergePolicy.findMerges(FilterMergePolicy.java:46)
	at org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2194)
	- locked <0x00007fc04c208750> (a org.apache.lucene.index.IndexWriter)
	at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2157)
```

</details>

<details>
<summary>Other write thread</summary>

```
"elasticsearch[fdbd:xxx::30_9201][write][T#29]" #403 daemon prio=5 os_prio=0 cpu=62619000.88ms elapsed=5866397.19s tid=0x00007fb3ec115800 nid=0x10f35a waiting for monitor entry [0x00007fb0e16c4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.lucene.index.IndexWriter.getNextMerge(IndexWriter.java:2225)
	- waiting to lock <0x00007fc04c208750> (a org.apache.lucene.index.IndexWriter)
	at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:529)
	- locked <0x00007fc04c1f88b8> (a org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler)
	at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2158)
	at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5136)
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1597)
	at org.apache.lucene.index.IndexWriter.softUpdateDocument(IndexWriter.java:1654)
```

</details>

The merging thread holds the `IndexWriter` monitor (`0x00007fc04c208750`) while it counts deletes, and the other write thread is blocked waiting for that same monitor in `getNextMerge`.

I'm wondering whether it is possible to speed up `SoftDeletesRetentionMergePolicy#numDeletesToMerge`. Currently, we execute a search to collect the docs that need to be retained despite being soft-deleted, then remove them from `liveDocs`. If most of the documents in the index have been soft-deleted (deleted by the user, or retained because of CCR follower lag), perhaps we could instead collect the docs that do not need to be retained among the soft-deleted ones (add a `reverseRetentionQuerySupplier`?); their count is the number of deletes a merge would claim.
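To make the proposal concrete, here is a minimal sketch of the two counting strategies. It is a simplification, not Lucene code: segments are modeled with `java.util.BitSet` instead of Lucene readers and queries, and the class and method names (`SoftDeleteCountSketch`, `countByRetained`, `countByReverse`) are hypothetical. The `notRetained` set stands in for what a `reverseRetentionQuerySupplier` might match, assuming such a query could be built.

```java
import java.util.BitSet;

// Simplified model only: BitSets stand in for doc-id iterators over a segment.
class SoftDeleteCountSketch {

    // Roughly the current approach: start from the soft-delete count and walk
    // every doc matched by the retention query, subtracting those that are
    // soft-deleted. Cost grows with the number of retained docs.
    static int countByRetained(BitSet softDeleted, BitSet retained) {
        int deletes = softDeleted.cardinality();
        for (int doc = retained.nextSetBit(0); doc >= 0; doc = retained.nextSetBit(doc + 1)) {
            if (softDeleted.get(doc)) {
                deletes--;
            }
        }
        return deletes;
    }

    // Proposed alternative (assumption): walk only the docs matched by a
    // hypothetical reverse retention query and count those that are
    // soft-deleted. Cost grows with the number of docs a merge can drop.
    static int countByReverse(BitSet softDeleted, BitSet notRetained) {
        int deletes = 0;
        for (int doc = notRetained.nextSetBit(0); doc >= 0; doc = notRetained.nextSetBit(doc + 1)) {
            if (softDeleted.get(doc)) {
                deletes++;
            }
        }
        return deletes;
    }
}
```

Both methods return the same count when `retained` and `notRetained` partition the soft-deleted docs. The difference is how many docs each one visits, which is what matters when most soft-deleted docs are still retained, as with a lagging CCR follower.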