luyuncheng opened a new pull request, #12350:
URL: https://github.com/apache/lucene/pull/12350

   ### Problem statement
   We found that when Lucene is used in `frequent update` or `update by query` 
scenarios, it performs many iterations in the following code:
   
https://github.com/apache/lucene/blob/0c293909c050e2f24ef5a6062d8da31e6595e8cd/lucene/core/src/java/org/apache/lucene/index/SoftDeletesRetentionMergePolicy.java#L166-L176
   
   `SoftDeletesRetentionMergePolicy` has to run the query from 
`retentionQuerySupplier` and then filter the retained documents, so iterating 
over doc IDs becomes time-consuming in frequent-update scenarios.
   
   Here is a flame graph:
   
![](https://user-images.githubusercontent.com/30896830/241833749-d92303d3-1ebc-4a5d-8ae0-143bfb3d4660.png)
   
   #### Tracing the stack:
   
![20230606-183434](https://github.com/apache/lucene/assets/12760367/35f2b509-a67d-4b05-a09c-6d048d958372)
   
   It is called from the following code when updating documents:
   
https://github.com/apache/lucene/blob/0c293909c050e2f24ef5a6062d8da31e6595e8cd/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L5891-L5899
   
   and from the following code during merges:
   
https://github.com/apache/lucene/blob/0c293909c050e2f24ef5a6062d8da31e6595e8cd/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L2346-L2361
   
   ### Proposal
   Two optimizations reduce the number of calls to `numDeletesToMerge`:
   1. #12339 tries to reduce them in `getSortedBySegmentSize`.
   When we do a merge, we first call `getSortedBySegmentSize`, which 
computes `numDeletesToMerge` redundantly.
   
   2. This PR tries to reduce them in `findForcedDeletesMerges`.
   When we compute the size of the deletes, `numDeletesToMerge` is again 
computed redundantly.
   
   In our scenarios, the calls to `numDeletesToMerge` sharply increase write 
latency, because `updatePendingMerges` is a `synchronized` method.
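
The general idea of avoiding repeated `numDeletesToMerge` computation can be sketched as a per-pass memoization. This is only an illustration, not the change made in this PR or in #12339: `CachedDeleteCounter` is a hypothetical helper, segments are identified by plain strings, and the expensive computation is passed in as a function rather than being a real Lucene call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

/**
 * Hypothetical helper (not a Lucene class): memoizes an expensive per-segment
 * delete count for the duration of a single merge-selection pass.
 */
final class CachedDeleteCounter {
  // Stands in for the costly numDeletesToMerge computation (assumption).
  private final ToIntFunction<String> expensiveCounter;
  private final Map<String, Integer> cache = new HashMap<>();
  private int expensiveCalls = 0;

  CachedDeleteCounter(ToIntFunction<String> expensiveCounter) {
    this.expensiveCounter = expensiveCounter;
  }

  /** Returns the delete count for a segment, computing it at most once. */
  int numDeletesToMerge(String segmentName) {
    Integer cached = cache.get(segmentName);
    if (cached == null) {
      expensiveCalls++;
      cached = expensiveCounter.applyAsInt(segmentName);
      cache.put(segmentName, cached);
    }
    return cached;
  }

  /** Number of times the underlying expensive computation actually ran. */
  int expensiveCalls() {
    return expensiveCalls;
  }
}
```

A fresh counter would be created for each merge-selection pass and shared by the sorting and filtering steps, so each segment is counted once per pass. The cache should not outlive the pass, because delete counts change as updates arrive.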


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

