[I] [Bug] Stored fields force merge regression between Lucene 9.12 and Lucene 10.0 [lucene]

via GitHub Thu, 10 Apr 2025 09:02:44 -0700


bharath-techie opened a new issue, #14463:
URL: https://github.com/apache/lucene/issues/14463


   ### Description
   
   We have observed that force merges of stored fields opened with `MMapDirectory` / `MemorySegmentIndexInput` are 50% to 100% slower in Lucene 10.0 than in Lucene 9.12.
   
   After a fair amount of debugging, I found that changing `DEFAULT_READADVICE` back to `SEQUENTIAL` fixed the issue ([reference](https://github.com/opensearch-project/OpenSearch/issues/17722#issuecomment-2775807317)).
   
   I have drafted the changes in https://github.com/apache/lucene/compare/main...bharath-techie:lucene:merge-fix, similar to the ones made in https://github.com/apache/lucene/pull/13985, and verified that they fix the regression.
   
   However, I am also raising this issue to understand why this fix is not needed for 9.12, since the code in `Lucene90CompressingStoredFieldsReader` has not changed.
   
   1. Lucene 10 initializes the FDT input with random read advice ([reference](https://github.com/apache/lucene/blob/releases/lucene/10.0.0/lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java#L143)).
   
   2. Lucene 9.12 also initializes the FDT input with a context set to RandomAccess ([reference](https://github.com/apache/lucene/blob/releases/lucene/9.12.0/lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java#L131)).
   
   3. During the merge flow, `fieldsStream` gets cloned [here](https://github.com/apache/lucene/blob/releases/lucene/9.12.0/lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java#L95). I checked the code in 10.0, and the new object still refers to the same memory segments as the original reader, which was initialized with random madvise (`singleSegmentImpl`). I also tried reverting the clone flow to match 9.12, and the regression was still present.
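To make point 3 concrete, here is a minimal, hypothetical Java model. It is not Lucene's `IndexInput` or `MemorySegmentIndexInput` API; `Advice`, `MappedRegion`, and `ModelInput` are invented for illustration. It assumes, as described above, that the advice is applied to the mapped memory when the file is opened, and that a clone gets its own file pointer but shares the underlying region.

```java
import java.util.Objects;

// Hypothetical, simplified model; these are NOT Lucene classes.
enum Advice { NORMAL, RANDOM, SEQUENTIAL }

// Stands in for a memory-mapped segment. The advice is fixed when the
// region is created (modeling an madvise call at open time).
final class MappedRegion {
    private final Advice advice;

    MappedRegion(Advice advice) {
        this.advice = Objects.requireNonNull(advice, "advice");
    }

    Advice advice() {
        return advice;
    }
}

// Stands in for an IndexInput: a per-object file pointer over a shared region.
final class ModelInput {
    private final MappedRegion region;
    private long filePointer;

    ModelInput(MappedRegion region) {
        this.region = Objects.requireNonNull(region, "region");
    }

    MappedRegion region() {
        return region;
    }

    long filePointer() {
        return filePointer;
    }

    void seek(long pos) {
        filePointer = pos;
    }

    // Copies the file pointer but shares the region, so the advice chosen
    // when the original was opened also governs reads through the clone.
    @Override
    public ModelInput clone() {
        ModelInput copy = new ModelInput(region);
        copy.filePointer = filePointer;
        return copy;
    }
}
```

In this model, a merge-time clone of the FDT input keeps reading through a region advised as `RANDOM`, whatever access pattern the clone itself follows.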
   
   [Note: `compound` is true and the `FDT` file is inside the compound file. Changing the `ReadAdvice` in the compound format didn't fix the regression, but I'm including this point for reference.]
   
   The only difference between 9.12 and 10.0 is the default context, which changed from sequential to random. But the FDT input was initialized with random advice in both versions.
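The puzzle in the previous paragraph can be sketched with another hypothetical model (again not Lucene's API; `DefaultAdviceModel`, `OpenRequest`, and `effectiveAdvice` are invented names). It assumes that an input opened with an explicit advice keeps that advice, and that only inputs opened with the default context pick up the default.

```java
// Hypothetical model; these are NOT Lucene classes or methods.
final class DefaultAdviceModel {
    enum Advice { NORMAL, RANDOM, SEQUENTIAL }

    // An open request: an explicit advice, or null to use the default context.
    record OpenRequest(String file, Advice explicitAdvice) {}

    // An explicit advice wins; only default-context opens see the default.
    static Advice effectiveAdvice(OpenRequest request, Advice defaultAdvice) {
        return request.explicitAdvice() != null ? request.explicitAdvice() : defaultAdvice;
    }

    public static void main(String[] args) {
        // Explicit RANDOM, as described for the FDT input in both 9.12 and 10.0.
        OpenRequest fdt = new OpenRequest("_0.fdt", Advice.RANDOM);
        // A hypothetical input opened with the default context.
        OpenRequest other = new OpenRequest("other", null);
        // SEQUENTIAL stands for the 9.12 default, RANDOM for the 10.0 default.
        for (Advice dflt : new Advice[] {Advice.SEQUENTIAL, Advice.RANDOM}) {
            System.out.println("default=" + dflt
                    + " fdt=" + effectiveAdvice(fdt, dflt)
                    + " other=" + effectiveAdvice(other, dflt));
        }
    }
}
```

Under these assumptions the FDT input resolves to `RANDOM` under either default, and only the default-context input follows the change. That is why the observed dependence of the merge on the default is surprising.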
   
   I'd like to hear from the community why the default context change affected Lucene 10.0, and whether the draft approach works, so that I can raise a PR.
   
   cc: @jpountz @uschindler
   
   ### Version and environment details
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
