Vikasht34 opened a new issue, #15820:
URL: https://github.com/apache/lucene/issues/15820

   ### Description
   
   Component: core/codecs
   
   The lucene103 blocktree codec replaced the in-memory FST term index with
   an on-disk TrieReader. This causes a significant performance regression
   for workloads that perform high-frequency seekExact() calls on the _id
   field during document indexing.
   
   ## Environment
   
   - OpenSearch 3.3 (Lucene 10.x with lucene103 codec) vs OpenSearch 2.19 
     (Lucene 9.12.0 with lucene90 codec)
   - JDK: Amazon Corretto 21.0.8
   - Workload: 32 KNN indices, 6 shards each, mixed ingest+query (50/50), 
     bulk indexing with explicit _id (UUID), ~400 segments per index at 
     refresh_interval=1s
   
   ## Problem
   
   Every indexed document with an explicit _id triggers 
   PerThreadIDVersionAndSeqNoLookup.getDocID() which calls 
   SegmentTermsEnum.seekExact(BytesRef) on every segment to check for 
   version conflicts. With ~400 segments per index, each document requires 
   ~400 seekExact calls.
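
To make the per-document cost concrete, here is a toy model of the lookup described above. It is not OpenSearch or Lucene code: `IdLookupModel`, `addSegment`, `exists`, and `probes` are hypothetical names, each segment is modeled as a sorted set, and one set probe stands in for one seekExact call. It also assumes the lookup stops at the first segment that contains the id, so a fresh id (the usual case for UUIDs) probes every segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model only: a "segment" is a sorted set of ids, a probe stands in for seekExact. */
final class IdLookupModel {
    private final List<TreeSet<String>> segments = new ArrayList<>();
    private long probes = 0; // number of simulated seekExact calls so far

    /** Adds one segment holding the given ids (hypothetical helper). */
    void addSegment(List<String> ids) {
        segments.add(new TreeSet<>(ids));
    }

    /**
     * Probes segments from newest to oldest and stops at the first hit.
     * Assumption: an id that is absent everywhere costs one probe per segment.
     */
    boolean exists(String id) {
        for (int i = segments.size() - 1; i >= 0; i--) {
            probes++;
            if (segments.get(i).contains(id)) {
                return true;
            }
        }
        return false;
    }

    long probes() {
        return probes;
    }

    public static void main(String[] args) {
        IdLookupModel model = new IdLookupModel();
        for (int i = 0; i < 400; i++) {
            model.addSegment(List.of("doc-" + i)); // 400 small segments
        }
        boolean found = model.exists("fresh-id");   // id not present anywhere
        System.out.println("found=" + found + ", probes=" + model.probes());
        // prints: found=false, probes=400
    }
}
```

In this model the probe count for a fresh id grows linearly with the segment count, which is why the number of segments matters as much as the cost of a single seek.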
   
   In lucene90, seekExact navigates an in-memory, heap-resident FST. 
   In lucene103, seekExact navigates a TrieReader via memory-mapped file 
   reads, where each read triggers MemorySessionImpl.checkValidStateRaw() 
   (the Panama Foreign Memory API check that the memory session is still 
   open).
   
   ## JFR Evidence
   
   Write thread profiling (JFR ExecutionSample) shows:
   
   lucene103 (3.3): 10.0% of write thread time in seekExact path
     DataInput.readVLong()
       SegmentTermsEnumFrame.loadBlock()
         SegmentTermsEnum.lambda$prepareSeekExact$1(BytesRef)
           SegmentTermsEnum.seekExact(BytesRef)
             PerThreadIDVersionAndSeqNoLookup.getDocID()
   
   lucene90 (2.19): 2.6% of write thread time in seekExact path
     FST$Arc$BitTable.isBitSet()
       FST.findTargetArc()
         SegmentTermsEnum.seekExact(BytesRef)
           PerThreadIDVersionAndSeqNoLookup.getDocID()
   
   Additionally, 6.6% of write thread time is spent in 
   MemorySessionImpl.checkValidStateRaw() on memory-mapped reads triggered 
   by the TrieReader navigation.
   
   Combined: 10.0% + 6.6% = 16.6% of write thread time vs 2.6%, a ~6.4x 
   regression for this code path.
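
The combined figure above can be checked with a few lines of arithmetic. This is only a sketch: `SeekOverhead` and its methods are hypothetical names, and adding the two percentages assumes the checkValidStateRaw() samples are not already counted inside the seekExact path samples, as the issue's own sum implies.

```java
import java.util.Locale;

/** Arithmetic check of the JFR percentages quoted in the issue (illustrative only). */
final class SeekOverhead {
    /** Sum of the seekExact path share and the state-check share, in percent. */
    static double combinedPercent(double seekPathPercent, double stateCheckPercent) {
        return seekPathPercent + stateCheckPercent;
    }

    /** How many times larger the new share is than the old one. */
    static double regressionFactor(double newPercent, double oldPercent) {
        return newPercent / oldPercent;
    }

    public static void main(String[] args) {
        double combined = combinedPercent(10.0, 6.6);   // lucene103 figures
        double factor = regressionFactor(combined, 2.6); // lucene90 baseline
        System.out.println(String.format(Locale.ROOT,
                "combined=%.1f%% factor=%.1fx", combined, factor));
    }
}
```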
   
   ## Impact
   
   At 256,000 seekExact calls/sec (32 TPS × 20 docs/bulk × 400 segments), 
   this overhead causes:
   - 1.9x per-document indexing latency (577µs vs 303µs)
   - Search thread saturation under mixed workload (queries slow down due 
     to CPU contention)
   - Ingestion stalls at 297k docs/tenant vs 600k+ on lucene90
   
   Increasing refresh_interval from 1s to 30s (reducing segments from ~400 
   to ~13) mitigates the issue by reducing seekExact calls roughly 30x, 
   moving the stall point from 297k to 497k docs/tenant.
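
The call-rate figures in this section follow from a simple product, sketched below. `SeekRate` and `seekCallsPerSecond` are hypothetical names, "32 TPS" is read here as 32 bulk requests per second (an assumption), and the model assumes every indexed document probes every segment.

```java
import java.util.Locale;

/** Back-of-the-envelope seekExact call rate (illustrative only). */
final class SeekRate {
    /** bulks/sec x docs/bulk x segments probed per document. */
    static long seekCallsPerSecond(long bulksPerSecond, long docsPerBulk, long segmentsPerIndex) {
        return bulksPerSecond * docsPerBulk * segmentsPerIndex;
    }

    public static void main(String[] args) {
        long at1s = seekCallsPerSecond(32, 20, 400); // ~400 segments at refresh_interval=1s
        long at30s = seekCallsPerSecond(32, 20, 13); // ~13 segments at refresh_interval=30s
        double reduction = (double) at1s / at30s;
        System.out.println(String.format(Locale.ROOT,
                "1s=%d/s 30s=%d/s reduction=%.1fx", at1s, at30s, reduction));
    }
}
```

With these inputs the rate drops from 256,000 to 8,320 calls/sec, a reduction of about 30.8x, which matches the roughly 30x quoted above.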
   
   ## Root Cause
   
   Two compounding factors:
   
   1. TrieReader replaces in-memory FST with on-disk trie navigation. 
      The FST was loaded into Java heap at segment open time — navigation 
      was pure CPU (BitTable.isBitSet). The TrieReader reads from 
      memory-mapped files, adding I/O indirection.
   
   2. Each memory-mapped read triggers checkValidStateRaw(), the Panama 
      Foreign Memory API liveness check that verifies the Arena is still 
      open. This is called on every byte read from the mmap file.
   
   The _id field is special: it is looked up via seekExact on every single 
   document indexed. It has a random access pattern (UUIDs) that does not 
   benefit from the TrieReader's sequential access optimizations.
   
   
   ## How to Reproduce
   
   1. Create an index with many small segments (refresh_interval=1s, 
      continuous ingestion)
   2. Bulk index documents with explicit _id (UUIDs)
   3. Profile write threads with JFR
   4. Compare seekExact time between lucene90 and lucene103 codecs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

