Vikasht34 opened a new issue, #15820:
URL: https://github.com/apache/lucene/issues/15820
### Description
Component: core/codecs
Description:
The lucene103 blocktree codec replaced the in-memory FST term index with
an on-disk TrieReader. This causes a significant performance regression
for workloads that perform high-frequency seekExact() calls on the _id
field during document indexing.
## Environment
- OpenSearch 3.3 (Lucene 10.x with lucene103 codec) vs OpenSearch 2.19
(Lucene 9.12.0 with lucene90 codec)
- JDK: Amazon Corretto 21.0.8
- Workload: 32 KNN indices, 6 shards each, mixed ingest+query (50/50),
bulk indexing with explicit _id (UUID), ~400 segments per index at
refresh_interval=1s
## Problem
Every indexed document with an explicit _id triggers
PerThreadIDVersionAndSeqNoLookup.getDocID() which calls
SegmentTermsEnum.seekExact(BytesRef) on every segment to check for
version conflicts. With ~400 segments per index, each document requires
~400 seekExact calls.
In lucene90, seekExact navigates an in-memory FST (heap-resident).
In lucene103, seekExact navigates a TrieReader via memory-mapped file
reads, where each read triggers MemorySessionImpl.checkValidStateRaw()
(the Panama Foreign Memory API check that the backing memory session is
still alive).
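
The per-segment lookup cost described above can be illustrated with a small stand-alone model. This is only a sketch, not the OpenSearch or Lucene code: `IdLookupModel`, `Segment`, and `findSegment` are hypothetical names, each segment's terms dictionary is stood in for by a `Set<String>`, and the lookup is assumed to probe segments newest-first and stop at the first hit. Under those assumptions, a fresh UUID misses in every segment, so the number of seekExact calls per document equals the segment count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of an _id lookup across segments; not real Lucene classes.
final class IdLookupModel {

    /** Stand-in for one segment's _id terms dictionary. */
    static final class Segment {
        private final Set<String> ids;
        int seekExactCalls = 0;

        Segment(Set<String> ids) {
            this.ids = ids;
        }

        /** Counts the call, then reports whether the id exists in this segment. */
        boolean seekExact(String id) {
            seekExactCalls++;
            return ids.contains(id);
        }
    }

    /**
     * Probes segments from newest (last) to oldest (first) and returns the
     * index of the first segment containing the id, or -1 if none does.
     */
    static int findSegment(List<Segment> segments, String id) {
        for (int i = segments.size() - 1; i >= 0; i--) {
            if (segments.get(i).seekExact(id)) {
                return i;
            }
        }
        return -1;
    }

    /** Total seekExact calls issued so far across all segments. */
    static int totalCalls(List<Segment> segments) {
        int total = 0;
        for (Segment s : segments) {
            total += s.seekExactCalls;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        for (int i = 0; i < 400; i++) {
            segments.add(new Segment(Set.of("doc-" + i)));
        }
        // A new id is absent everywhere, so all 400 segments are probed.
        int hit = findSegment(segments, "brand-new-uuid");
        System.out.println(hit + " " + totalCalls(segments));
    }
}
```

An id that already exists in a recent segment would return early in this model, so the worst case is exactly the append-only UUID workload described in this issue.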
## JFR Evidence
Write thread profiling (JFR ExecutionSample) shows:

lucene103 (3.3): 10.0% of write thread time in seekExact path

    DataInput.readVLong()
    SegmentTermsEnumFrame.loadBlock()
    SegmentTermsEnum.lambda$prepareSeekExact$1(BytesRef)
    SegmentTermsEnum.seekExact(BytesRef)
    PerThreadIDVersionAndSeqNoLookup.getDocID()

lucene90 (2.19): 2.6% of write thread time in seekExact path

    FST$Arc$BitTable.isBitSet()
    FST.findTargetArc()
    SegmentTermsEnum.seekExact(BytesRef)
    PerThreadIDVersionAndSeqNoLookup.getDocID()

Additionally, 6.6% of write thread time is spent in
MemorySessionImpl.checkValidStateRaw() on memory-mapped reads triggered
by the TrieReader navigation.
Combined, that is 16.6% of write thread time (10.0% + 6.6%) versus 2.6%
on lucene90, a 6.4x regression for this code path.
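
For reference, the arithmetic behind the combined figure, as a minimal sketch. `SeekOverhead` and `regressionFactor` are names made up for this illustration, and the calculation assumes the two lucene103 percentages are disjoint samples that can simply be added, as the summary above does.

```java
import java.util.Locale;

// Hypothetical helper; only reproduces the arithmetic from the JFR numbers above.
final class SeekOverhead {

    /**
     * Ratio of the combined lucene103 overhead (seekExact path plus session
     * state checks) to the lucene90 baseline. All inputs are percentages of
     * write thread time.
     */
    static double regressionFactor(double seekPathPct, double sessionCheckPct, double baselinePct) {
        return (seekPathPct + sessionCheckPct) / baselinePct;
    }

    public static void main(String[] args) {
        // 10.0% + 6.6% = 16.6%, compared against 2.6% on lucene90.
        System.out.printf(Locale.ROOT, "%.1fx%n", regressionFactor(10.0, 6.6, 2.6)); // prints 6.4x
    }
}
```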
## Impact
At 256,000 seekExact calls/sec (32 TPS × 20 docs/bulk × 400 segments),
this overhead causes:
- 1.9x per-document indexing latency (577µs vs 303µs)
- Search thread saturation under mixed workload (queries slow down due
to CPU contention)
- Ingestion stalls at 297k docs/tenant vs 600k+ on lucene90
Increasing refresh_interval from 1s to 30s (reducing segments from ~400
to ~13) mitigates the issue by reducing seekExact calls roughly 30x,
pushing the stall point from 297k to 497k docs/tenant.
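
The call-rate figures above follow from simple multiplication, sketched here. `SeekRate` and `callsPerSecond` are hypothetical names; the sketch assumes "32 TPS" means 32 bulk requests per second and that every document probes every segment (the worst case for new UUIDs).

```java
import java.util.Locale;

// Hypothetical helper; reproduces the seekExact call-rate arithmetic from this issue.
final class SeekRate {

    /** seekExact calls per second when each document probes every segment. */
    static long callsPerSecond(long bulksPerSecond, long docsPerBulk, long segments) {
        return bulksPerSecond * docsPerBulk * segments;
    }

    public static void main(String[] args) {
        long at1s = callsPerSecond(32, 20, 400);  // refresh_interval=1s, ~400 segments
        long at30s = callsPerSecond(32, 20, 13);  // refresh_interval=30s, ~13 segments
        System.out.println(at1s);                 // prints 256000
        System.out.println(at30s);                // prints 8320
        System.out.printf(Locale.ROOT, "%.1fx fewer calls%n", (double) at1s / at30s);
    }
}
```

The reduction factor depends only on the segment-count ratio (400 / 13), which is why a longer refresh interval helps without changing the per-call cost.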
## Root Cause
Two compounding factors:
1. TrieReader replaces in-memory FST with on-disk trie navigation.
The FST was loaded into Java heap at segment open time — navigation
was pure CPU (BitTable.isBitSet). The TrieReader reads from
memory-mapped files, adding I/O indirection.
2. Each memory-mapped read triggers checkValidStateRaw(), the Panama
Foreign Memory API liveness check that verifies the Arena is still
open. It runs on every read from the memory-mapped file.
The _id field is special: it is looked up via seekExact on every single
document indexed. It has a random access pattern (UUIDs) that does not
benefit from the TrieReader's sequential access optimizations.
## How to Reproduce
1. Create an index with many small segments (refresh_interval=1s,
continuous ingestion)
2. Bulk index documents with explicit _id (UUIDs)
3. Profile write threads with JFR
4. Compare seekExact time between lucene90 and lucene103 codecs
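
Steps 3 and 4 can be approximated offline with the JDK's `jdk.jfr.consumer` API. The following is a sketch under assumptions, not the tooling used for the numbers in this issue: `SeekExactShare` and its methods are hypothetical names, the thread-name filter (for example `"[write]"`) is a guess at how write threads are named, and a frame is matched by the suffix of its `Class.method` name.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordedThread;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical helper: fraction of ExecutionSample events whose stack contains a given frame.
final class SeekExactShare {

    /** True if any "Class.method" frame name ends with the given suffix. */
    static boolean containsFrame(List<String> frames, String suffix) {
        for (String frame : frames) {
            if (frame.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    /** Flattens a recorded stack trace into "fully.qualified.Class.method" strings. */
    static List<String> frameNames(RecordedStackTrace stackTrace) {
        List<String> names = new ArrayList<>();
        for (RecordedFrame frame : stackTrace.getFrames()) {
            if (frame.getMethod() != null) {
                names.add(frame.getMethod().getType().getName() + "." + frame.getMethod().getName());
            }
        }
        return names;
    }

    /**
     * Share (0.0 to 1.0) of jdk.ExecutionSample events on matching threads
     * whose stack contains a frame ending in frameSuffix.
     */
    static double share(Path jfrFile, String threadNamePart, String frameSuffix) throws IOException {
        long total = 0;
        long hits = 0;
        try (RecordingFile recording = new RecordingFile(jfrFile)) {
            while (recording.hasMoreEvents()) {
                RecordedEvent event = recording.readEvent();
                if (!"jdk.ExecutionSample".equals(event.getEventType().getName())) {
                    continue;
                }
                RecordedThread thread = event.getThread("sampledThread");
                RecordedStackTrace stackTrace = event.getStackTrace();
                if (thread == null || thread.getJavaName() == null || stackTrace == null) {
                    continue;
                }
                if (!thread.getJavaName().contains(threadNamePart)) {
                    continue;
                }
                total++;
                if (containsFrame(frameNames(stackTrace), frameSuffix)) {
                    hits++;
                }
            }
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

Calling `SeekExactShare.share(path, "[write]", "SegmentTermsEnum.seekExact")` on a recording from each codec gives two fractions that can be compared directly; running it again with the suffix `"MemorySessionImpl.checkValidStateRaw"` isolates the session-check portion.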
### Version and environment details
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]