ashish159357 opened a new pull request, #15330:
URL: https://github.com/apache/lucene/pull/15330

   Problem
   
    ByteBlockPool allocates 32 KB buffers and tracks the absolute position of the current buffer in an int field, byteOffset (byteOffset = bufferUpto * BYTE_BLOCK_SIZE). Once the pool needs a buffer index beyond 65,535 (more than 65,536 buffers, i.e. past 2 GB), this calculation overflows and an ArithmeticException is thrown while indexing documents with very large numbers of tokens.
   
   Root Cause
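   - Each buffer is 32 KB (BYTE_BLOCK_SIZE = 32768 = 2^15)
   - Largest buffer index whose offset still fits in an int: Integer.MAX_VALUE / BYTE_BLOCK_SIZE = 65535 (65535 * 32768 = 2,147,450,880)
   - When bufferUpto reaches 65536, the product is 2^31, which exceeds Integer.MAX_VALUE, so the multiplication overflows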
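   The boundary arithmetic can be checked in isolation. The class below is a standalone illustration, not Lucene code: the class name and helpers are hypothetical, and Math.multiplyExact is used only to make the overflow explicit instead of letting the int silently wrap.

```java
public class ByteOffsetOverflowDemo {
    // Same value as the BYTE_BLOCK_SIZE cited above: 1 << 15 = 32768 bytes (32 KB).
    static final int BYTE_BLOCK_SIZE = 1 << 15;

    // Largest buffer index whose starting offset still fits in an int.
    static final int MAX_SAFE_BUFFER_UPTO = Integer.MAX_VALUE / BYTE_BLOCK_SIZE;

    // Offset of buffer number bufferUpto; multiplyExact throws instead of wrapping.
    static int byteOffset(int bufferUpto) {
        return Math.multiplyExact(bufferUpto, BYTE_BLOCK_SIZE);
    }

    // True if computing the offset for this buffer index overflows an int.
    static boolean overflows(int bufferUpto) {
        try {
            byteOffset(bufferUpto);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(MAX_SAFE_BUFFER_UPTO);                // 65535
        System.out.println(byteOffset(MAX_SAFE_BUFFER_UPTO));    // 2147450880
        System.out.println(overflows(MAX_SAFE_BUFFER_UPTO + 1)); // true (65536 * 32768 = 2^31)
        // Plain int multiplication wraps to Integer.MIN_VALUE instead of failing.
        System.out.println(65536 * BYTE_BLOCK_SIZE);             // -2147483648
    }
}
```

   Without an exact-arithmetic check the offset would silently become negative, which is why failing fast with an ArithmeticException is preferable to corrupt offsets.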
   
   Solution
   Implement proactive DWPT (DocumentsWriterPerThread) flushing when the buffer count approaches the limit:
   1. Detection: added an isApproachingBufferLimit() method to ByteBlockPool that reports when the buffer count nears the overflow threshold.
   2. Propagation: the buffer-limit status flows from ByteBlockPool → IndexingChain → DocumentsWriterPerThread → DocumentsWriterFlushControl.
   3. Prevention: force a DWPT flush before the overflow occurs, similar to the existing RAM-based flushing.
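   The detect-then-flush flow above can be modelled with plain counters. This is a minimal sketch under stated assumptions, not the PR's implementation: PoolCounters and indexDocuments are hypothetical stand-ins for ByteBlockPool and the doAfterDocument() check, the isApproachingBufferLimit() body is assumed to compare bufferUpto against the 65,000 threshold, and "flush" is modelled as resetting the pool.

```java
public class BufferLimitFlushSketch {
    static final int BYTE_BLOCK_SIZE = 1 << 15;
    // Threshold taken from the PR description; leaves headroom below 65,535.
    static final int BUFFER_LIMIT_THRESHOLD = 65_000;

    /** Hypothetical, simplified stand-in for ByteBlockPool: tracks counters only, no byte arrays. */
    static final class PoolCounters {
        int bufferUpto = -1;               // index of the current buffer, -1 when empty
        int byteOffset = -BYTE_BLOCK_SIZE; // absolute offset of the current buffer

        void nextBuffer() {
            bufferUpto++;
            // Throws ArithmeticException instead of silently wrapping negative.
            byteOffset = Math.addExact(byteOffset, BYTE_BLOCK_SIZE);
        }

        // Assumed semantics of the PR's isApproachingBufferLimit().
        boolean isApproachingBufferLimit() {
            return bufferUpto >= BUFFER_LIMIT_THRESHOLD;
        }

        void reset() {
            bufferUpto = -1;
            byteOffset = -BYTE_BLOCK_SIZE;
        }
    }

    /** Hypothetical flush policy: after each document, flush if the pool is near the limit. */
    static int indexDocuments(PoolCounters pool, int docs, int buffersPerDoc) {
        int flushes = 0;
        for (int d = 0; d < docs; d++) {
            for (int b = 0; b < buffersPerDoc; b++) {
                pool.nextBuffer();
            }
            // Analogous to the check the PR adds in doAfterDocument().
            if (pool.isApproachingBufferLimit()) {
                pool.reset(); // stands in for flushing the DWPT and starting a fresh pool
                flushes++;
            }
        }
        return flushes;
    }

    public static void main(String[] args) {
        PoolCounters pool = new PoolCounters();
        // 200,000 buffers in total would overflow one pool; periodic flushes keep it bounded.
        int flushes = indexDocuments(pool, 2_000, 100);
        System.out.println("flushes=" + flushes + ", bufferUpto=" + pool.bufferUpto);
        // prints "flushes=3, bufferUpto=4699"
    }
}
```

   Each pool here peaks at bufferUpto = 65,099 (651 documents of 100 buffers) before being reset, so Math.addExact never throws even though the run allocates 200,000 buffers overall.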
   
   Key Changes
   - Added buffer limit detection in ByteBlockPool
   - Integrated check into DocumentsWriterFlushControl.doAfterDocument()
   - Uses a threshold of 65,000 buffers to leave a safety margin below the actual limit of 65,535
   - Maintains existing performance characteristics while preventing crashes
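   Because the check sits in doAfterDocument(), it presumably runs only between documents, so the gap between the threshold and the limit is roughly how much buffer space a single document can still consume after the last check without overflowing. The calculation below puts a number on that margin; it assumes the threshold is compared against bufferUpto, and the class and method names are illustrative only.

```java
public class ThresholdHeadroom {
    static final int BYTE_BLOCK_SIZE = 1 << 15;                                  // 32 KB
    static final int MAX_SAFE_BUFFER_UPTO = Integer.MAX_VALUE / BYTE_BLOCK_SIZE; // 65,535
    static final int THRESHOLD = 65_000;                                         // from the PR

    // Buffers between the flush threshold and the last index whose offset fits in an int.
    static int headroomBuffers() {
        return MAX_SAFE_BUFFER_UPTO - THRESHOLD;
    }

    // Same margin in bytes; long arithmetic so the product cannot overflow.
    static long headroomBytes() {
        return (long) headroomBuffers() * BYTE_BLOCK_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(headroomBuffers());                                 // 535
        System.out.println(headroomBytes());                                   // 17530880
        System.out.printf("%.2f MiB%n", headroomBytes() / (1024.0 * 1024.0)); // 16.72 MiB
    }
}
```

   A margin of about 16.7 MiB covers typical documents; a single document that allocates more than that in the pool after the last check could, under this assumption, still overflow before the next doAfterDocument() call.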
   

