benwtrent commented on issue #13127:
URL: https://github.com/apache/lucene/issues/13127#issuecomment-1986293179

   Looking at the code in `DocumentsWriterDeleteQueue#close()`, we trip if 
`seqNo` is ever larger than `maxSeqNo`. 
   
   `maxSeqNo` is set in `DocumentsWriterDeleteQueue#advanceQueue(int)`, which 
is synchronized. Internally `maxSeqNo` is set to `getLastSequenceNumber() + 
maxNumPendingOps + 1;`
   
   From what I can see, `maxNumPendingOps` is synchronized and unchanging as 
well, since it is passed in via `DocumentsWriterFlushControl#markForFullFlush`. 
   
   However, `getLastSequenceNumber()` is NOT synchronized with 
`getNextSequenceNumber()`. It seems to me there may be a race condition where:
   
    - DocumentsWriterDeleteQueue#advanceQueue(int) is entered by one thread
    - The line `long seqNo = getLastSequenceNumber() + maxNumPendingOps + 1;` 
is executed
    - Then another thread calls `getNextSequenceNumber()`
   
   I still have to trace where all of these are used, but this is the first 
thing I saw that seemed suspicious. If this happened, the new 
`DocumentsWriterDeleteQueue` returned from `advanceQueue` could end up with a 
`maxSeqNo` that is too low, given enough parallel calls to 
`getNextSequenceNumber()` during the `advanceQueue` call.
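
   The interleaving described above can be replayed deterministically in a 
small model. This is only a sketch, not Lucene's code: `QueueModel`, `replay`, 
and `racingCalls` are hypothetical names, and the counter semantics 
(`getLastSequenceNumber()` returning the counter minus one) are an assumption. 
Whether the real code can produce more racing calls than the headroom allows 
is exactly the open question.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Simplified, hypothetical model of the suspected race; not Lucene's actual code. */
class SeqNoRaceSketch {

  /** Stand-in for the delete queue's sequence counter (assumed semantics). */
  static final class QueueModel {
    private final AtomicLong nextSeqNo = new AtomicLong(1);

    /** Hands out the next sequence number; takes no lock. */
    long getNextSequenceNumber() {
      return nextSeqNo.getAndIncrement();
    }

    /** Last sequence number handed out so far; takes no lock either. */
    long getLastSequenceNumber() {
      return nextSeqNo.get() - 1;
    }
  }

  /**
   * Replays the suspected interleaving on a single thread so the outcome is
   * deterministic. Returns {maxSeqNo, lastSeqNo observed afterwards}.
   */
  static long[] replay(int maxNumPendingOps, int racingCalls) {
    QueueModel queue = new QueueModel();

    // Steps 1 and 2: the "advanceQueue" thread snapshots the counter and
    // derives the upper bound from that snapshot.
    long maxSeqNo = queue.getLastSequenceNumber() + maxNumPendingOps + 1;

    // Step 3: other threads keep calling getNextSequenceNumber() after the
    // snapshot was taken. Nothing ties these calls to the snapshot.
    for (int i = 0; i < racingCalls; i++) {
      queue.getNextSequenceNumber();
    }

    return new long[] {maxSeqNo, queue.getLastSequenceNumber()};
  }

  public static void main(String[] args) {
    for (int racingCalls = 3; racingCalls <= 4; racingCalls++) {
      long[] result = replay(2, racingCalls);
      System.out.println(
          "racingCalls=" + racingCalls
              + " maxSeqNo=" + result[0]
              + " lastSeqNo=" + result[1]
              + " exceeded=" + (result[1] > result[0]));
    }
  }
}
```

   In this model, with `maxNumPendingOps = 2`, three racing calls stay within 
the bound and a fourth pushes the last sequence number past `maxSeqNo`, which 
is the condition that trips in `close()`.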

