benwtrent commented on issue #13127: URL: https://github.com/apache/lucene/issues/13127#issuecomment-1986293179
Looking at the code in `DocumentsWriterDeleteQueue#close()`, we trip if `seqNo` is ever larger than `maxSeqNo`. `maxSeqNo` is set in `DocumentsWriterDeleteQueue#advanceQueue(int)`, which is synchronized. Internally, `maxSeqNo` is set to `getLastSequenceNumber() + maxNumPendingOps + 1`. From what I can see, `maxNumPendingOps` does not change during the call, since it is passed in via `DocumentsWriterFlushControl#markForFullFlush`. However, `getLastSequenceNumber()` is NOT synchronized with `getNextSequenceNumber()`.

It seems to me there may be a race condition where:

- `DocumentsWriterDeleteQueue#advanceQueue(int)` is entered by one thread
- the line `long seqNo = getLastSequenceNumber() + maxNumPendingOps + 1;` is executed
- then another thread calls `getNextSequenceNumber()`

I still have to trace where all of these are used, but this is the first thing I saw that seemed suspicious. If this happened, it seems possible that the new `DocumentsWriterDeleteQueue` returned from `advanceQueue` could have its `maxSeqNo` set too low, given a number of parallel calls to `getNextSequenceNumber()` during `advanceQueue`.
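To make the arithmetic of the suspected interleaving concrete, here is a deterministic replay in a small, hypothetical model. `ModelQueue`, `capAt`, `exceedsCap` and `replay` are illustrative names, not Lucene APIs; the model borrows only the method names quoted above and assumes `getNextSequenceNumber()` hands out numbers from an unlocked `AtomicLong`. Because the cap is computed from the value read at step 2, every number handed out after that read consumes the budget of `maxNumPendingOps + 1`; once more calls than that arrive, the `close()`-style check trips. Whether that many late calls can actually happen in Lucene depends on the callers, which still need tracing.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the seqNo bookkeeping discussed above.
// This is NOT Lucene's DocumentsWriterDeleteQueue; only method names are borrowed.
public class SeqNoRaceSketch {

  static final class ModelQueue {
    // Shared counter. getNextSequenceNumber() takes no lock, mirroring the
    // observation that it is not synchronized with advanceQueue.
    private final AtomicLong nextSeqNo;
    private long maxSeqNo = Long.MAX_VALUE;

    ModelQueue(long startSeqNo) {
      this.nextSeqNo = new AtomicLong(startSeqNo);
    }

    long getNextSequenceNumber() {
      return nextSeqNo.getAndIncrement();
    }

    long getLastSequenceNumber() {
      return nextSeqNo.get() - 1;
    }

    // The cap computation quoted in the comment. The method is synchronized,
    // but that lock does not stop other threads calling getNextSequenceNumber().
    synchronized long capAt(int maxNumPendingOps) {
      maxSeqNo = getLastSequenceNumber() + maxNumPendingOps + 1;
      return maxSeqNo;
    }

    // Analogue of the close() check: true if a handed-out seqNo exceeds the cap.
    synchronized boolean exceedsCap() {
      return getLastSequenceNumber() > maxSeqNo;
    }
  }

  // Deterministically replays the suspected interleaving:
  //   1. thread A (inside advanceQueue) reads the last seqNo and fixes maxSeqNo;
  //   2. thread B then calls getNextSequenceNumber() `lateCalls` times.
  // Returns true if the model's close()-style check would trip.
  public static boolean replay(int maxNumPendingOps, int lateCalls) {
    ModelQueue queue = new ModelQueue(1); // last seqNo starts at 0
    queue.capAt(maxNumPendingOps);
    for (int i = 0; i < lateCalls; i++) {
      queue.getNextSequenceNumber();
    }
    return queue.exceedsCap();
  }

  public static void main(String[] args) {
    // maxNumPendingOps = 2 gives maxSeqNo = 0 + 2 + 1 = 3.
    System.out.println(replay(2, 3)); // false: seqNos 1..3 stay within the cap
    System.out.println(replay(2, 4)); // true: seqNo 4 exceeds the cap
  }
}
```

Replaying the steps sequentially keeps the outcome reproducible; a real multi-threaded reproduction would only hit this ordering intermittently.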