benwtrent opened a new pull request, #13169: URL: https://github.com/apache/lucene/pull/13169
#13127 is failing because `seqNo` is larger than `maxSeqNo` on `close()`.

`maxSeqNo` is set during `DocumentsWriterDeleteQueue#advanceQueue`, which is synchronized on the queue itself. It uses `getLastSequenceNumber()`, which reads the `AtomicLong` backing `seqNo`. However, `DocumentsWriterDeleteQueue#getNextSequenceNumber()` is not synchronized on the queue, so it can increment the `AtomicLong` just before or just after `advanceQueue` reads it.

`DocumentsWriterDeleteQueue#getNextSequenceNumber()` is called from other synchronized methods. The one external caller is `DocumentsWriter#getNextSequenceNumber`, which is synchronized on the `DocumentsWriter` instance.

When calling `DocumentsWriterFlushControl#markFullFlush`, neither the `DocumentsWriter` nor the `DocumentsWriterDeleteQueue` is locked, so `documentsWriter.getNextSequenceNumber()` can be called after `documentsWriter.deleteQueue.advanceQueue` but before `documentsWriter.resetDeleteQueue`.

This commit moves the `documentsWriter.deleteQueue.advanceQueue` call into the already synchronized `documentsWriter.resetDeleteQueue`, so that `getNextSequenceNumber` cannot be called while the queue is being reset.

This is admittedly difficult to test fully, but I have seen many failures in continuous testing. If this commit does fix that test, I would expect those failures to disappear.

closes: https://github.com/apache/lucene/issues/13127
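To make the interleaving concrete, here is a minimal single-threaded sketch of the ordering problem described above. It is not the Lucene code: `DeleteQueue`, `Writer`, `swapIn`, and `currentQueue` are hypothetical stand-ins, the `maxSeqNo` arithmetic is simplified, and `advanceQueue` is given zero slack purely so the replay fails deterministically.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SeqNoRaceSketch {

    /** Simplified stand-in for DocumentsWriterDeleteQueue (hypothetical, not Lucene code). */
    static final class DeleteQueue {
        private final AtomicLong nextSeqNo;
        private volatile long maxSeqNo = Long.MAX_VALUE;

        DeleteQueue(long startSeqNo) {
            this.nextSeqNo = new AtomicLong(startSeqNo);
        }

        long getLastSequenceNumber() {
            return nextSeqNo.get() - 1;
        }

        /** Deliberately not synchronized on the queue, as the description notes. */
        long getNextSequenceNumber() {
            long seqNo = nextSeqNo.getAndIncrement();
            if (seqNo > maxSeqNo) {
                throw new IllegalStateException("seqNo=" + seqNo + " > maxSeqNo=" + maxSeqNo);
            }
            return seqNo;
        }

        /** Freezes maxSeqNo from the current counter and returns the successor queue. */
        synchronized DeleteQueue advanceQueue(int maxPendingOps) {
            maxSeqNo = getLastSequenceNumber() + maxPendingOps;
            return new DeleteQueue(maxSeqNo + 1);
        }
    }

    /** Simplified stand-in for DocumentsWriter (hypothetical). */
    static final class Writer {
        private DeleteQueue deleteQueue = new DeleteQueue(1);

        synchronized long getNextSequenceNumber() {
            return deleteQueue.getNextSequenceNumber();
        }

        synchronized DeleteQueue currentQueue() {
            return deleteQueue;
        }

        /** Old ordering: the caller advanced the queue before taking this lock. */
        synchronized void swapIn(DeleteQueue next) {
            deleteQueue = next;
        }

        /** New ordering: advance and swap under the same lock as getNextSequenceNumber. */
        synchronized DeleteQueue resetDeleteQueue(int maxPendingOps) {
            DeleteQueue old = deleteQueue;
            deleteQueue = old.advanceQueue(maxPendingOps);
            return old;
        }
    }

    /** Replays the problematic interleaving on one thread; true if the check trips. */
    static boolean oldOrderingFails() {
        Writer writer = new Writer();
        writer.getNextSequenceNumber();                            // hands out seqNo 1
        DeleteQueue next = writer.currentQueue().advanceQueue(0);  // old queue: maxSeqNo = 1
        try {
            writer.getNextSequenceNumber();                        // still the old queue: seqNo 2
            return false;
        } catch (IllegalStateException e) {
            return true;
        } finally {
            writer.swapIn(next);
        }
    }

    /** With advance and swap under one lock, the next seqNo comes from the new queue. */
    static long newOrderingNextSeqNo() {
        Writer writer = new Writer();
        writer.getNextSequenceNumber();         // hands out seqNo 1
        writer.resetDeleteQueue(0);             // old maxSeqNo = 1, new queue starts at 2
        return writer.getNextSequenceNumber();  // served by the new queue
    }

    public static void main(String[] args) {
        System.out.println("old ordering trips the check: " + oldOrderingFails());
        System.out.println("new ordering next seqNo: " + newOrderingNextSeqNo());
    }
}
```

In the sketch, `resetDeleteQueue` and `getNextSequenceNumber` share the `Writer` monitor, so no caller can obtain a sequence number from the old queue once its `maxSeqNo` has been fixed. That is the property the change is meant to provide.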