[ https://issues.apache.org/jira/browse/LUCENE-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248133#comment-17248133 ]

Michael Froh edited comment on LUCENE-9621 at 12/11/20, 7:21 PM:
-----------------------------------------------------------------

Regarding the assertion failure, it looks like the call to 
{{adjustPendingNumDocs}} in {{rollbackInternalNoCommit}} is being called with 0 
(as {{totalMaxDoc}} and {{rollbackMaxDoc}} are both 0).

It feels to me like when we roll back on tragedy, the {{IndexWriter}} is known 
to be in a bad state, so it's not really surprising that {{pendingNumDocs}} and 
{{segmentInfos.totalMaxDoc()}} are out of sync. Maybe the fix is to skip that 
assertion when called from {{maybeCloseOnTragicEvent}}, so that it doesn't mask 
the real tragedy?
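
To make the idea concrete, here is a minimal sketch of the proposed behavior. It is a toy model, not Lucene's {{IndexWriter}}: the class {{TragicRollbackSketch}}, its {{bufferedDocs}} field, the {{skipCheckOnTragedy}} switch, and the {{outcome}} helper are all hypothetical names made up for illustration. The way the counters drift here (a failed flush has already taken the buffered docs, but {{pendingNumDocs}} was never decremented) is an assumption, not a confirmed diagnosis. The check is written as an explicit {{throw new AssertionError}} so the sketch behaves the same with or without {{-ea}}.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, heavily simplified model; this is NOT Lucene's IndexWriter.
final class TragicRollbackSketch {
    private final AtomicLong pendingNumDocs = new AtomicLong(); // every doc the writer has accepted
    private long bufferedDocs = 0;   // docs still in memory, not yet flushed
    private long totalMaxDoc = 0;    // docs in flushed segments
    private final boolean skipCheckOnTragedy; // assumed switch for the proposed fix

    TragicRollbackSketch(boolean skipCheckOnTragedy) {
        this.skipCheckOnTragedy = skipCheckOnTragedy;
    }

    void addDocument() {
        pendingNumDocs.incrementAndGet();
        bufferedDocs++;
    }

    // Simulates a flush; failWithOom stands in for the OutOfMemoryError from the test.
    void flush(boolean failWithOom) {
        long flushing = bufferedDocs;
        bufferedDocs = 0; // the flush takes ownership of the buffered docs
        try {
            if (failWithOom) {
                throw new OutOfMemoryError("simulated OOME during flush");
            }
            totalMaxDoc += flushing;
        } catch (OutOfMemoryError tragedy) {
            rollback(true); // if the check runs and fails, its error replaces the OOME
            throw tragedy;
        }
    }

    void rollback(boolean onTragedy) {
        // Discard whatever is still buffered in memory.
        pendingNumDocs.addAndGet(-bufferedDocs);
        bufferedDocs = 0;
        // Nothing was ever committed in this sketch, so we roll back to 0 docs.
        long rollbackMaxDoc = 0;
        pendingNumDocs.addAndGet(rollbackMaxDoc - totalMaxDoc); // adjusts by 0 when both are 0
        totalMaxDoc = rollbackMaxDoc;
        if (onTragedy && skipCheckOnTragedy) {
            return; // the writer is already known to be in a bad state
        }
        if (pendingNumDocs.get() != totalMaxDoc) {
            throw new AssertionError(
                "pendingNumDocs " + pendingNumDocs.get() + " != " + totalMaxDoc + " totalMaxDoc");
        }
    }

    // Adds one doc, fails the flush, and reports which error reaches the caller.
    static String outcome(boolean skipCheckOnTragedy) {
        TragicRollbackSketch writer = new TragicRollbackSketch(skipCheckOnTragedy);
        writer.addDocument();
        try {
            writer.flush(true);
            return "no error";
        } catch (Error e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(false)); // prints "AssertionError": the tragedy is masked
        System.out.println(outcome(true));  // prints "OutOfMemoryError": the tragedy surfaces
    }
}
```

In this model the caller only sees the original {{OutOfMemoryError}} when the check is skipped on the tragedy path; an ordinary rollback still runs the check.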


> pendingNumDocs doesn't match totalMaxDoc if tragedy on flush()
> --------------------------------------------------------------
>
>                 Key: LUCENE-9621
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9621
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 8.6.3
>            Reporter: Michael Froh
>            Priority: Major
>
> While implementing a test to trigger an OutOfMemoryError on flush() in 
> https://github.com/apache/lucene-solr/pull/2088, I noticed that the OOME was 
> followed by an assertion failure on rollback with the following stacktrace:
> {code:java}
> java.lang.AssertionError: pendingNumDocs 1 != 0 totalMaxDoc
>       at __randomizedtesting.SeedInfo.seed([ABBF17C4E0FCDEE5:DDC8E99910AFC8FF]:0)
>       at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2398)
>       at org.apache.lucene.index.IndexWriter.maybeCloseOnTragicEvent(IndexWriter.java:5196)
>       at org.apache.lucene.index.IndexWriter.tragicEvent(IndexWriter.java:5186)
>       at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3932)
>       at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3874)
>       at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3853)
>       at org.apache.lucene.index.TestIndexWriterDelete.testDeleteAllRepeated(TestIndexWriterDelete.java:496)
> {code}
> We should probably look into how exactly we behave with this kind of tragedy 
> on flush().



