dweiss commented on issue #12654: URL: https://github.com/apache/lucene/issues/12654#issuecomment-1791471920
Well, this test is almost never "fast" for me... The conditions passed in Failure.eval are frequently called, but rarely hit the right call stack. This is particularly problematic with testCheckpoint: if I count the number of times eval is called (for a particular random seed), it's 1201263; then nextInt(4) == 0 drops the call stack check to ~300k, BUT the call stack check is successful only 50 times out of 299622 (and call stack collection is quite expensive overall).

Anyway, for that particular seed you identified, @benwtrent, the index writer is simply hanging in shouldClose and never returns:

```
  private synchronized boolean shouldClose(boolean waitForClose) {
    while (true) {
      if (closed == false) {
        if (closing == false) {
          // We get to close
          closing = true;
          return true;
        } else if (waitForClose == false) {
          return false;
        } else {
          // Another thread is presently trying to close;
          // wait until it finishes one way (closes
          // successfully) or another (fails to close)
          doWait();
        }
      } else {
        return false;
      }
    }
  }
```

Nothing is happening in the test - it just idly waits until it times out.
```
"TEST-TestIndexWriterOnVMError.testUnknownError-seed#[4A059D04FCC8873]" #18 prio=5 os_prio=0 cpu=1453.12ms elapsed=211.66s tid=0x000001a54330c3d0 nid=0x3ce8 in Object.wait()  [0x00000011de3fd000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(java.base@17.0.3/Native Method)
        - waiting on <0x00000000f67432c0> (a org.apache.lucene.index.IndexWriter)
        at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:5419)
        - locked <0x00000000f67432c0> (a org.apache.lucene.index.IndexWriter)
        at org.apache.lucene.index.IndexWriter.shouldClose(IndexWriter.java:1386)
        - locked <0x00000000f67432c0> (a org.apache.lucene.index.IndexWriter)
        at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2442)
        at org.apache.lucene.index.TestIndexWriterOnVMError.getTragedy(TestIndexWriterOnVMError.java:250)
        at org.apache.lucene.index.TestIndexWriterOnVMError.doTest(TestIndexWriterOnVMError.java:207)
        at org.apache.lucene.index.TestIndexWriterOnVMError.testUnknownError(TestIndexWriterOnVMError.java:278)
        ...
```

The test has this code:

```
      // TODO: remove rollback here, and add this assert to ensure "full OOM protection" anywhere IW
      // does writes
      // assertTrue("hit OOM but writer is still open, WTF: ", writer.isClosed());
      try {
        writer.rollback();
      } catch (Throwable t) {
        t.printStackTrace(log);
      }
```

And clearly that assertion would have fired, if enabled. I don't know how to fix this either though.
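
For illustration only, here is a minimal standalone sketch of one way a wait loop shaped like shouldClose can end up waiting indefinitely: a first closer sets `closing = true` and then never reports back, so a second caller with `waitForClose == true` keeps looping in the timed wait. This is an assumption about the general pattern, not a diagnosis of what actually happens for this seed. The names `Writer`, `closeFailed` and `secondCloserReturns` are hypothetical and are not Lucene APIs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ShouldCloseHangSketch {

  /** Hypothetical stand-in for the writer; only the closed/closing handshake is modeled. */
  static final class Writer {
    private boolean closed;
    private boolean closing;

    synchronized boolean shouldClose(boolean waitForClose) throws InterruptedException {
      while (true) {
        if (closed) {
          return false;
        }
        if (closing == false) {
          closing = true; // we get to close
          return true;
        }
        if (waitForClose == false) {
          return false;
        }
        // Someone else claimed the close: timed wait, then re-check the flags.
        wait(100);
      }
    }

    /** Hypothetical cleanup: the closer gave up, so waiters may try again. */
    synchronized void closeFailed() {
      closing = false;
      notifyAll();
    }
  }

  /**
   * A first closer claims the close; a second closer then calls shouldClose(true).
   * Returns true if the second closer came back within timeoutMillis.
   */
  static boolean secondCloserReturns(boolean firstCloserCleansUp, long timeoutMillis)
      throws InterruptedException {
    Writer writer = new Writer();
    if (writer.shouldClose(true) == false) {
      throw new IllegalStateException("first closer should have claimed the close");
    }
    if (firstCloserCleansUp) {
      writer.closeFailed();
    }
    CountDownLatch returned = new CountDownLatch(1);
    Thread second = new Thread(() -> {
      try {
        writer.shouldClose(true);
        returned.countDown();
      } catch (InterruptedException e) {
        // Interrupted while still waiting: counts as "did not return".
      }
    });
    second.setDaemon(true);
    second.start();
    boolean cameBack = returned.await(timeoutMillis, TimeUnit.MILLISECONDS);
    second.interrupt(); // release the waiter so the sketch itself terminates
    second.join();
    return cameBack;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("with cleanup:    " + secondCloserReturns(true, 1000));
    System.out.println("without cleanup: " + secondCloserReturns(false, 1000));
  }
}
```

In this sketch the waiter only gets out if whoever set `closing` eventually resets it or sets `closed`. The timed wait just re-checks the flags periodically, which would be consistent with the TIMED_WAITING state in the dump above.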