dweiss commented on issue #12654:
URL: https://github.com/apache/lucene/issues/12654#issuecomment-1791471920

   Well, this test is almost never "fast" for me. The conditions passed to Failure.eval are called very frequently, but they rarely hit the right call stack. This is particularly problematic with testCheckpoint. For one particular random seed, eval is called 1201263 times. The nextInt(4) == 0 check cuts the number of call-stack checks down to ~300k, BUT the call-stack check succeeds only 50 times out of 299622 (and collecting call stacks is quite expensive overall).
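   The gating described above can be sketched as follows. This is a minimal, hypothetical illustration, not the actual TestIndexWriterOnVMError code: the class `SampledStackCheck`, the call sites `targetWork`/`otherWork`, and the use of `java.lang.StackWalker` for the stack inspection are all assumptions. It puts a cheap `nextInt(4) == 0` gate in front of an expensive stack walk and counts how often each stage fires.

   ```java
   import java.util.Random;
   import java.util.concurrent.atomic.AtomicLong;

   // Hypothetical sketch (not Lucene code): a cheap random gate in front of an
   // expensive call-stack check, with counters for each stage.
   final class SampledStackCheck {
       final AtomicLong evals = new AtomicLong();       // every call to eval()
       final AtomicLong stackChecks = new AtomicLong(); // calls that passed the random gate
       final AtomicLong hits = new AtomicLong();        // stack checks that found the target

       private final Random random;
       private final String targetMethod;

       SampledStackCheck(long seed, String targetMethod) {
           this.random = new Random(seed);
           this.targetMethod = targetMethod;
       }

       boolean eval() {
           evals.incrementAndGet();
           // Cheap gate: roughly 3 out of 4 calls return here without touching the stack.
           if (random.nextInt(4) != 0) {
               return false;
           }
           stackChecks.incrementAndGet();
           // Expensive part: walk the current thread's frames looking for the target method.
           boolean found = StackWalker.getInstance()
                   .walk(frames -> frames.anyMatch(f -> f.getMethodName().equals(targetMethod)));
           if (found) {
               hits.incrementAndGet();
           }
           return found;
       }

       // Hypothetical call site the injected failure is meant to fire in.
       static boolean targetWork(SampledStackCheck check) {
           return check.eval();
       }

       // Hypothetical call site that also reaches eval() but should never trigger the failure.
       static boolean otherWork(SampledStackCheck check) {
           return check.eval();
       }

       public static void main(String[] args) {
           SampledStackCheck check = new SampledStackCheck(42L, "targetWork");
           for (int i = 0; i < 100_000; i++) {
               otherWork(check);
           }
           for (int i = 0; i < 100; i++) {
               targetWork(check);
           }
           System.out.printf("evals=%d stackChecks=%d hits=%d%n",
                   check.evals.get(), check.stackChecks.get(), check.hits.get());
       }
   }
   ```

   For scale, the numbers quoted above work out to roughly one successful check per ~6,000 stack walks (299622 / 50 ≈ 5992), so nearly all of those ~300k stack walks are pure overhead.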
   
   Anyway, for that particular seed you identified, @benwtrent, the index writer is simply hanging in shouldClose and never returns:
   ```
     private synchronized boolean shouldClose(boolean waitForClose) {
       while (true) {
         if (closed == false) {
           if (closing == false) {
             // We get to close
             closing = true;
             return true;
           } else if (waitForClose == false) {
             return false;
           } else {
             // Another thread is presently trying to close;
             // wait until it finishes one way (closes
             // successfully) or another (fails to close)
             doWait();
           }
         } else {
           return false;
         }
       }
     }
   ```
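   To see how a `closing` flag that is never reset is enough to produce this kind of hang, here is a toy model of the same closed/closing handshake. It is a hypothetical class (`CloseGate`, `finishClose`), not IndexWriter itself. The `wait(1000)` is an assumption standing in for `doWait()` (the TIMED_WAITING state in the thread dump is consistent with a timed wait). Whether this is what actually happens with this seed is also an assumption, not a diagnosis: if the thread that won `shouldClose` never resets `closing` and calls `notifyAll()`, every later `shouldClose(true)` loops on `wait()` forever.

   ```java
   // Hypothetical toy model (not Lucene's IndexWriter) of the closed/closing handshake
   // from the shouldClose excerpt above.
   final class CloseGate {
       private boolean closed = false;
       private boolean closing = false;

       // Same decision logic as the excerpt. Each wait is bounded, but the loop re-checks
       // forever, so a caller passing waitForClose=true never returns while closing stays true.
       synchronized boolean shouldClose(boolean waitForClose) {
           while (true) {
               if (closed) {
                   return false;
               }
               if (!closing) {
                   closing = true; // we get to close
                   return true;
               }
               if (!waitForClose) {
                   return false;
               }
               try {
                   wait(1000); // assumption: stands in for IndexWriter.doWait()
               } catch (InterruptedException e) {
                   Thread.currentThread().interrupt();
                   throw new IllegalStateException("interrupted while waiting for close", e);
               }
           }
       }

       // Bookkeeping the closing thread must do on every exit path, success or failure.
       synchronized void finishClose(boolean success) {
           if (success) {
               closed = true;
           }
           closing = false;
           notifyAll(); // wake any thread parked in shouldClose(true)
       }

       synchronized boolean isClosing() { return closing; }

       synchronized boolean isClosed() { return closed; }

       public static void main(String[] args) {
           CloseGate gate = new CloseGate();
           boolean first = gate.shouldClose(true);   // true: this caller owns the close
           // Simulate the close attempt dying before finishClose() runs: the flag stays set.
           boolean second = gate.shouldClose(false); // false: someone is "still" closing
           System.out.println(first + " " + second + " " + gate.isClosing()); // true false true
           // Calling gate.shouldClose(true) at this point would never return.

           gate.finishClose(false);                  // what a failed close has to do
           boolean third = gate.shouldClose(true);   // true: the gate is free again
           gate.finishClose(true);
           System.out.println(third + " " + gate.isClosed()); // true true
       }
   }
   ```

   The point of the model is only that the winner of `shouldClose` has to reset `closing` and call `notifyAll()` on every exit path, including the failure path.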
   
   Nothing is happening in the test - it just idly waits until it times out.
   ```
   "TEST-TestIndexWriterOnVMError.testUnknownError-seed#[4A059D04FCC8873]" #18 prio=5 os_prio=0 cpu=1453.12ms elapsed=211.66s tid=0x000001a54330c3d0 nid=0x3ce8 in Object.wait()  [0x00000011de3fd000]
      java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(java.base@17.0.3/Native Method)
        - waiting on <0x00000000f67432c0> (a org.apache.lucene.index.IndexWriter)
        at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:5419)
        - locked <0x00000000f67432c0> (a org.apache.lucene.index.IndexWriter)
        at org.apache.lucene.index.IndexWriter.shouldClose(IndexWriter.java:1386)
        - locked <0x00000000f67432c0> (a org.apache.lucene.index.IndexWriter)
        at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2442)
        at org.apache.lucene.index.TestIndexWriterOnVMError.getTragedy(TestIndexWriterOnVMError.java:250)
        at org.apache.lucene.index.TestIndexWriterOnVMError.doTest(TestIndexWriterOnVMError.java:207)
        at org.apache.lucene.index.TestIndexWriterOnVMError.testUnknownError(TestIndexWriterOnVMError.java:278)
   ...
   ```
   
   The test has this code:
   ```
         // TODO: remove rollback here, and add this assert to ensure "full OOM protection" anywhere IW
         // does writes
         // assertTrue("hit OOM but writer is still open, WTF: ", writer.isClosed());
         try {
           writer.rollback();
         } catch (Throwable t) {
           t.printStackTrace(log);
         }
   ```
   
   Clearly, that assertion would have fired had it been enabled. I don't know how to fix this either, though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org