[ 
https://issues.apache.org/jira/browse/LUCENE-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217280#comment-17217280
 ] 

Zach Chen commented on LUCENE-9508:
-----------------------------------

Looks like the logic in *markForFullFlush* has received some changes from
https://issues.apache.org/jira/browse/LUCENE-9304, so I'm not sure whether
back-porting or upgrading would resolve this particular issue.

On the other hand, I did try playing with the latest code in master a bit and
came up with the following test, similar to what you described, which can still
block the main thread executing *markForFullFlush*. However, I'm not sure
whether this test case is relevant or valid in an actual production environment.
{code:java}
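public void testBlockedFullFlush() throws IOException {
  // newDirectory() comes from LuceneTestCase. docWriter, perThreadPool and
  // flushControl are package-private, so this test is assumed to live in
  // org.apache.lucene.index.
  try (Directory directory = newDirectory()) {
    try (IndexWriter writer = new IndexWriter(directory, new IndexWriterConfig())) {
      writer.addDocument(new Document());

      DocumentsWriterPerThreadPool pool = writer.docWriter.perThreadPool;
      assertEquals(1, pool.size());

      CountDownLatch latch = new CountDownLatch(1);
      Thread longLockingThread = new Thread(() -> {
        // Lock the existing DWPT and keep holding it, simulating a thread stuck in indexing.
        DocumentsWriterPerThread first = pool.getAndLock();

        // The only DWPT is locked, so the pool creates a second one; release it right away.
        DocumentsWriterPerThread second = pool.getAndLock();
        pool.marksAsFreeAndUnlock(second);

        assertEquals(2, pool.size());

        try {
          // Hold the first DWPT until the main thread counts down the latch.
          latch.await();
          pool.marksAsFreeAndUnlock(first);
        } catch (InterruptedException e) {
          e.printStackTrace();
        }
      });

      longLockingThread.start();
      // Blocks waiting for the lock on the first DWPT.
      writer.docWriter.flushControl.markForFullFlush();

      // Won't be able to reach this step as the line above blocks
      latch.countDown();
    }
  }
}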

{code}
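The blocking pattern the test exercises can also be reproduced without Lucene internals. The following is a hypothetical plain-Java model, not Lucene code: each per-thread writer is stood in for by a {{ReentrantLock}}, and {{lockAllWithTimeout}} (a made-up name) walks them in order, the way the description says *markForFullFlush* does. A plain {{lock()}} loop would hang on a held lock; this variant uses {{tryLock}} with a timeout and reports which writers it could not reach. That is one possible direction for question 2 of the description, not an existing Lucene behavior.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class FullFlushModel {
  // Hypothetical model: each ReentrantLock stands in for one per-thread writer's lock.
  // Visits the locks in order and returns the indices it could not acquire in time.
  static List<Integer> lockAllWithTimeout(List<ReentrantLock> locks, long timeoutMs) {
    List<Integer> blocked = new ArrayList<>();
    for (int i = 0; i < locks.size(); i++) {
      ReentrantLock lock = locks.get(i);
      boolean acquired;
      try {
        acquired = lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status
        acquired = false;
      }
      if (acquired) {
        try {
          // In this model the per-writer flush step is a no-op.
        } finally {
          lock.unlock();
        }
      } else {
        blocked.add(i); // report instead of waiting forever like a plain lock() would
      }
    }
    return blocked;
  }

  public static void main(String[] args) throws InterruptedException {
    List<ReentrantLock> locks = List.of(new ReentrantLock(), new ReentrantLock());
    CountDownLatch held = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);

    // Simulated indexing thread that holds writer 0 until it is told to release it.
    Thread stuckIndexer = new Thread(() -> {
      locks.get(0).lock();
      try {
        held.countDown();
        release.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        locks.get(0).unlock();
      }
    });
    stuckIndexer.start();
    held.await();

    System.out.println("blocked writers: " + lockAllWithTimeout(locks, 100));

    release.countDown();
    stuckIndexer.join();
  }
}
{code}

Running {{main}} should print {{blocked writers: [0]}} after roughly 100 ms, since writer 0 is held by the simulated stuck indexer while writer 1 is free. A real fix would still have to decide what to do with the writers it skipped, which this model leaves open.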
 

> DocumentsWriter doesn't check for BlockedFlushes in stall mode``
> ----------------------------------------------------------------
>
>                 Key: LUCENE-9508
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9508
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 8.5.1
>            Reporter: Sorabh Hamirwasia
>            Priority: Major
>              Labels: IndexWriter
>
> Hi,
> I was investigating an issue where the memory used by a single Lucene
> IndexWriter went up to ~23GB. Lucene has a concept of stalling when the
> memory used by each index breaches the 2 x ramBuffer limit (10% of the JVM
> heap, ~3GB in this case), so ideally memory usage should not go above that
> limit. I looked into the heap dump and found that when the fullFlush thread
> enters the *markForFullFlush* method, it tries to take the lock on the
> ThreadStates of all the DWPT threads sequentially. If acquiring the lock on
> one of the ThreadStates blocks, the fullFlush thread waits indefinitely. This
> is what happened in my case, where one of the DWPT threads was stuck in the
> indexing process. Because of this, the fullFlush thread was unable to
> populate the flush queue even though stall mode was detected. This caused
> new indexing requests arriving on indexing threads to continue indexing
> after sleeping for a second. The *preUpdate()* method checks for the stalled
> case and looks for any pending flushes (based on the flush queue); if there
> are none, it sleeps and continues.
> Questions:
> 1) Should *preUpdate* look at the blocked flushes information as well,
> instead of just the flush queue?
> 2) Should the fullFlush thread wait indefinitely for the lock on
> ThreadStates? A single blocked writing thread can block the full flush here.
>  
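The stall decision described in the report, and the alternative raised in question 1, can be modeled as a small decision function. This is a hypothetical sketch: {{StallDecisionModel}}, its {{Action}} values and the string placeholders for pending and blocked flushes are illustrative names, not Lucene's *preUpdate* implementation.

{code:java}
import java.util.List;

public class StallDecisionModel {
  // Hypothetical actions an indexing thread could take before indexing; names are illustrative only.
  enum Action { INDEX, HELP_FLUSH, SLEEP_THEN_INDEX, KEEP_STALLING }

  // Behavior as described in the report: only the flush queue is consulted.
  static Action current(boolean stalled, List<String> flushQueue) {
    if (!stalled) return Action.INDEX;
    if (!flushQueue.isEmpty()) return Action.HELP_FLUSH; // pick up a pending flush
    return Action.SLEEP_THEN_INDEX; // waits (about a second) and then indexes anyway
  }

  // Variant raised in question 1: also consult blocked flushes before indexing.
  static Action alsoCheckBlocked(boolean stalled, List<String> flushQueue, List<String> blockedFlushes) {
    Action base = current(stalled, flushQueue);
    if (base == Action.SLEEP_THEN_INDEX && !blockedFlushes.isEmpty()) {
      return Action.KEEP_STALLING; // flushes are outstanding, so don't add more RAM yet
    }
    return base;
  }

  public static void main(String[] args) {
    List<String> emptyQueue = List.of();
    List<String> blocked = List.of("dwpt-1"); // hypothetical placeholder for a blocked flush
    System.out.println(current(true, emptyQueue));                   // SLEEP_THEN_INDEX
    System.out.println(alsoCheckBlocked(true, emptyQueue, blocked)); // KEEP_STALLING
  }
}
{code}

One trade-off this makes visible: if the blocked flushes never complete (for example because the full flush is itself stuck waiting on a lock, as in the report), {{KEEP_STALLING}} would hold indexing threads back indefinitely, so question 1 cannot be answered independently of question 2.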



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
