[ https://issues.apache.org/jira/browse/LUCENE-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217280#comment-17217280 ]
Zach Chen commented on LUCENE-9508:
-----------------------------------

Looks like the logic in *markForFullFlush* has received some changes from https://issues.apache.org/jira/browse/LUCENE-9304, so I'm not sure whether back-porting / upgrading would resolve this particular issue.

On the other hand, I did try playing with the latest code in master a bit and came up with the following test, similar to what you described, and it can still block the main thread executing *markForFullFlush*. However, I'm not sure if this test case is relevant / valid in an actual production environment?

{code:java}
public void testBlockedFullFlush() throws IOException {
  try (Directory directory = newDirectory()) {
    try (IndexWriter writer = new IndexWriter(directory, new IndexWriterConfig())) {
      writer.addDocument(new Document());
      DocumentsWriterPerThreadPool pool = writer.docWriter.perThreadPool;
      assertEquals(1, pool.size());
      CountDownLatch latch = new CountDownLatch(1);
      Thread longLockingThread = new Thread(() -> {
        // Hold the lock on the first DWPT until the latch is released.
        DocumentsWriterPerThread first = pool.getAndLock();
        DocumentsWriterPerThread second = pool.getAndLock();
        pool.marksAsFreeAndUnlock(second);
        assertEquals(2, pool.size());
        try {
          latch.await();
          pool.marksAsFreeAndUnlock(first);
        } catch (InterruptedException e) {
          e.printStackTrace();
        }
      });
      longLockingThread.start();
      writer.docWriter.flushControl.markForFullFlush();
      // Won't be able to reach this step as the line above blocked
      latch.countDown();
    }
  }
}
{code}

> DocumentsWriter doesn't check for BlockedFlushes in stall mode``
> ----------------------------------------------------------------
>
>                 Key: LUCENE-9508
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9508
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 8.5.1
>            Reporter: Sorabh Hamirwasia
>            Priority: Major
>              Labels: IndexWriter
>
> Hi,
> I was investigating an issue where the memory usage by a single Lucene
> IndexWriter went up to ~23GB.
> Lucene has a concept of stalling in case the
> memory used by each index breaches the 2 X ramBuffer limit (10% of JVM heap,
> in this case ~3GB). So ideally memory usage should not go above that limit. I
> looked into the heap dump and found that when the fullFlush thread enters the
> *markForFullFlush* method, it tries to take the lock on the ThreadStates of all
> the DWPT threads sequentially. If the lock on one of the ThreadStates is
> unavailable, the fullFlush thread blocks indefinitely. This is what happened in
> my case, where one of the DWPT threads was stuck in the indexing process. Because
> of this, the fullFlush thread was unable to populate the flush queue even though
> stall mode was detected. As a result, a new indexing request arriving on an
> indexing thread slept for a second and then continued with indexing. The
> **preUpdate()** method checks for the stalled case and looks for any
> pending flushes (based on the flush queue); if there are none, it sleeps and continues.
> Question:
> 1) Should **preUpdate** look into the blocked flushes information as well,
> instead of just the flush queue?
> 2) Should the fullFlush thread wait indefinitely for the lock on ThreadStates?
> A single blocked writing thread can block the full flush here.
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
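
To make question 2 in the quoted issue more concrete, here is a minimal, self-contained sketch of the difference between waiting indefinitely on a lock and waiting with a bound. It is not Lucene code: plain `ReentrantLock` objects stand in for the per-thread locks, and the class and method names (`BoundedLockSketch`, `firstBlockedLock`, `demo`) are hypothetical. Whether a bounded wait would be appropriate inside *markForFullFlush* is a separate question that this sketch does not answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; none of these names exist in Lucene.
public class BoundedLockSketch {

  /**
   * Tries to acquire each lock in order, waiting at most timeoutMillis per lock.
   * Returns the index of the first lock that could not be acquired in time,
   * or -1 if every lock was acquired. Any locks acquired are released before returning.
   */
  static int firstBlockedLock(List<ReentrantLock> locks, long timeoutMillis)
      throws InterruptedException {
    List<ReentrantLock> held = new ArrayList<>();
    try {
      for (int i = 0; i < locks.size(); i++) {
        ReentrantLock lock = locks.get(i);
        // A plain lock.lock() here would wait forever if another thread never unlocks.
        if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
          return i;
        }
        held.add(lock);
      }
      return -1;
    } finally {
      for (ReentrantLock lock : held) {
        lock.unlock();
      }
    }
  }

  /** Simulates one stuck thread holding lock 1 and reports which lock was blocked. */
  static int demo() {
    List<ReentrantLock> locks =
        Arrays.asList(new ReentrantLock(), new ReentrantLock(), new ReentrantLock());
    CountDownLatch lockHeld = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    Thread stuck = new Thread(() -> {
      locks.get(1).lock();
      try {
        lockHeld.countDown();
        release.await(); // stands in for a thread stuck in indexing
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        locks.get(1).unlock();
      }
    });
    stuck.start();
    try {
      lockHeld.await(); // make sure the other thread owns lock 1 first
      return firstBlockedLock(locks, 100);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while running the demo", e);
    } finally {
      release.countDown(); // let the stuck thread finish
    }
  }

  public static void main(String[] args) {
    System.out.println("first blocked lock index: " + demo());
  }
}
```

Running `main` prints `first blocked lock index: 1`: the caller gets control back after the timeout and learns which lock was unavailable, instead of hanging until the other thread releases it.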