[ https://issues.apache.org/jira/browse/LUCENE-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17546696#comment-17546696 ]
Vigya Sharma commented on LUCENE-10583:
---------------------------------------

I was discussing this with [~mikemccand], and we noticed that the application seems to have a lock on the {{MMapDirectory}}, which is likely preventing the merge thread from cleaning up files (in {{maybeDeletePendingFiles()}}).
{code:java}
at com.speed4trade.ebs.module.search.SearchService.updateSearchIndex(SearchService.java:1723)
- locked <0x00000006d5c00208> (a org.apache.lucene.store.MMapDirectory)
at com.speed4trade.ebs.module.businessrelations.ticket.TicketChangedListener.postUpdate(TicketChangedListener.java:142)
{code}
Can you check if {{ebs.module.search.SearchService.updateSearchIndex()}} is synchronizing on {{Directory}}, or locking any other Lucene object?

Lucene holds locks on different objects at multiple different places. Locking them in the calling application code can easily lead to deadlocks. Let's make sure that's not happening here.

> Deadlock with MMapDirectory while waitForMerges
> -----------------------------------------------
>
>                 Key: LUCENE-10583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10583
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 8.11.1
>         Environment: Java 17
> OS: Windows 2016
>            Reporter: Thomas Hoffmann
>            Priority: Critical
>
> Hello,
> a deadlock situation happened in our application.
> We are using MMapDirectory on Windows 2016 and got the following stack trace:
> {code:java}
> "https-openssl-nio-443-exec-30" #166 daemon prio=5 os_prio=0 cpu=78703.13ms elapsed=81248.18s tid=0x000000002860af10 nid=0x237c in Object.wait() [0x00000000413fc000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(java.base@17.0.2/Native Method)
>         - waiting on <no object reference available>
>         at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4983)
>         - locked <0x00000006ef1fc020> (a org.apache.lucene.index.IndexWriter)
>         at org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2697)
>         - locked <0x00000006ef1fc020> (a org.apache.lucene.index.IndexWriter)
>         at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1236)
>         at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1278)
>         at com.speed4trade.ebs.module.search.SearchService.updateSearchIndex(SearchService.java:1723)
>         - locked <0x00000006d5c00208> (a org.apache.lucene.store.MMapDirectory)
>         at com.speed4trade.ebs.module.businessrelations.ticket.TicketChangedListener.postUpdate(TicketChangedListener.java:142)
>         ...{code}
> All threads were waiting to lock <0x00000006d5c00208>, which was never released.
> A Lucene thread was also blocked; I don't know if this is relevant:
> {code:java}
> "Lucene Merge Thread #0" #18466 daemon prio=5 os_prio=0 cpu=15.63ms elapsed=3499.07s tid=0x00000000459453e0 nid=0x1f8 waiting for monitor entry [0x000000005da9e000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.lucene.store.FSDirectory.deletePendingFiles(FSDirectory.java:346)
>         - waiting to lock <0x00000006d5c00208> (a org.apache.lucene.store.MMapDirectory)
>         at org.apache.lucene.store.FSDirectory.maybeDeletePendingFiles(FSDirectory.java:363)
>         at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:248)
>         at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44)
>         at org.apache.lucene.index.ConcurrentMergeScheduler$1.createOutput(ConcurrentMergeScheduler.java:289)
>         at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
>         at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:121)
>         at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:130)
>         at org.apache.lucene.codecs.lucene87.Lucene87StoredFieldsFormat.fieldsWriter(Lucene87StoredFieldsFormat.java:141)
>         at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:227)
>         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
>         at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757)
>         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361)
>         at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920)
>         at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684){code}
> It looks like the merge operation never finished and
> released the lock.
> Is there any option to prevent this deadlock or how to investigate it further?
> A load-test didn't show this problem unfortunately.

--
This message was sent by Atlassian Jira (v8.20.7#820007)
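
The locking pattern suspected in the comment can be reproduced without Lucene. The following is only a minimal sketch under stated assumptions: {{FakeDirectory}} and {{closeWhileHolding}} are hypothetical stand-ins, not Lucene classes or the reporter's code. {{FakeDirectory.deletePendingFiles()}} is synchronized on the instance, mirroring the "waiting to lock" frame in the merge-thread trace, and the timed wait stands in for {{IndexWriter.close()}} waiting for merges. When the caller holds the directory's own monitor, the simulated merge cannot finish; with a private lock object it can.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class DirectoryLockSketch {

    /** Hypothetical stand-in for a directory whose cleanup needs its own monitor. */
    static final class FakeDirectory {
        synchronized void deletePendingFiles() {
            // Nothing to do here; entering the method already requires the monitor.
        }
    }

    /**
     * Starts a simulated merge thread that calls dir.deletePendingFiles(), then waits
     * for it while holding `lock`. Returns true if the merge finished within the timeout.
     */
    static boolean closeWhileHolding(Object lock, FakeDirectory dir, long timeoutMillis) {
        CountDownLatch mergeDone = new CountDownLatch(1);
        Thread merge = new Thread(() -> {
            dir.deletePendingFiles();
            mergeDone.countDown();
        }, "fake-merge-thread");
        merge.setDaemon(true);

        synchronized (lock) {
            merge.start();
            try {
                // Stand-in for close() waiting on running merges (bounded here so the demo ends).
                return mergeDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        FakeDirectory dir = new FakeDirectory();

        // Application synchronizes on the directory itself: the merge thread stays blocked.
        boolean finishedWithDirLock = closeWhileHolding(dir, dir, 500);

        // Application synchronizes on its own private lock: the merge thread can proceed.
        boolean finishedWithPrivateLock = closeWhileHolding(new Object(), dir, 500);

        System.out.println("holding directory monitor: merge finished = " + finishedWithDirLock);
        System.out.println("holding private lock:      merge finished = " + finishedWithPrivateLock);
    }
}
```

If the application code does turn out to synchronize on the {{Directory}}, guarding its index updates with a dedicated lock object of its own, as in the second call, is one way to avoid sharing a monitor with Lucene internals.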