[ https://issues.apache.org/jira/browse/LUCENE-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571691#comment-17571691 ]

Vigya Sharma commented on LUCENE-10583:
---------------------------------------

{quote}We could perhaps make a best effort to detect on common incoming APIs 
that external locks are not already held on {{Directory}} and {{IndexWriter}}?
{quote}
Interesting thought. I like the idea of safeguarding users against such 
errors, but I don't have a good practical solution for it yet.

We could assert that common Lucene objects are lock-free at some popular 
{{public}} entry points, but how do we differentiate whether the lock is 
held by an internal Lucene thread or an external user thread? We do lock 
on the IndexWriter in multiple places within Lucene.
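To illustrate, the kind of entry-point check I have in mind would look roughly like the sketch below ({{EntryPointGuard}} and {{closeGuarded}} are made-up names, not proposed API), and it also shows where the idea breaks down:
{code:java}
import java.io.IOException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

public class EntryPointGuard {
  // Hypothetical guard at a public entry point such as IndexWriter#close.
  // Thread.holdsLock only reports whether the *calling* thread owns the
  // monitor, so it can catch the external-synchronization pattern from this
  // issue, but only for the thread that makes the call.
  public static void closeGuarded(IndexWriter writer, Directory dir) throws IOException {
    // If the caller synchronized on the Directory (as in the reported stack
    // trace), merge threads can never enter FSDirectory's synchronized
    // methods and waitForMerges() hangs.
    assert !Thread.holdsLock(dir)
        : "IndexWriter.close() called while holding the Directory monitor";
    // We cannot add the same assert for the IndexWriter itself: Lucene
    // synchronizes on the writer internally, so the check could not tell an
    // internal caller from an external one.
    writer.close();
  }
}
{code}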
{quote}can this be resolved now?
{quote}
We added doc strings in a couple of places to warn users, and the user who 
reported this issue is unblocked. I don't have a concrete plan for anything 
else we can do here, so unless there are more ideas, we could go ahead and 
resolve this.
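For reference, the pattern the new documentation warns against looks roughly like this; the {{riskyClose}}/{{saferClose}} helpers are only illustrative, not code from the report:
{code:java}
import java.io.IOException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

public class CloseExamples {
  // Risky: holding the Directory monitor while closing the writer. Background
  // merges synchronize on the Directory (e.g. FSDirectory.deletePendingFiles),
  // so close() -> waitForMerges() can wait on merges that can never proceed.
  static void riskyClose(Directory dir, IndexWriter writer) throws IOException {
    synchronized (dir) {
      writer.close(); // may deadlock as in the reported stack traces
    }
  }

  // Safer: close the writer without holding any external lock on the
  // Directory (or on the writer); IndexWriter handles its own synchronization.
  static void saferClose(IndexWriter writer) throws IOException {
    writer.close();
  }
}
{code}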

 

> Deadlock with MMapDirectory while waitForMerges
> -----------------------------------------------
>
>                 Key: LUCENE-10583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10583
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 8.11.1
>         Environment: Java 17
> OS: Windows 2016
>            Reporter: Thomas Hoffmann
>            Priority: Minor
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Hello,
> a deadlock occurred in our application. We are using MMapDirectory on 
> Windows 2016 and got the following stack trace:
> {code:java}
> "https-openssl-nio-443-exec-30" #166 daemon prio=5 os_prio=0 cpu=78703.13ms 
> "https-openssl-nio-443-exec-30" #166 daemon prio=5 os_prio=0 cpu=78703.13ms 
> elapsed=81248.18s tid=0x000000002860af10 nid=0x237c in Object.wait()  
> [0x00000000413fc000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>     at java.lang.Object.wait(java.base@17.0.2/Native Method)
>     - waiting on <no object reference available>
>     at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4983)
>     - locked <0x00000006ef1fc020> (a org.apache.lucene.index.IndexWriter)
>     at 
> org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2697)
>     - locked <0x00000006ef1fc020> (a org.apache.lucene.index.IndexWriter)
>     at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1236)
>     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1278)
>     at 
> com.speed4trade.ebs.module.search.SearchService.updateSearchIndex(SearchService.java:1723)
>     - locked <0x00000006d5c00208> (a org.apache.lucene.store.MMapDirectory)
>     at 
> com.speed4trade.ebs.module.businessrelations.ticket.TicketChangedListener.postUpdate(TicketChangedListener.java:142)
> ...{code}
> All threads were waiting to lock <0x00000006d5c00208>, which was never 
> released.
> A Lucene thread was also blocked; I don't know if this is relevant:
> {code:java}
> "Lucene Merge Thread #0" #18466 daemon prio=5 os_prio=0 cpu=15.63ms 
> elapsed=3499.07s tid=0x00000000459453e0 nid=0x1f8 waiting for monitor entry  
> [0x000000005da9e000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at 
> org.apache.lucene.store.FSDirectory.deletePendingFiles(FSDirectory.java:346)
>     - waiting to lock <0x00000006d5c00208> (a 
> org.apache.lucene.store.MMapDirectory)
>     at 
> org.apache.lucene.store.FSDirectory.maybeDeletePendingFiles(FSDirectory.java:363)
>     at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:248)
>     at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44)
>     at 
> org.apache.lucene.index.ConcurrentMergeScheduler$1.createOutput(ConcurrentMergeScheduler.java:289)
>     at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
>     at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:121)
>     at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:130)
>     at 
> org.apache.lucene.codecs.lucene87.Lucene87StoredFieldsFormat.fieldsWriter(Lucene87StoredFieldsFormat.java:141)
>     at 
> org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:227)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757)
>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361)
>     at 
> org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920)
>     at 
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
>     at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684){code}
> It looks like the merge operation never finished and never released the lock.
> Is there any option to prevent this deadlock, or how can we investigate it further?
> Unfortunately, a load test didn't reproduce the problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
