[jira] [Commented] (LUCENE-3373) waitForMerges deadlocks if background merge fails
[ https://issues.apache.org/jira/browse/LUCENE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540544#comment-17540544 ]

Thomas Hoffmann commented on LUCENE-3373:

Hello Vigya! Thank you for investigating this issue! As this thread was already quite old and I am not sure whether it is the same problem I encountered, I filed a new issue. But we can of course focus on this thread.

I also wrote some load tests with multiple threads and loops to put pressure on the IndexWriter object. Unfortunately, I couldn't reproduce the behaviour. As the same code has worked for several years in our application, some rare condition must be met to get into this deadlock. As far as I understood, you have already found one possible situation.

As a temporary workaround, we could disable the IO throttling of the ConcurrentMergeScheduler, right? As I can see in the sources, IO throttling is enabled by default. Maybe it is possible to reproduce the deadlock with a breakpoint in the IDE to simulate/trigger the IO throttling?
> waitForMerges deadlocks if background merge fails
>
> Key: LUCENE-3373
> URL: https://issues.apache.org/jira/browse/LUCENE-3373
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 3.0.3
> Reporter: Tim Smith
> Priority: Major
>
> waitForMerges can deadlock if a merge fails with ConcurrentMergeScheduler.
> This is because the merge thread dies, but pending merges are still available.
> Normally, the merge thread picks up the next merge once it finishes the previous one, but in the event of a merge exception the pending work is not resumed, and waitForMerges won't complete until all pending work is complete.
>
> I worked around this by overriding doMerge() like so:
> {code}
> protected final void doMerge(MergePolicy.OneMerge merge) throws IOException {
>   try {
>     super.doMerge(merge);
>   } catch (Throwable exc) {
>     // Just log the exception and do not rethrow it, so the merge thread survives
>     // insert logging code here
>   }
> }
> {code}
>
> Here are the rough steps I used to reproduce this issue. Override doMerge like so:
> {code}
> protected final void doMerge(MergePolicy.OneMerge merge) throws IOException {
>   try {
>     Thread.sleep(500L); // widen the timing window
>   } catch (InterruptedException e) {
>     // ignored on purpose in this repro
>   }
>   super.doMerge(merge);
>   throw new IOException("fail"); // simulate a merge failure after the merge ran
> }
> {code}
> Then do the following:
> loop 50 times:
>   addDocument // any doc
>   commit
>   waitForMerges // This will deadlock sometimes
>
> SOLR-2017 may be related to this (the stack trace for the deadlock looked related).

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
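The failure mode described in the issue can be modeled in plain Java to see why the wait never ends. This is a hypothetical, heavily simplified sketch, not Lucene code: `ToyMergeScheduler`, `addMerge` and `startMergeThread` are invented names. A single merge thread drains a queue of pending merges, and `waitForMerges` waits until nothing is pending or running. With `swallowFailures = false`, a failing merge kills the thread and leaves the next merge queued with nobody to run it; `true` mimics the doMerge() override workaround. The timeout exists only so the demo cannot hang.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;

/** Hypothetical toy model of the LUCENE-3373 failure mode; not Lucene code. */
class ToyMergeScheduler {
  private final Deque<Runnable> pending = new ArrayDeque<>();
  private final boolean swallowFailures; // true mimics the doMerge() override workaround
  private int running;

  ToyMergeScheduler(boolean swallowFailures) {
    this.swallowFailures = swallowFailures;
  }

  synchronized void addMerge(Runnable merge) {
    pending.add(merge);
  }

  /** Starts one merge thread that keeps picking up the next pending merge. */
  Thread startMergeThread() {
    Thread t = new Thread(() -> {
      while (true) {
        Runnable next;
        synchronized (this) {
          next = pending.poll();
          if (next == null) {
            return; // queue drained: the thread exits normally
          }
          running++;
        }
        try {
          next.run();
        } catch (RuntimeException e) {
          if (!swallowFailures) {
            throw e; // thread dies; merges still in `pending` are never picked up
          }
          // workaround: swallow (in real code, log) and continue with the next merge
        } finally {
          synchronized (this) {
            running--;
            notifyAll(); // wake waitForMerges() so it re-checks the state
          }
        }
      }
    });
    t.setUncaughtExceptionHandler((thread, ex) -> { }); // keep the demo output quiet
    t.start();
    return t;
  }

  /** Waits until nothing is pending or running; returns false if the timeout elapsed first. */
  synchronized boolean waitForMerges(long timeoutMillis) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!pending.isEmpty() || running > 0) {
      long remaining = deadline - System.nanoTime();
      if (remaining <= 0) {
        return false; // an unbounded wait would block forever at this point
      }
      TimeUnit.NANOSECONDS.timedWait(this, remaining);
    }
    return true;
  }
}
```

With one failing merge followed by a healthy one, the non-tolerant scheduler times out because the second merge stays queued, while the tolerant one drains both. Swallowing every Throwable is a blunt workaround, though: without logging it hides real merge failures.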
[GitHub] [lucene] mocobeta commented on pull request #917: LUCENE-10531: Disable distribution test (gui test) on windows.
mocobeta commented on PR #917:
URL: https://github.com/apache/lucene/pull/917#issuecomment-1133835329

> Can you merge the changes from [script-testing-windows](https://github.com/dweiss/lucene/tree/script-testing-windows) branch and rerun your stress test? It emits more logging; let's see what's happening there.

Sure, I'll merge it later today. I'd also tune the timeout to 20 to 30 seconds to increase the failure probability. In most cases the test completes within 20 seconds, so judging from my random trials so far, a tight timeout just above 20 seconds should be suitable for capturing the unusual cases.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [lucene] mocobeta commented on pull request #917: LUCENE-10531: Disable distribution test (gui test) on windows.
mocobeta commented on PR #917:
URL: https://github.com/apache/lucene/pull/917#issuecomment-1133846073

@dweiss I merged https://github.com/dweiss/lucene/tree/script-testing-windows into my branch. It repeatedly runs the test in 20 VMs, which is a waste of resources, though. For debugging, I set the timeout to 20 seconds. You can see the CI results here; please feel free to tweak the code and re-run jobs (I think you have write access to this fork). https://github.com/mocobeta/lucene/pull/2
[GitHub] [lucene] mocobeta commented on pull request #917: LUCENE-10531: Disable distribution test (gui test) on windows.
mocobeta commented on PR #917:
URL: https://github.com/apache/lucene/pull/917#issuecomment-1133858535

In failed runs, the thread does not hang, but it seems to be suspended several times, e.g.:

- https://github.com/mocobeta/lucene/runs/6542483446?check_suite_focus=true#step:6:1985
- https://github.com/mocobeta/lucene/runs/6542483446?check_suite_focus=true#step:6:2056
[GitHub] [lucene] mocobeta commented on pull request #917: LUCENE-10531: Disable distribution test (gui test) on windows.
mocobeta commented on PR #917:
URL: https://github.com/apache/lucene/pull/917#issuecomment-1133866879

I re-ran it several times and tracked the debug messages in the failed runs. In short:

1. It takes at least about ten seconds to load the AWT/Swing classes in the first test run on a Windows VM.
2. In the middle of loading classes, the thread is sometimes suspended several times for a very long time (five seconds or more), by the host machine, the scheduler, or something else. In the worst cases, it could take minutes?
[GitHub] [lucene] mocobeta commented on pull request #917: LUCENE-10531: Disable distribution test (gui test) on windows.
mocobeta commented on PR #917:
URL: https://github.com/apache/lucene/pull/917#issuecomment-1133877011

I finally hit failing runs that take more than 120 seconds to complete:

- https://github.com/mocobeta/lucene/runs/6543322816?check_suite_focus=true
- https://github.com/mocobeta/lucene/runs/6543322574?check_suite_focus=true

Process forking is not the problem at all; launching Luke involves loading many classes (it prepares all panels at startup... sorry), and that can take a long time on a Windows VM.
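A forked launch with a timeout, like the one being tuned in this thread, can be sketched with the JDK's `ProcessBuilder` and `Process.waitFor(long, TimeUnit)`. This is a hypothetical helper, not the actual distribution test code; `ForkWithTimeout` and `run` are invented names. It discards the child's output so the child cannot block on a full pipe, prints the elapsed time so slow starts become visible even when they pass, and kills the child if the timeout expires.

```java
import java.io.IOException;
import java.util.List;
import java.util.OptionalInt;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for forking a process with a timeout; not Lucene's test code. */
final class ForkWithTimeout {
  private ForkWithTimeout() {}

  /** Returns the exit code, or an empty OptionalInt if the process was killed on timeout. */
  static OptionalInt run(List<String> command, long timeoutSeconds)
      throws IOException, InterruptedException {
    long start = System.nanoTime();
    Process p = new ProcessBuilder(command)
        .redirectErrorStream(true)                        // merge stderr into stdout
        .redirectOutput(ProcessBuilder.Redirect.DISCARD)  // never block on a full pipe
        .start();
    boolean finished = p.waitFor(timeoutSeconds, TimeUnit.SECONDS);
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    System.err.println(command.get(0) + " took " + elapsedMs + " ms (timeout " + timeoutSeconds + " s)");
    if (!finished) {
      p.destroyForcibly().waitFor(); // make sure a hung child does not outlive the caller
      return OptionalInt.empty();
    }
    return OptionalInt.of(p.exitValue());
  }
}
```

For debugging, passing a 20 or 30 second timeout makes slow starts fail more often, which is the point of tuning it down; the elapsed-time line shows how close passing runs came to the limit.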
[GitHub] [lucene] dweiss commented on pull request #917: LUCENE-10531: Disable distribution test (gui test) on windows.
dweiss commented on PR #917:
URL: https://github.com/apache/lucene/pull/917#issuecomment-1133900543

Yeah, it looks like it! It is inexplicably slow! One change I think we could try is to run the forked command with a higher priority (the `start` command has an option for this; `cmd` doesn't, I believe).
[jira] [Created] (LUCENE-10587) Rename "master seed" to "root seed" or "main seed" or so?
Michael McCandless created LUCENE-10587:

Summary: Rename "master seed" to "root seed" or "main seed" or so?
Key: LUCENE-10587
URL: https://issues.apache.org/jira/browse/LUCENE-10587
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless

I noticed that Lucene's test infrastructure (or perhaps it's in the {{RandomizedTesting}} dependency?) still says things like this:
{noformat}
> [junit4:junit4] says Привет! Master seed: 3296009A5B3B7A05
{noformat}
Let's rename away from the term {{master}}?
[jira] [Commented] (LUCENE-10587) Rename "master seed" to "root seed" or "main seed" or so?
[ https://issues.apache.org/jira/browse/LUCENE-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540606#comment-17540606 ]

Michael McCandless commented on LUCENE-10587:

Whoops, my bad: this was based on old test output in old issues! These days we say this:
{noformat}
Running tests with randomization seed: tests.seed=7FB5EB33F1ED3689
{noformat}
Perfect :)
[jira] [Resolved] (LUCENE-10587) Rename "master seed" to "root seed" or "main seed" or so?
[ https://issues.apache.org/jira/browse/LUCENE-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-10587.
Resolution: Not A Problem
[jira] [Commented] (LUCENE-10587) Rename "master seed" to "root seed" or "main seed" or so?
[ https://issues.apache.org/jira/browse/LUCENE-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540614#comment-17540614 ]

Dawid Weiss commented on LUCENE-10587:

I think this message is still present in the ant task in randomized testing, actually. This particular word has no negative historical or emotional connotation to me, but when I get to the code there, I'll modify it; it costs me nothing and maybe it'll make somebody happier.
[jira] [Commented] (LUCENE-10587) Rename "master seed" to "root seed" or "main seed" or so?
[ https://issues.apache.org/jira/browse/LUCENE-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540621#comment-17540621 ]

Michael McCandless commented on LUCENE-10587:

Ahh OK thanks [~dweiss]!
[GitHub] [lucene] msokolov commented on pull request #913: Lucene 10577
msokolov commented on PR #913:
URL: https://github.com/apache/lucene/pull/913#issuecomment-1133979777

I ran luceneutil (after adding support for this new encoding, and scaling the vectors used to build the candidate index and the queries) and got decent results:

Task | QPS baseline | StdDev | QPS candidate | StdDev | Pct diff | p-value
--- | ---: | ---: | ---: | ---: | ---: | ---:
PKLookup | 133.47 | (18.7%) | 130.73 | (24.6%) | -2.0% (-38% - 50%) | 0.767
LowTermVector | 1024.29 | (15.6%) | 1140.88 | (16.3%) | 11.4% (-17% - 51%) | 0.024
MedTermVector | 727.57 | (8.6%) | 850.54 | (14.0%) | 16.9% (-5% - 43%) | 0.000
AndHighMedVector | 934.28 | (8.9%) | 1105.48 | (15.0%) | 18.3% (-5% - 46%) | 0.000
AndHighHighVector | 789.04 | (11.1%) | 947.74 | (16.1%) | 20.1% (-6% - 53%) | 0.000
AndHighLowVector | 910.39 | (13.2%) | 1219.95 | (22.2%) | 34.0% (-1% - 79%) | 0.000
HighTermVector | 674.78 | (8.8%) | 915.99 | (17.9%) | 35.7% (8% - 68%) | 0.000

I have no explanation for the drop in PKLookup. I checked the two indexes: they have the same named files, all with the same sizes except for the .vec and .vem files, as expected.
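The Pct diff column can be sanity-checked against the QPS columns, assuming it is simply (candidate - baseline) / baseline expressed in percent. That formula is an assumption, not taken from luceneutil's source, and the helper name below is hypothetical.

```java
/** Hypothetical helper: relative QPS change in percent (assumed formula). */
final class QpsDiff {
  private QpsDiff() {}

  static double pctDiff(double baselineQps, double candidateQps) {
    if (baselineQps <= 0) {
      throw new IllegalArgumentException("baseline QPS must be positive");
    }
    return 100.0 * (candidateQps - baselineQps) / baselineQps;
  }
}
```

For example, `pctDiff(1024.29, 1140.88)` is about 11.38 (11.4% in the table) and `pctDiff(910.39, 1219.95)` is about 34.00. For PKLookup the rounded QPS values give about -2.05% rather than the reported -2.0%, so luceneutil presumably works from unrounded numbers.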
[jira] [Commented] (LUCENE-3373) waitForMerges deadlocks if background merge fails
[ https://issues.apache.org/jira/browse/LUCENE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540701#comment-17540701 ]

Vigya Sharma commented on LUCENE-3373:

I doubt that disabling ConcurrentMergeScheduler throttling would help here, as the code path I mentioned gets triggered regardless of whether throttling is enabled or not.

Separately, I don't have a good feeling about the whole unbounded wait in IndexWriter shutdown(). I guess it was written this way because we don't seem to impose a good upper bound on how long running merges can take (although merges are transactional, so aborting them midway would only waste work, not cause any corruption, as far as I know).

I think we should have a timeout on this wait, at least on the wait for pendingMerges. If we are done with runningMerges but still have pending merges left, we could exit from this thread and abort them (the same way we [abort|https://github.com/apache/lucene/blob/d17c6056d8caada6db6c1c4f280f54960e058ee2/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L2438] in {{rollbackInternalNoCommit()}}). This should be an easy fix. I can take this up if Lucene experts in this area think this is a good idea.
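The proposed policy (keep waiting for running merges, but bound the time spent on merges that are only pending, then abort them) can be sketched with a plain Java monitor. This is a hypothetical model: `BoundedMergeWait` and its methods are invented names, merges are plain strings, and it does not reflect how IndexWriter actually tracks pendingMerges and runningMerges.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a bounded wait for merges; not IndexWriter's implementation. */
class BoundedMergeWait {
  private final Deque<String> pendingMerges = new ArrayDeque<>();
  private final Set<String> runningMerges = new HashSet<>();
  private final List<String> abortedMerges = new ArrayList<>();

  synchronized void register(String merge) {
    pendingMerges.add(merge);
  }

  /** Called by a merge thread: moves the next pending merge to running, or returns null. */
  synchronized String startNext() {
    String merge = pendingMerges.poll();
    if (merge != null) {
      runningMerges.add(merge);
    }
    return merge;
  }

  synchronized void finish(String merge) {
    runningMerges.remove(merge);
    notifyAll(); // let waitForMerges() re-check the state
  }

  synchronized List<String> aborted() {
    return new ArrayList<>(abortedMerges);
  }

  /**
   * Waits without a limit while merges are running, but once nothing is running and merges
   * are still only pending past the deadline, aborts them. Returns true if everything
   * completed, false if pending merges were aborted.
   */
  synchronized boolean waitForMerges(long timeout, TimeUnit unit) throws InterruptedException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    while (true) {
      if (pendingMerges.isEmpty() && runningMerges.isEmpty()) {
        return true;
      }
      if (!runningMerges.isEmpty()) {
        wait(); // finish() notifies when a running merge completes
        continue;
      }
      long remaining = deadline - System.nanoTime();
      if (remaining <= 0) {
        abortedMerges.addAll(pendingMerges); // no running merge is left to pick these up
        pendingMerges.clear();
        return false;
      }
      TimeUnit.NANOSECONDS.timedWait(this, remaining);
    }
  }
}
```

The design choice mirrors the comment: running merges are trusted to finish, so only the pending-only state is time-limited, and aborting there wastes no completed work.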
[jira] [Commented] (LUCENE-3373) waitForMerges deadlocks if background merge fails
[ https://issues.apache.org/jira/browse/LUCENE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540702#comment-17540702 ]

Vigya Sharma commented on LUCENE-3373:

I'm curious why we need to {{break}} [here|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L539-L541] if we realize we don't want to stall. Why can't we just let the thread continue scheduling all the other merges?

I also realized that we could exit from that while loop without scheduling all pending merges if the thread runs into any exception. That is another way to end up in an endless wait.
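The second concern, a loop that stops scheduling as soon as one iteration throws, is commonly addressed by catching per item and carrying on. Below is a generic, hypothetical sketch (`DrainAll.launchAll` is an invented name; this is not ConcurrentMergeScheduler.merge()): every queued merge is handed to a launcher, and a failure is recorded instead of abandoning the rest of the queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical helper; not Lucene's ConcurrentMergeScheduler. */
final class DrainAll {
  private DrainAll() {}

  /**
   * Hands every queued merge to {@code launcher}. A failure for one merge is recorded
   * and the loop keeps going, so later merges are not left stranded in the queue.
   * Returns the merges whose launch failed.
   */
  static <T> List<T> launchAll(Queue<T> pending, Consumer<? super T> launcher) {
    List<T> failed = new ArrayList<>();
    for (T merge = pending.poll(); merge != null; merge = pending.poll()) {
      try {
        launcher.accept(merge);
      } catch (RuntimeException e) {
        failed.add(merge); // remember it; real code would also log or rethrow later
      }
    }
    return failed;
  }
}
```

How failures are surfaced afterwards (rethrown, logged, or attached to the merge) is a separate decision; the point is only that one exception no longer leaves the remaining merges unscheduled.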
[jira] [Commented] (LUCENE-10586) Minor refactoring in Lucene90BlockTreeTermsReader local variables: metaIn, indexMetaIn, termsMetaIn
[ https://issues.apache.org/jira/browse/LUCENE-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540708#comment-17540708 ]

Michael McCandless commented on LUCENE-10586:

+1

> Minor refactoring in Lucene90BlockTreeTermsReader local variables: metaIn, indexMetaIn, termsMetaIn
>
> Key: LUCENE-10586
> URL: https://issues.apache.org/jira/browse/LUCENE-10586
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Tomoko Uchida
> Priority: Trivial
>
> Those three local variables refer to the same {{IndexInput}} object (no clone() is called).
> {code}
> indexMetaIn = termsMetaIn = metaIn;
> {code}
> I'm not sure, but maybe there are some historical reasons. I wonder if it would be better to have only one reference to the underlying {{IndexInput}} object, to make the code a little easier to follow.
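The aliasing point can be illustrated with a JDK analogy in which `java.nio.ByteBuffer` stands in for `IndexInput` (an assumption made only for illustration). A chained assignment gives three names for one object and one shared read position, whereas a copy with its own position (`ByteBuffer.duplicate()` here, `IndexInput.clone()` in Lucene) reads independently.

```java
import java.nio.ByteBuffer;

/** Analogy only: ByteBuffer stands in for Lucene's IndexInput. */
final class AliasingDemo {
  private AliasingDemo() {}

  /** Chained assignment: three names, one object, one shared position. */
  static int[] positionsAfterChainedAssignment() {
    ByteBuffer metaIn = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
    ByteBuffer indexMetaIn;
    ByteBuffer termsMetaIn;
    indexMetaIn = termsMetaIn = metaIn;
    indexMetaIn.get(); // reading through one name...
    termsMetaIn.get(); // ...advances the position seen through all of them
    return new int[] {metaIn.position(), indexMetaIn.position(), termsMetaIn.position()};
  }

  /** A duplicate shares the bytes but keeps an independent position, much like clone(). */
  static int[] positionsWithDuplicate() {
    ByteBuffer metaIn = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
    ByteBuffer copy = metaIn.duplicate();
    copy.get();
    return new int[] {metaIn.position(), copy.position()};
  }
}
```

In the first method all three positions end at 2; in the second, the original stays at 0 while the duplicate moves to 1. So, as long as none of the three locals is later reassigned, collapsing them into one variable would not change behavior; it would only make the sharing explicit.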
[GitHub] [lucene] mocobeta commented on pull request #917: LUCENE-10531: Disable distribution test (gui test) on windows.
mocobeta commented on PR #917:
URL: https://github.com/apache/lucene/pull/917#issuecomment-1134066495

It could be worth trying to load GUI components lazily. Starting Luke takes seconds even on a physical machine, and a quicker launch would also be nicer for humans.
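Lazy loading of GUI components could follow a memoizing-supplier pattern: build a panel on first access instead of at startup. This is a minimal sketch with hypothetical names (`Lazy`, a string standing in for a panel), not Luke's code. In a Swing application, `get()` would normally be called on the Event Dispatch Thread, for example when a tab is first selected.

```java
import java.util.function.Supplier;

/** Hypothetical memoizing supplier: runs the factory once, on first access. */
final class Lazy<T> implements Supplier<T> {
  private Supplier<? extends T> factory;
  private T value;

  Lazy(Supplier<? extends T> factory) {
    this.factory = factory;
  }

  @Override
  public synchronized T get() {
    if (factory != null) {
      value = factory.get();
      factory = null; // never run the factory again, and let it be garbage collected
    }
    return value;
  }
}
```

Startup then only constructs cheap `Lazy` wrappers, and the class loading for each panel is deferred until the user actually opens it.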
[GitHub] [lucene] shaie opened a new pull request, #919: Update dev-docs
shaie opened a new pull request, #919:
URL: https://github.com/apache/lucene/pull/919

# Description

While browsing `dev-docs`, I noticed some leftover Solr instructions (which I guess existed before the split) as well as some references to `master`.

# Solution

Updated the docs.

# Tests

`./gradlew check`

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability.
- [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [ ] I have added tests for my changes.
[jira] [Commented] (LUCENE-3373) waitForMerges deadlocks if background merge fails
[ https://issues.apache.org/jira/browse/LUCENE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540795#comment-17540795 ]

Thomas Hoffmann commented on LUCENE-3373:

As far as I can see, the stalling depends on maxMergeCount. Maybe it would be worth logging a warning (instead of a verbose message), so that it is more visible if it happens again.

An unbounded wait should certainly be avoided. Some logic enforcing a maximum wait time could help (as a workaround to avoid an infinite wait). This event should be logged so we can catch the real reason and find the root cause.
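The suggestion to log the stall at warning level could look like the sketch below. The stall condition (merges pending and the merge thread count at or above maxMergeCount) is modeled loosely on the behaviour described here and should be treated as an assumption; `StallCheck` and `shouldStall` are hypothetical names, and java.util.logging is used only to keep the example dependency-free.

```java
import java.util.logging.Logger;

/** Hypothetical stall check with warning-level logging; not ConcurrentMergeScheduler code. */
final class StallCheck {
  private static final Logger LOG = Logger.getLogger(StallCheck.class.getName());

  private StallCheck() {}

  /**
   * Assumed condition: stall an indexing thread while merges are pending and the number of
   * merge threads has reached maxMergeCount. Logged at WARNING so it is visible without
   * verbose output enabled.
   */
  static boolean shouldStall(boolean hasPendingMerges, int mergeThreadCount, int maxMergeCount) {
    boolean stall = hasPendingMerges && mergeThreadCount >= maxMergeCount;
    if (stall) {
      LOG.warning(() -> "stalling indexing thread: " + mergeThreadCount
          + " merge threads >= maxMergeCount=" + maxMergeCount + " with merges pending");
    }
    return stall;
  }
}
```

Logging only when the condition is true keeps the normal path quiet while making each stall show up in production logs.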