[ https://issues.apache.org/jira/browse/SOLR-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196916#comment-17196916 ]
Erick Erickson commented on SOLR-14151:
---------------------------------------

[~tflobbe] See SOLR-14861. Specifically, "is buggy" amounts to at least this problem: while CoreContainer.shutdown() is running, it sets a variable "isShutdown" in CoreContainer, and we check that variable in various other places, specifically reload(), but there are a number of other checks scattered all through the code.

The case Noble and I found was that CoreContainer.reload() checks this variable at the top and gets past it. Then some other thread calls shutdown before the reload is done: the reloading thread is time-sliced out and the shutdown code executes for a while. Then that thread is time-sliced out and the reload picks up again, but by now the container is in a state where the reload can't continue. In other words, it's a check-then-act race: nothing stops isShutdown from changing between the check and the work that depends on it.

The problem manifested itself as suite-level failures due to unreleased objects; the actual test succeeded. That said, there are certainly other ways this kind of thing could manifest itself. IDK whether the tests you mentioned have the same problem, but it's likely if the failures are unreleased objects.

I attached a patch to that Jira that I started looking at (it's in horrible shape, but if I ever pick that Jira up again I wanted to have it handy to remember lessons learned about why this approach is probably bad). It tries to use a reentrant lock to make sure no other CoreContainer operations are in flight when we shut down or load. It led to a bunch of deadlocks. Besides, that approach only covers CoreContainer operations; there are places outside CoreContainer that check CoreContainer.isShutdown and potentially have the same problem.

The particular scenario was that the test did something that caused a reload, _then immediately terminated_, which started the shutdown process, so it's somewhat artificial. Just putting a delay at the end of the test, before the test class terminated, completely cured the problem for that particular test.
Of course that's not a fix, but it is evidence for the diagnosis.

So basically I punted. Introducing locking in CoreContainer has a lot of potential for deadlocks, and when I saw the other parts of the code that test CoreContainer.isShutdown I realized the problem is more widespread. Beyond that, I'm not sure how important this is in production when weighed against the potential for deadlock; in this particular case it only manifested itself because the test was shutting down so quickly.

I think we need a way for shutdown to somehow cause Solr to start refusing _all_ incoming requests, wait until all in-flight operations are complete, and only then start shutting down. The approach in the patch is too local, even if it would work. I'd love suggestions here. And this is exacerbated by the fact that the test framework calls CoreContainer.shutdown() directly...

> Make schema components load from packages
> -----------------------------------------
>
> Key: SOLR-14151
> URL: https://issues.apache.org/jira/browse/SOLR-14151
> Project: Solr
> Issue Type: Sub-task
> Reporter: Noble Paul
> Assignee: Noble Paul
> Priority: Major
> Labels: packagemanager
> Fix For: 8.7
>
> Time Spent: 12h 40m
> Remaining Estimate: 0h
>
> Example:
> {code:xml}
> <fieldType name="mytype1" class="pkg1:my.pkg.FieldTypeImpl">
>   <analyzer type="index">
>     <tokenizer class="pkg2:my.pkg2.MyTokenizerFactory"/>
>     <filter class="pkg2:my.pkg3.MyFilterFactory" generateWordParts="1"
>         generateNumberParts="0" catenateWords="0"
>         catenateNumbers="0" catenateAll="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.FlattenGraphFilterFactory"/>
>   </analyzer>
> </fieldType>
> {code}
> * When a package is updated, the entire {{IndexSchema}} object is refreshed,
> but the SolrCore object is not reloaded
> * Any component can be prefixed with the package name
> * The semantics of loading plugins remain the same as that of the components
> in {{solrconfig.xml}}
> * Plugins can be registered
> using schema API
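The "refuse all new requests, wait for in-flight operations, then shut down" idea from the comment can be sketched in Java roughly as follows. This is a minimal, hypothetical sketch, not Solr code: `OperationGate`, `tryEnter()`, `exit()`, `closeAndDrain()` and `GateDemo` are invented names, and the two threads in the demo only stand in for a reload and for CoreContainer.shutdown(). Unlike a reentrant lock held across a whole operation, the gate's monitor is held only while the counter is updated, so admitted operations run without holding any lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical gate (not a Solr class): admits operations until shutdown
 * begins, then refuses new ones and lets shutdown wait for in-flight ones.
 */
final class OperationGate {
    private int inFlight = 0;       // operations admitted but not yet finished
    private boolean closed = false; // set once shutdown has started

    /** Admit an operation, or return false if shutdown has already begun. */
    synchronized boolean tryEnter() {
        if (closed) {
            return false;
        }
        inFlight++;
        return true;
    }

    /** Must be called exactly once for every successful tryEnter(). */
    synchronized void exit() {
        if (inFlight == 0) {
            throw new IllegalStateException("exit() without a matching tryEnter()");
        }
        inFlight--;
        if (inFlight == 0) {
            notifyAll(); // wake any thread blocked in closeAndDrain()
        }
    }

    /** Refuse all new operations, then block until in-flight ones finish. */
    synchronized void closeAndDrain() throws InterruptedException {
        closed = true;
        while (inFlight > 0) {
            wait(); // releases the monitor so exit() can run; loop guards spurious wakeups
        }
    }

    synchronized int inFlight() {
        return inFlight;
    }
}

class GateDemo {
    public static void main(String[] args) throws InterruptedException {
        OperationGate gate = new OperationGate();
        CountDownLatch reloadAdmitted = new CountDownLatch(1);
        AtomicBoolean reloadFinished = new AtomicBoolean(false);

        // Stand-in for a reload that gets past the "are we shutting down?"
        // check and then takes a while to complete.
        Thread reload = new Thread(() -> {
            if (!gate.tryEnter()) {
                return;
            }
            try {
                reloadAdmitted.countDown();
                TimeUnit.MILLISECONDS.sleep(200);
                reloadFinished.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                gate.exit();
            }
        });
        reload.start();
        reloadAdmitted.await();

        // Stand-in for shutdown: cannot proceed until the reload has exited.
        gate.closeAndDrain();
        System.out.println("reload finished before shutdown: " + reloadFinished.get());
        System.out.println("new operation admitted after shutdown: " + gate.tryEnter());
        reload.join();
    }
}
```

Running GateDemo should report that the reload finished before closeAndDrain() returned and that a new operation is refused afterwards. Two caveats: if an admitted operation itself calls closeAndDrain(), it waits on itself forever, so a real version would have to forbid or detect that; and, as the comment notes, such a gate would only help if it wrapped every entry point, including the code outside CoreContainer that checks isShutdown.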