[ 
https://issues.apache.org/jira/browse/SOLR-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196916#comment-17196916
 ] 

Erick Erickson commented on SOLR-14151:
---------------------------------------

[~tflobbe]  See SOLR-14861. Specifically, "is buggy" includes at least this 
problem: when CoreContainer.shutdown() is running, there's a variable 
"isShutdown" in CoreContainer that's set, and we check for that in various 
places, specifically reload(), but also in a number of other places scattered 
all through the code. The case Noble and I found was that 
CoreContainer.reload() checks this variable at the top and gets past it.

Then some other thread calls shutdown before the reload is done, and the 
reloading thread is time-sliced out and the shutdown code executes for a while. 
Then that thread is time-sliced out and the reload picks up, but by now the 
state of the container is such that the reload can't continue.
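
A minimal sketch of that interleaving, for illustration only. The class and its fields are hypothetical stand-ins, not Solr's CoreContainer code, and the latches exist only to force the unlucky scheduling every time instead of leaving it to the thread scheduler:

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a container with a check-then-act guard.
// This is NOT Solr's CoreContainer; names are invented for the sketch.
class ReloadShutdownRace {
    private volatile boolean isShutdown = false;
    private volatile boolean resourcesOpen = true;

    // Latches only force the bad interleaving deterministically.
    private final CountDownLatch passedCheck = new CountDownLatch(1);
    private final CountDownLatch shutdownDone = new CountDownLatch(1);

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Checks the flag once at the top, then keeps going without rechecking. */
    String reload() {
        if (isShutdown) {
            return "rejected";
        }
        passedCheck.countDown(); // reload is now past the guard
        await(shutdownDone);     // simulates being time-sliced out
        return resourcesOpen
            ? "reloaded"
            : "failed: container torn down mid-reload";
    }

    /** Runs only after reload has passed its check, as in the scenario above. */
    void shutdown() {
        await(passedCheck);
        isShutdown = true;
        resourcesOpen = false;
        shutdownDone.countDown();
    }

    static String demo() {
        ReloadShutdownRace container = new ReloadShutdownRace();
        Thread stopper = new Thread(container::shutdown);
        stopper.start();
        String result = container.reload();
        try {
            stopper.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is that the guard at the top of reload() says nothing about the container's state a few statements later.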

The problem manifested itself as suite-level unreleased-object failures. The 
actual test succeeded. That said, there are certainly other ways this kind of 
thing could manifest itself. IDK whether the tests you mentioned have the same 
problem or not, but it'd be likely if the failures are unreleased objects. 

I attached a patch to that Jira that I started looking at (it's in horrible 
shape, but if I ever pick that Jira up again I wanted to have it handy to 
remember lessons learned about why this approach is probably bad) that tries to 
use a reentrant lock to make sure no other CoreContainer operations are 
in-flight when we shut down or load. It led to a bunch of deadlocks.

Besides, that approach is all about CoreContainer operations; there are places 
outside CoreContainer that check CoreContainer.isShutdown that potentially have 
the same problem.
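
To make the deadlock risk concrete, here is one way a lock-based guard could look. This is an assumption-laden sketch, not the attached patch: the class name, the choice of ReentrantReadWriteLock, and the method names are all hypothetical. Ordinary operations hold the read lock and shutdown takes the write lock. ReentrantReadWriteLock does not allow upgrading from the read lock to the write lock, so an operation that triggers shutdown while it is itself in flight can never get the write lock. The sketch uses tryLock() so that case reports failure instead of hanging, which is what lock() would do:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical guard, not the patch attached to the Jira.
class LockGuardedContainer {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean isShutdown = false; // guarded by lock

    /** Ordinary operation: holds the read lock so shutdown cannot run under it. */
    boolean reload(Runnable work) {
        lock.readLock().lock();
        try {
            if (isShutdown) {
                return false;
            }
            work.run();
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * Shutdown needs the write lock. tryLock() returns false immediately when
     * any read lock is held, including one held by the calling thread; a plain
     * lock() call in that situation would block forever.
     */
    boolean tryShutdown() {
        if (!lock.writeLock().tryLock()) {
            return false;
        }
        try {
            isShutdown = true;
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Calling tryShutdown() from inside the Runnable passed to reload() returns false, while the same call made with no operation in flight succeeds. Whether the deadlocks seen with the patch had this shape is not something the sketch can establish; it only shows how easily this kind of guard can wedge itself.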

The particular scenario was that the test did something that caused a reload, 
_then immediately terminated,_ which started the shutdown process, so it's 
somewhat artificial. Even just putting a delay at the end of the test before it 
terminated the test class completely cured the problem for that particular 
test. Of course that's not a fix, but it is evidence for the diagnosis.

So basically I punted. Introducing the locking in CoreContainer has a lot of 
potential for deadlocks, and when I saw the other parts of the code that 
test CoreContainer.isShutdown I realized the problem is more widespread. 
Besides that, I'm not sure how important this is in production when weighed 
against the potential for deadlock; in this particular case it only manifested 
itself because the test was shutting down so quickly.

I think we need a way for shutdown to somehow cause Solr to start refusing 
_all_ incoming requests, wait until all in-flight operations are complete, and 
then start shutting down. The approach in the patch is too local, even if it 
would work. I'd love suggestions here. And this is exacerbated by the fact that 
the test framework calls CoreContainer.shutdown() directly...
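
As a starting point for discussion, the "refuse, drain, then shut down" idea can be sketched as a small admission gate. Everything here is hypothetical (the class name, the method names, and the assumption that every request path could be routed through one gate); nothing like this exists in Solr as far as this comment establishes. The key property is that the "are we closed?" check and the in-flight count update happen under the same monitor, so no operation can slip past the check after shutdown has begun:

```java
// Hypothetical admission gate; a sketch of the idea, not an existing Solr class.
class DrainingGate {
    private boolean closed = false; // guarded by this
    private int inFlight = 0;       // guarded by this

    /** Returns false once shutdown has begun; the caller must refuse the request. */
    synchronized boolean tryEnter() {
        if (closed) {
            return false;
        }
        inFlight++;
        return true;
    }

    /** Must be called exactly once for every successful tryEnter(). */
    synchronized void exit() {
        inFlight--;
        if (inFlight == 0) {
            notifyAll(); // wake a shutdown thread waiting for the drain
        }
    }

    /**
     * Stops admitting new work, then blocks until in-flight work has finished.
     * Returns true when drained, false if the waiting thread was interrupted.
     */
    synchronized boolean closeAndAwaitDrain() {
        closed = true;
        try {
            while (inFlight > 0) {
                wait();
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller would wrap each operation as "if tryEnter() then do the work and call exit() in a finally block, else refuse", and shutdown would call closeAndAwaitDrain() before tearing anything down. The sketch leaves open questions that a real design would have to answer, for example what happens when an in-flight operation needs to start another gated operation after the gate has closed, and how long shutdown should be willing to wait.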

> Make schema components load from packages
> -----------------------------------------
>
>                 Key: SOLR-14151
>                 URL: https://issues.apache.org/jira/browse/SOLR-14151
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>            Priority: Major
>              Labels: packagemanager
>             Fix For: 8.7
>
>          Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> Example:
> {code:xml}
>  <fieldType name="mytype1" class="pkg1:my.pkg.FieldTypeImpl">
>     <analyzer type="index">
>       <tokenizer class="pkg2:my.pkg2.MyTokenizerFactory"/>
>       <filter class="pkg2:my.pkg3.MyFilterFactory" generateWordParts="1" 
> generateNumberParts="0" catenateWords="0"
>               catenateNumbers="0" catenateAll="0"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.FlattenGraphFilterFactory"/>
>     </analyzer>
>   </fieldType>
> {code}
> * When a package is updated, the entire {{IndexSchema}} object is refreshed, 
> but the SolrCore object is not reloaded
> * Any component can be prefixed with the package name
> * The semantics of loading plugins remain the same as that of the components 
> in {{solrconfig.xml}}
> * Plugins can be registered using schema API



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
