[GitHub] [lucene-solr] dsmiley opened a new pull request #1166: SOLR-14040: shareSchema support for SolrCloud

GitBox Mon, 13 Jan 2020 21:13:02 -0800

dsmiley opened a new pull request #1166: SOLR-14040: shareSchema support for 
SolrCloud
URL: https://github.com/apache/lucene-solr/pull/1166
 
 
   https://issues.apache.org/jira/browse/SOLR-14040
   
   The essence of this change is that ConfigSetService does schema caching 
itself (optionally) instead of schema caching being done in a subclass.  The 
problem with a subclass was that subclassing was already used to differentiate 
SolrCloud from standalone, but we want both modes to have this ability.  
   
   A consideration I had was how a user could force a new schema to be loaded 
if for some reason the old one was obsoleted.  Perhaps it wasn't changed but it 
referenced some file.  Ultimately, users need to forcibly update the schema 
even if the schema file itself is identical. I'm leaving this as a manual 
operation, though it'd be nice if the user could do this through Solr's API 
with some new command that does not yet exist.
   
   The new test in ConfigSetsAPITest.testSharedSchema is admittedly pretty 
minimal but no more minimal than the existing test in 
TestCoreContainer.testSharedSchema.  I wasn't sure where to put the new test; 
TestCoreContainer isn't a SolrCloud test so I couldn't put it there.
   
   Some improvements:
   * Use Caffeine impl and weak values (to the schema).  Previously the cache 
never evicted!
   
   Some refactorings:
   * Renamed ConfigSetService.getConfig to loadConfigSet to more clearly show 
it loads a ConfigSet (and not SolrConfig).
   * In CloudConfigSetService.createCoreResourceLoader I moved the lookup of 
the configSet name from here to all the way out into CloudDescriptor (attached 
to CoreDescriptor) in its constructor.  The intention is to ensure that 
coreDescriptor.getConfigSet is correct in SolrCloud mode so that, for example, 
when we go to potentially cache the schema, we easily know to what configSet it 
is associated with.  Maybe this is the most contentious change, but it was very 
confusing during development of this overall issue to see this 
coreDescriptor.getConfigSet method which I was tempted to call but I had to 
keep reminding myself not to call it because it was empty in SolrCloud mode.  
Unfortunately, this makes creation of a core descriptor not a trivial creation 
from properties but it also takes in a ZkController which is used to do the 
lookup.  This must be done *somewhere* of course so it's not about perf but a 
conceptual conundrum in how we think of these core descriptor thingies.  I'm 
inclined to think that much of CoreDescriptor is flawed as written, 
particularly in a SolrCloud world where ZK is truth, not properties files from 
disk that may only reflect the state of things at the time the core was 
created. 
   * In CloudConfigSetService.createCoreResourceLoader there was a comment and 
some code: "// for back compat with cores that can create collections without 
the collections API".  I removed this as it appears to be impossible now; all 
tests passed with it gone.  (yay!)
   ** Probably related: There was a core.properties in the test path under a 
conf directory, which is really weird.  It appears no test truly uses it, yet 
nevertheless during debugging of the above I saw a test indirectly fail due to 
the presence of this file.
   * Changed static IndexSchemaFactory.getResourceNameToBeUsed to be an 
instance method on the IndexSchemaFactory instance so that the factory 
implementation could return a suitable answer.  The static method made 
assumptions based on the two known types which was not as clean.  Additionally 
it also used to actually look at the files on disk to get an accurate answer 
but I removed that because we can handle a reasonable guess response.  See, at 
this juncture I would rather avoid probing ZooKeeper over which files exist 
(managed-schema vs schema.xml) over this small matter.  It's really a wart IMO; 
the name ought to be deterministic.  Consequently someone wanting schema 
sharing and who is also using a managed schema must ensure their schema is 
named managed-schema (or whatever they configured it to be in solrconfig.xml).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] dsmiley opened a new pull request #1166: SOLR-14040: shareSchema support for SolrCloud

Reply via email to