dsmiley opened a new pull request #1166: SOLR-14040: shareSchema support for SolrCloud URL: https://github.com/apache/lucene-solr/pull/1166 https://issues.apache.org/jira/browse/SOLR-14040 The essence of this change is that ConfigSetService does schema caching itself (optionally) instead of schema caching being done in a subclass. The problem with a subclass was that subclassing was already used to differentiate SolrCloud from standalone, but we want both modes to have this ability. A consideration I had was how a user could force a new schema to be loaded if for some reason the old one was obsoleted. Perhaps it wasn't changed but it referenced some file. Ultimately, users need to forcibly update the schema even if the schema file itself is identical. I'm leaving this as a manual operation, though it'd be nice if the user could do this through Solr's API with some new command that does not yet exist. The new test in ConfigSetsAPITest.testSharedSchema is admittedly pretty minimal but no more minimal than the existing test in TestCoreContainer.testSharedSchema. I wasn't sure where to put the new test; TestCoreContainer isn't a SolrCloud test so I couldn't put it there. Some improvements: * Use Caffeine impl and weak values (to the schema). Previously the cache never evicted! Some refactorings: * Renamed ConfigSetService.getConfig to loadConfigSet to more clearly show it loads a ConfigSet (and not SolrConfig). * In CloudConfigSetService.createCoreResourceLoader I moved the lookup of the configSet name from here to all the way out into CloudDescriptor (attached to CoreDescriptor) in its constructor. The intention is to ensure that coreDescriptor.getConfigSet is correct in SolrCloud mode so that, for example, when we go to potentially cache the schema, we easily know to what configSet it is associated with. Maybe this is the most contentious change, but it was very confusing during development of this overall issue to see this coreDescriptor.getConfigSet method which I was tempted to call but I had to keep reminding myself not to call it because it was empty in SolrCloud mode. Unfortunately, this makes creation of a core descriptor not a trivial creation from properties but it also takes in a ZkController which is used to do the lookup. This must be done *somewhere* of course so it's not about perf but a conceptual conundrum in how we think of these core descriptor thingies. I'm inclined to think that much of CoreDescriptor is flawed as written, particularly in a SolrCloud world where ZK is truth, not properties files from disk that may only reflect the state of things at the time the core was created. * In CloudConfigSetService.createCoreResourceLoader there was a comment and some code: "// for back compat with cores that can create collections without the collections API". I removed this as it appears to be impossible now; all tests passed with it gone. (yay!) ** Probably related: There was a core.properties in the test path under a conf directory, which is really weird. It appears no test truly uses it, yet nevertheless during debugging of the above I saw a test indirectly fail due to the presence of this file. * Changed static IndexSchemaFactory.getResourceNameToBeUsed to be an instance method on the IndexSchemaFactory instance so that the factory implementation could return a suitable answer. The static method made assumptions based on the two known types which was not as clean. Additionally it also used to actually look at the files on disk to get an accurate answer but I removed that because we can handle a reasonable guess response. See, at this juncture I would rather avoid probing ZooKeeper over which files exist (managed-schema vs schema.xml) over this small matter. It's really a wart IMO; the name ought to be deterministic. Consequently someone wanting schema sharing and who is also using a managed schema must ensure their schema is named managed-schema (or whatever they configured it to be in solrconfig.xml).
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org