Re: SolrCloud Feedback

Jan Høydahl Mon, 14 Feb 2011 07:41:14 -0800

Some more comments:

f) For consistency, the Java options should all be prefixed with solr.*, even if
   they relate to the embedded ZK:
   -Dsolr.hostPort=8900 -Dsolr.zkRun -Dsolr.zkHost=localhost:9900
   -Dsolr.zkBootstrap_confdir=./solr/conf

g) I often share parts of my config between cores, e.g. a common schema.xml or
   synonyms.xml. In the file-based mode I can thus use
   ../../common_conf/synonyms.xml or similar.
   I have not tried to bootstrap such a config into ZK, but I assume it will not
   work. ZK mode should support this use case, either by supporting notations
   like ".." or by allowing an explicit ZK namespace:
   zk://configs/common-cfg/synonyms.xml
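
To make the two notations in (g) concrete, here is a rough Python sketch of how a resource reference could be resolved to a znode path. It is illustrative only: the /configs root, the function name resolve_config_ref and the config set name "web" are assumptions made for this example, not existing Solr behaviour.

```python
import posixpath

# Hypothetical root znode under which all config sets live (an assumption).
CONFIGS_ROOT = "/configs"


def resolve_config_ref(config_name, ref):
    """Resolve a resource reference to an absolute znode path.

    `ref` is either an explicit zk:// reference or a path relative to the
    config set `config_name`, possibly containing ".." segments.
    """
    if ref.startswith("zk://"):
        # Explicit namespace, e.g. zk://configs/common-cfg/synonyms.xml
        path = "/" + ref[len("zk://"):]
    else:
        # Relative reference, resolved against the config set's own znode
        path = posixpath.join(CONFIGS_ROOT, config_name, ref)
    path = posixpath.normpath(path)
    # Refuse references that climb out of the config root
    if not path.startswith(CONFIGS_ROOT + "/"):
        raise ValueError("reference escapes %s: %s" % (CONFIGS_ROOT, ref))
    return path


if __name__ == "__main__":
    print(resolve_config_ref("web", "../common-cfg/synonyms.xml"))
    print(resolve_config_ref("web", "zk://configs/common-cfg/synonyms.xml"))
```

Both calls resolve to the same znode, so a shared synonyms.xml could be referenced either way. Rejecting paths that leave the config root is a design choice of this sketch, not something proposed in the thread.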

h) Support for dev / test / prod environments
   In real life you want to develop in one environment, test in another and run
   production in a third.
   Thus, the ZK data structure should have a clear separation between logical
   feature configuration and physical deployment config.

   Perhaps a new level above /COLLECTIONS could be used to model this, e.g.
   /ENV/PROD/COLLECTIONS/WEB/SHARDS/shardA/prod01.server.com:8080
   /ENV/PROD/COLLECTIONS/WEB/SHARDS/shardB/prod02.server.com:8080
   /ENV/PROD/COLLECTIONS/FILES/SHARDS/shardA/prod03.server.com:8080
   /ENV/TEST/COLLECTIONS/WEB/SHARDS/shardA/test01.server.com:8080
   /ENV/TEST/COLLECTIONS/WEB/SHARDS/shardB/test01.server.com:9090
   /ENV/TEST/COLLECTIONS/FILES[@configName=TESTFILES]/SHARDS/shardA/test01.server.com:7070

   When starting Solr we can specify the environment with -Dsolr.env=TEST (or
   configure a default).
   The main benefit is that we can maintain and store one single ZK config in
   our SCM, distribute the same configs to all servers and, if we like, point
   all envs to the same ZK ensemble.

   In the future, we can use this for automatic install of a new node as well:
   by simply adding a ZK entry in the right place, the node can discover "who
   it is" from ZK.
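
As an illustration of the layout in (h), the following Python sketch builds the per-environment paths and shows how a node could look up its own role. It works on an in-memory list of paths rather than a live ZK ensemble, and the names LAYOUT, shard_node_path and find_roles are hypothetical, introduced only for this example.

```python
# The example layout from (h), held in memory instead of a real ZK ensemble.
LAYOUT = [
    "/ENV/PROD/COLLECTIONS/WEB/SHARDS/shardA/prod01.server.com:8080",
    "/ENV/PROD/COLLECTIONS/WEB/SHARDS/shardB/prod02.server.com:8080",
    "/ENV/PROD/COLLECTIONS/FILES/SHARDS/shardA/prod03.server.com:8080",
    "/ENV/TEST/COLLECTIONS/WEB/SHARDS/shardA/test01.server.com:8080",
    "/ENV/TEST/COLLECTIONS/WEB/SHARDS/shardB/test01.server.com:9090",
    "/ENV/TEST/COLLECTIONS/FILES[@configName=TESTFILES]/SHARDS/shardA/test01.server.com:7070",
]


def shard_node_path(env, collection, shard, host, port):
    """Build the znode path for one node of one shard in one environment."""
    return "/ENV/%s/COLLECTIONS/%s/SHARDS/%s/%s:%d" % (
        env.upper(), collection, shard, host, port)


def find_roles(layout, env, node):
    """Return the (collection, shard) pairs that list `node` in `env`.

    `env` plays the role of the proposed -Dsolr.env value and `node` is the
    node's own "host:port" string.
    """
    prefix = "/ENV/%s/COLLECTIONS/" % env.upper()
    roles = []
    for path in layout:
        if not path.startswith(prefix):
            continue
        parts = path[len(prefix):].split("/")
        # Expected shape: <collection>/SHARDS/<shard>/<host:port>
        if len(parts) == 4 and parts[1] == "SHARDS" and parts[3] == node:
            # Drop the optional [@configName=...] annotation from the name
            collection = parts[0].split("[", 1)[0]
            roles.append((collection, parts[2]))
    return roles


if __name__ == "__main__":
    print(find_roles(LAYOUT, "TEST", "test01.server.com:8080"))
```

With a real ensemble the list would come from walking the children of /ENV/<env>/COLLECTIONS. The lookup logic would stay the same.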

i) Ideally, no config inside conf should contain host names.
   My DIH config will most likely include server names, which will differ
   between TEST and PROD.
   This could be solved as above, by letting the collection in TEST use another
   configName than PROD, but for some use cases it might be more elegant to
   swap out a hardcoded string for a ZK node in a generic way, e.g. changing
   jdbcString="my-hardcoded-string" to jdbcString="${zk://ENV/PROD/jdbcstrA}"
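
The generic substitution suggested in (i) could look roughly like the Python sketch below. It is only a sketch: expand_zk_refs is a hypothetical helper, the lookup is a plain dict standing in for a ZK read, and the JDBC URL is made-up sample data.

```python
import re

# Matches placeholders of the form ${zk://some/znode/path}
ZK_REF = re.compile(r"\$\{zk://([^}]+)\}")


def expand_zk_refs(text, lookup):
    """Replace every ${zk://path} placeholder in `text`.

    `lookup` is any callable that maps an absolute znode path to its string
    value. Whatever `lookup` raises for an unknown path is propagated.
    """
    def replace(match):
        return lookup("/" + match.group(1))
    return ZK_REF.sub(replace, text)


if __name__ == "__main__":
    # Stand-in for ZK: a dict keyed by znode path (sample value is invented).
    store = {"/ENV/PROD/jdbcstrA": "jdbc:mysql://prod-db.example.com/app"}
    line = 'jdbcString="${zk://ENV/PROD/jdbcstrA}"'
    print(expand_zk_refs(line, store.__getitem__))
```

Because the lookup is passed in as a callable, the same function could be pointed at a different environment's values without touching the config text.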

j) Question: Is ReplicationHandler ZK-aware yet?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 10. feb. 2011, at 16.10, Jan Høydahl wrote:

> Hi,
> 
> I have so far just tested the examples and got a N by M cluster running. My 
> feedback:
> 
> a) First of all, a major update of the SolrCloud Wiki is needed, to clearly 
> state what is in which version, what are current improvement plans and get 
> rid of outdated stuff. That said I think there are many good ideas there.
> 
> b) The "collection" terminology is too easily confused with "core", and should 
> probably be made more distinct. I just tried to configure two cores on the 
> same Solr instance into the same collection, and that worked fine, both as 
> distinct shards and as the same shard (replica). The wiki examples give the 
> impression that "collection1" in 
> localhost:8983/solr/collection1/select?distrib=true is some magic collection 
> identifier, but what it really does is run the query on the *core* named 
> "collection1", look up which collection that core is part of and 
> distribute the query to all shards in that collection.
> 
> c) ZK is not designed to store large files. While the files in conf are 
> normally well below the 1M limit ZK imposes, we should perhaps consider using 
> a lightweight distributed object or k/v store for holding the /CONFIGS and 
> let ZK store only a reference.
> 
> d) How are admins supposed to update configs in ZK? Install their favourite 
> ZK editor?
> 
> e) We should perhaps not be so afraid to make ZK a requirement for Solr in 
> v4. Ideally you should interact with a 1-node Solr in the same manner as you 
> do with a 100-node Solr. An example is the Admin GUI, where the "schema" and 
> "solrconfig" links assume a local file. This requires decent tool support to 
> make ZK interaction intuitive, such as "import" and "export" commands.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> On 19. jan. 2011, at 21.07, Mark Miller wrote:
> 
>> Hello Users,
>> 
>> A little over a year ago, a few of us started working on what we 
>> called SolrCloud.
>> 
>> This initial bit of work was really a combination of laying some base work - 
>> figuring out how to integrate ZooKeeper with Solr in a limited way, dealing 
>> with some infrastructure - and picking off some low hanging search side 
>> fruit.
>> 
>> The next step is the indexing side. And we plan on starting to tackle that 
>> sometime soon.
>> 
>> But first - could you help with some feedback? Some people are using our 
>> SolrCloud start - I have seen evidence of it ;) Some, even in production.
>> 
>> I would love to have your help in targeting what we now try and improve. Any 
>> suggestions or feedback? If you have sent this before, I/others likely 
>> missed it - send it again!
>> 
>> I know anyone that has used SolrCloud has some feedback. I know it because 
>> I've used it too ;) It's still too complicated to set up. There are still 
>> plenty of pain points. We accepted some compromise trying to fit into what 
>> Solr was, and not wanting to dig in too far before feeling things out and 
>> letting users try things out a bit. Thinking that we might be able to adjust 
>> Solr to be more in favor of SolrCloud as we go, what is the ideal state of 
>> the work we have currently done?
>> 
>> If anyone using SolrCloud helps with the feedback, I'll help with the coding 
>> effort.
>> 
>> - Mark Miller
>> -- lucidimagination.com
>
