Re: How to manage solr cloud collections-sharding?

Erick Erickson Mon, 14 Jan 2013 04:19:05 -0800

I can at least answer part of this....
see inline.


On Sun, Jan 13, 2013 at 11:44 AM, adfel70 <adfe...@gmail.com> wrote:

> Hi,
> I know a few questions on this issue have already been posted, but I didn't
> find full answers in any of those posts.
>
> I'm using solr-4.0.0
>
[EOE] I'd _really_ start working with a nightly build instead; there have
been a lot of improvements and the RC1 for 4.1 may well be cut this week.

> I need my Solr cluster to have multiple collections, each collection with a
> different configuration (at least a different schema.xml file).
> I followed the SolrCloud tutorial page and executed this command:
> java -Dbootstrap_confdir=./solr/collection1/conf
> -Dcollection.configName=myconf -DzkRun -DnumShards=5 -jar start.jar
> When I start the Solr servers, I have collection1 in clusterstate.json with
> each node assigned to some shard.
>
> questions so far:
> 1. Is this first command 100% necessary?
>
[EOE] No, it's not. You could use the zkCli commands here:
http://wiki.apache.org/solr/SolrCloud#Command_Line_Util. See especially the
"try bootstrapping all the conf dirs in solr.xml" example. But at some point
you have to send all the relevant info (configuration files, etc.) to
ZooKeeper. This command is a convenient way to do that.
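As a rough sketch of the zkCli route, the snippet below assembles one
`upconfig` invocation per configuration directory. The `-cmd upconfig`,
`-zkhost`, `-confdir` and `-confname` options are the ones shown on the wiki
page above; the classpath, the ZooKeeper address, the directory layout and
the config names are assumptions for illustration, so check them against your
own install before running anything.

```python
import shlex


def upconfig_command(zk_host, conf_dir, conf_name,
                     classpath="example/solr-webapp/WEB-INF/lib/*"):
    """Build the argument list for one ZkCLI 'upconfig' call.

    The default classpath is an assumption based on the stock example layout.
    """
    return [
        "java", "-classpath", classpath,
        "org.apache.solr.cloud.ZkCLI",
        "-cmd", "upconfig",
        "-zkhost", zk_host,
        "-confdir", conf_dir,
        "-confname", conf_name,
    ]


# Hypothetical config names and directories, one per collection.
configs = {
    "confA": "./solr/collectionA/conf",
    "confB": "./solr/collectionB/conf",
}

for name, path in sorted(configs.items()):
    args = upconfig_command("localhost:9983", path, name)
    # Print a copy-pasteable shell line; nothing is executed here.
    print(" ".join(shlex.quote(a) for a in args))
```

The script only prints the commands, so it is safe to run for inspection.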

> 2. Do I have to define the number of shards before starting solr
> instances?
>
[EOE] Yes, unless you're doing "custom sharding".

> 3. What if I want to add a shard after I started all solr instances and
> haven't indexed yet?
>
[EOE] Currently you'd have to re-index, unless you are doing "custom
hashing".

> 4. What if I want to add a shard after indexing?
>
[EOE] You have to re-index (and reconfigure your ZK state) unless you're
doing "custom sharding"

> 5. What role does clusterstate.json play? Is it just a JSON file to show
> in the GUI? Or is it the only file that persists the current state of
> the cluster?
>
[EOE] What it does on ZK I don't know, but I've only seen it used as
something for the GUI to read. Actually, all the other views are just
prettifying this file.

> 6. Can I edit it manually? Should I?
>
[EOE] I've never heard of anyone even wanting to; you'd have to ask
someone who knows way more about ZK than I do.

>
> I add another schema-B.xml file to ZooKeeper and open another
> collection using the CoreAdmin REST API.
> I want this collection to have 10 shards and not 5 as I defined for the
> previous collection.
> So I run
> http://server:port/solr/admin/cores?action=CREATE&name=coreX&instanceDir=path_to_instance_directory&config=config_file_name.xml&schema=schema_file_name.xml&dataDir=data&shard=shard
> 10 times, with a different shard ID each run.
>
[EOE] Currently, by adding the shard= parameter, you're doing custom
sharding. Mark just raised a JIRA about this recently; I don't quite know
what its current status is. You're in kind of uncharted territory
here...
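For reference, the ten CoreAdmin CREATE requests with an explicit shard=
value could be generated along these lines. This is only a sketch: the base
URL, the core naming scheme and the per-core instance directory are
hypothetical, and the `collection` parameter is an addition that does not
appear in the original request (it is included here on the assumption that
all ten cores should belong to one collection).

```python
from urllib.parse import urlencode


def create_core_urls(base_url, collection, num_shards):
    """Return one CoreAdmin CREATE URL per shard, each with its own shard ID."""
    urls = []
    for i in range(1, num_shards + 1):
        core_name = "%s_shard%d" % (collection, i)  # hypothetical naming scheme
        params = [
            ("action", "CREATE"),
            ("name", core_name),
            ("instanceDir", core_name),          # assumed: one directory per core
            ("config", "config_file_name.xml"),  # placeholder from the question
            ("schema", "schema_file_name.xml"),  # placeholder from the question
            ("dataDir", "data"),
            ("collection", collection),          # not in the original request
            ("shard", "shard%d" % i),            # explicit ID => custom sharding
        ]
        urls.append(base_url + "/solr/admin/cores?" + urlencode(params))
    return urls


# Hypothetical host and port; print the URLs instead of calling them.
for url in create_core_urls("http://server:8983", "collectionB", 10):
    print(url)
```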

> questions:
> 1. Is this an appropriate way to use the CoreAdmin API? Should I specify
> the shard ID? I do it because it gives me a way to control the number of
> shards (each new shard ID creates a new shard). But should I use it this
> way?
>
[EOE] Right, but currently this means you do NOT get automatic distributed
indexing; your indexing program has to send each document to the appropriate
shard, preferably the leader. Like I said, this is kind of new.
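To make "your indexing program has to send the document to the appropriate
shard" concrete, here is a minimal client-side routing sketch. It is not
Solr's own hashing; it simply hashes the document ID and takes it modulo the
number of shards, which is one possible scheme among many. The shard URLs
and the document IDs are hypothetical.

```python
import hashlib


def pick_shard(doc_id, shard_urls):
    """Choose a shard for doc_id with a stable hash (assumed scheme, not Solr's)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    # Use the first four bytes as an unsigned integer, then wrap to the shard count.
    index = int.from_bytes(digest[:4], "big") % len(shard_urls)
    return shard_urls[index]


# Hypothetical update endpoints, one per shard.
shards = ["http://server:8983/solr/collectionB_shard%d/update" % i
          for i in range(1, 11)]

for doc_id in ["doc-1", "doc-2", "doc-3"]:
    print(doc_id, "->", pick_shard(doc_id, shards))
```

With a modulo scheme like this, changing the number of shards changes where
most IDs map, so existing documents would have to be re-indexed.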

> 2. Can I have different number of shards in different collections on the
> same cluster?
> 3. If yes - then what is the purpose of the first bootstrap command?
>
[EOE] Well, without some kind of bootstrap, how would _any_ cluster
information ever get to ZooKeeper? The first bootstrap command is
essentially the "fire and forget" approach so you don't have to keep track
of anything to use SolrCloud.

>
> another question:
> I saw that in version 4.1 each shard has another parameter, range. What is
> this parameter used for?
>
[EOE] Haven't worked with this yet....

> Would I have to re-index when upgrading from 4.0 to 4.1?
>
[EOE] You shouldn't have to re-index.

>
> This will help a lot in understanding the whole collection-sharding
> architecture in SolrCloud.
> Thanks
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-manage-solr-cloud-collections-sharding-tp4033009.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
