On Mon, Aug 17, 2015 at 8:00 PM, Sathiya N Sundararajan
<ausat...@gmail.com> wrote:
> Folks:
>
> Question regarding SolrCloud shard numbers (e.g. shard<x>) & their associated
> hash ranges. We are in the process of identifying the best strategy to merge
> shards that belong to chronologically older collections, which see a very
> low volume of searches compared to the collections with the most recent
> data.
>
> What we ran into is that we often find that shard numbers and hash
> ranges don’t necessarily correlate:
>
> shard1: 80000000-aaa9ffff
> shard2: aaaa0000-d554ffff
> shard3: d5550000-ffffffff ( holds the last range )
> shard4: 0-2aa9ffff ( holds the starting range )
> shard5: 2aaa0000-5554ffff
> shard6: 55550000-7fffffff


It's not really clear what you mean by "correlate"... but I think
there are 2 different points to make:
1) This is the hex representation of a signed 32-bit integer, so 80000000
(the most negative value) is the start of the complete hash range, and
7fffffff (the most positive value) is the end.
2) The numbers in the names shard1, shard2, etc. are meaningless... they
are just names, like shard_foo and shard_bar.  They do not need to be
ordered in any way with respect to each other.
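
To illustrate point 1, here is a minimal Python sketch (not Solr code) that
reinterprets the hex bounds from the listing above as signed 32-bit integers
and sorts the shards by the start of their range. The helper names
to_signed32 and parse_range are made up for this example.

```python
def to_signed32(hex_str):
    """Interpret a hex string as a signed 32-bit (two's complement) integer."""
    value = int(hex_str, 16)
    return value - 0x100000000 if value >= 0x80000000 else value


def parse_range(text):
    """Split 'lo-hi' (both hex) into a pair of signed integers."""
    lo, hi = text.split("-")
    return to_signed32(lo), to_signed32(hi)


# Hash ranges copied from the original message.
ranges = {
    "shard1": "80000000-aaa9ffff",
    "shard2": "aaaa0000-d554ffff",
    "shard3": "d5550000-ffffffff",
    "shard4": "0-2aa9ffff",
    "shard5": "2aaa0000-5554ffff",
    "shard6": "55550000-7fffffff",
}

# Sort shard names by the signed start of their range.
ordered = sorted(ranges, key=lambda name: parse_range(ranges[name])[0])

# Check that each range ends exactly one below the start of the next.
contiguous = all(
    parse_range(ranges[a])[1] + 1 == parse_range(ranges[b])[0]
    for a, b in zip(ordered, ordered[1:])
)

for name in ordered:
    lo, hi = parse_range(ranges[name])
    print(f"{name}: {lo} .. {hi}")
print("contiguous:", contiguous)
```

Read as signed values, these six ranges tile the whole 32-bit space from
80000000 through 7fffffff with no gaps. In this listing the shard names
happen to follow that order, but per point 2 nothing requires them to.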

-Yonik

> The same goes for 'core_node<x>', which neither follows any order nor
> correlates with shard<x>. That is, core_node<1> does not contain the keys
> starting from 0, nor does it map to shard<1>.
>
> {"shard1"=>
>   {"range"=>"80000000-aaa9ffff",
>     {"core_node5"=>
>       "core"=>"post_NW_201508_shard1_replica1",
>   "shard2"=>
>     {"range"=>"aaaa0000-d554ffff",
>       {"core_node6"=>
>         "core"=>"post_NW_201508_shard2_replica1",
>   "shard3"=>
>     {"range"=>"d5550000-ffffffff",
>       {"core_node2"=>
>         "core"=>"post_NW_201508_shard3_replica1",
>   "shard4"=>
>     {"range"=>"0-2aa9ffff",
>       {"core_node3"=>
>         "core"=>"post_NW_201508_shard4_replica1",
>   "shard5"=>
>     {"range"=>"2aaa0000-5554ffff",
>       {"core_node4"=>
>         "core"=>"post_NW_201508_shard5_replica1",
>   "shard6"=>
>     {"range"=>"55550000-7fffffff",
>       {"core_node1"=>
>         "core"=>"post_NW_201508_shard6_replica1"
>
>
> Why would this be a concern?
>
>    1. Let's say we merge the indexes of adjacent shards (to reduce the
>    number of shards in the collection). In this case we would be merging
>    "core_node3: 0-2aa9ffff" & "core_node4: 2aaa0000-5554ffff". What would the
>    index of the new core_node directory be? core_node<?>
>    2. When we copy this data over to the cluster after recreating the
>    collection with a reduced number of shards, how would the cluster infer the
>    hash range from the index data, or how does it reconcile that with the
>    metadata about the shards in the local filesystem of the cluster nodes?
>    3. How should we approach this problem to guarantee Solr picks up the
>    right key order from the merged indexes?
>
>
>
> *Solr 4.4*
> *HDFS for Index Storage*
