Great, thanks, Yonik.

On Mon, Aug 17, 2015 at 5:16 PM, Yonik Seeley <ysee...@gmail.com> wrote:

> On Mon, Aug 17, 2015 at 8:00 PM, Sathiya N Sundararajan
> <ausat...@gmail.com> wrote:
> > Folks:
> >
> > Question regarding SolrCloud shard numbers (e.g. shard<x>) and the associated
> > hash ranges. We are in the process of identifying the best strategy to merge
> > shards that belong to chronologically older collections, which see a very
> > low volume of searches compared to the collections with the most recent
> > data.
> >
> > What we ran into is that shard numbers and hash ranges often don't
> > correlate:
> >
> > shard1: 80000000-aaa9ffff
> > shard2: aaaa0000-d554ffff
> > shard3: d5550000-ffffffff (holds the last range)
> > shard4: 0-2aa9ffff (holds the starting range)
> > shard5: 2aaa0000-5554ffff
> > shard6: 55550000-7fffffff
>
>
> It's not really clear what you mean by "correlate"... but I think
> there are 2 different points to make:
> 1) This is the hex representation of a signed integer, so 80000000 is
> the start of the complete hash range, and 7fffffff is the end.
> 2) The numbers in shard1, shard2, etc, names are meaningless... just
> names like shard_foo and shard_bar.  They do not need to be ordered in
> any way with respect to each other.
>
> -Yonik
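Yonik's first point can be checked directly against the ranges listed above: read each bound as a signed 32-bit (two's complement) integer and sort the shards by their lower bound. The following is a minimal Python sketch, not Solr's own code; the helper names are hypothetical and the range strings are copied from the question.

```python
# Hypothetical helpers; the range strings are copied from the thread.
SHARD_RANGES = {
    "shard1": "80000000-aaa9ffff",
    "shard2": "aaaa0000-d554ffff",
    "shard3": "d5550000-ffffffff",
    "shard4": "0-2aa9ffff",
    "shard5": "2aaa0000-5554ffff",
    "shard6": "55550000-7fffffff",
}

INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def hex_to_int32(text: str) -> int:
    """Read an unsigned hex string as a signed 32-bit (two's complement) int."""
    value = int(text, 16)
    return value - 0x100000000 if value >= 0x80000000 else value


def parse_range(text: str) -> tuple:
    """Split 'low-high' and convert both bounds to signed ints."""
    low, high = text.split("-")
    return hex_to_int32(low), hex_to_int32(high)


def shards_in_hash_order(ranges: dict) -> list:
    """Order shard names by the signed lower bound of their range, ignoring the names."""
    return sorted(ranges, key=lambda name: parse_range(ranges[name])[0])


def covers_full_range(ranges: dict) -> bool:
    """True if the ranges are contiguous and span INT32_MIN..INT32_MAX."""
    bounds = sorted(parse_range(r) for r in ranges.values())
    if bounds[0][0] != INT32_MIN or bounds[-1][1] != INT32_MAX:
        return False
    return all(prev[1] + 1 == cur[0] for prev, cur in zip(bounds, bounds[1:]))
```

With the ranges above, `shards_in_hash_order(SHARD_RANGES)` returns shard1 through shard6 in that order, and `covers_full_range(SHARD_RANGES)` returns `True`. Read as signed values, shard1 (80000000) holds the start of the hash range and shard6 (7fffffff) holds the end. The sort deliberately uses only the range values, since, per Yonik's second point, the names carry no ordering guarantee.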
>
> > The same goes for 'core_node<x>', which neither follows any order nor
> > correlates with shard<x>. That is, core_node1 does not contain the keys
> > starting from 0, nor does it map to shard1.
> >
> > {"shard1"=>{"range"=>"80000000-aaa9ffff",
> >             "core_node5"=>{"core"=>"post_NW_201508_shard1_replica1"}},
> >  "shard2"=>{"range"=>"aaaa0000-d554ffff",
> >             "core_node6"=>{"core"=>"post_NW_201508_shard2_replica1"}},
> >  "shard3"=>{"range"=>"d5550000-ffffffff",
> >             "core_node2"=>{"core"=>"post_NW_201508_shard3_replica1"}},
> >  "shard4"=>{"range"=>"0-2aa9ffff",
> >             "core_node3"=>{"core"=>"post_NW_201508_shard4_replica1"}},
> >  "shard5"=>{"range"=>"2aaa0000-5554ffff",
> >             "core_node4"=>{"core"=>"post_NW_201508_shard5_replica1"}},
> >  "shard6"=>{"range"=>"55550000-7fffffff",
> >             "core_node1"=>{"core"=>"post_NW_201508_shard6_replica1"}}}
> >
> >
> > Why would this be a concern?
> >
> >    1. Let's say we merge the indexes of adjacent shards (to reduce the
> >    number of shards in the collection). In this case we would be merging
> >    "core_node3: 0-2aa9ffff" and "core_node4: 2aaa0000-5554ffff". What
> >    would the new core_node directory be called? core_node<?>
> >    2. When we copy this data over to the cluster after recreating the
> >    collection with fewer shards, how would the cluster infer the hash
> >    range from the index data, and how does it reconcile that with the
> >    shard metadata in the local filesystem of the cluster nodes?
> >    3. How should we approach this problem to guarantee that Solr picks up
> >    the right key order from the merged indexes?
> >
> >
> >
> > *Solr 4.4*
> > *HDFS for Index Storage*
>
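For question 1, the arithmetic of combining two ranges is straightforward. Two shards collapse into a single "low-high" range only if they are adjacent in signed order, meaning the first upper bound plus one equals the second lower bound. The following is a minimal Python sketch with hypothetical helper names. It only computes the combined range string, in the same lowercase, unpadded hex style as the ranges quoted above. It does not show how Solr registers a merged core or which core_node name that core would receive.

```python
# Hypothetical helpers: they compute a combined hash range string only.

def hex_to_int32(text: str) -> int:
    """Read an unsigned hex string as a signed 32-bit (two's complement) int."""
    value = int(text, 16)
    return value - 0x100000000 if value >= 0x80000000 else value


def int32_to_hex(value: int) -> str:
    """Format a signed 32-bit int as lowercase hex without leading zeros."""
    # Masking to 32 bits makes negative values print as 80000000..ffffffff.
    return format(value & 0xFFFFFFFF, "x")


def merge_adjacent(first: str, second: str) -> str:
    """Combine two 'low-high' hex ranges into one, if they are adjacent."""
    a_low, a_high = (hex_to_int32(p) for p in first.split("-"))
    b_low, b_high = (hex_to_int32(p) for p in second.split("-"))
    # Put the range with the smaller signed start first.
    if b_low < a_low:
        a_low, a_high, b_low, b_high = b_low, b_high, a_low, a_high
    if a_high + 1 != b_low:
        raise ValueError(f"ranges {first} and {second} are not adjacent")
    return f"{int32_to_hex(a_low)}-{int32_to_hex(b_high)}"
```

Called on the pair from question 1, `merge_adjacent("0-2aa9ffff", "2aaa0000-5554ffff")` returns `"0-5554ffff"`. Under the signed reading, shard3 and shard4 are also adjacent (ffffffff is -1, followed by 0) and combine into `"d5550000-2aa9ffff"`. Shard6 and shard1, by contrast, raise `ValueError`, because 7fffffff is the end of the hash range rather than a wrap-around point.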
