Great, thanks Yonik.

On Mon, Aug 17, 2015 at 5:16 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> On Mon, Aug 17, 2015 at 8:00 PM, Sathiya N Sundararajan
> <ausat...@gmail.com> wrote:
> > Folks:
> >
> > Question regarding SolrCloud shard numbers (e.g. shard<x>) and their
> > associated hash ranges. We are in the process of identifying the best
> > strategy to merge shards that belong to chronologically older
> > collections, which see a very low volume of searches compared to the
> > collections with the most recent data.
> >
> > What we ran into is that shard numbers and hash ranges often don't
> > correlate:
> >
> > shard1: 80000000-aaa9ffff
> > shard2: aaaa0000-d554ffff
> > shard3: d5550000-ffffffff ( holds the last range )
> > shard4: 0-2aa9ffff ( holds the starting range )
> > shard5: 2aaa0000-5554ffff
> > shard6: 55550000-7fffffff
>
> It's not really clear what you mean by "correlate"... but I think
> there are 2 different points to make:
> 1) This is the hex representation of a signed integer, so 80000000 is
> the start of the complete hash range, and 7fffffff is the end.
> 2) The numbers in shard1, shard2, etc, names are meaningless... just
> names like shard_foo and shard_bar. They do not need to be ordered in
> any way with respect to each other.
>
> -Yonik
>
> > The same goes for 'core_node<x>', which neither follows an order nor
> > correlates with shard<x>. Meaning core_node<1> does not contain the keys
> > starting from 0, nor does it map to shard<1>.
> >
> > {"shard1"=>
> >   {"range"=>"80000000-aaa9ffff",
> >    {"core_node5"=>
> >      "core"=>"post_NW_201508_shard1_replica1",
> >  "shard2"=>
> >   {"range"=>"aaaa0000-d554ffff",
> >    {"core_node6"=>
> >      "core"=>"post_NW_201508_shard2_replica1",
> >  "shard3"=>
> >   {"range"=>"d5550000-ffffffff",
> >    {"core_node2"=>
> >      "core"=>"post_NW_201508_shard3_replica1",
> >  "shard4"=>
> >   {"range"=>"0-2aa9ffff",
> >    {"core_node3"=>
> >      "core"=>"post_NW_201508_shard4_replica1",
> >  "shard5"=>
> >   {"range"=>"2aaa0000-5554ffff",
> >    {"core_node4"=>
> >      "core"=>"post_NW_201508_shard5_replica1",
> >  "shard6"=>
> >   {"range"=>"55550000-7fffffff",
> >    {"core_node1"=>
> >      "core"=>"post_NW_201508_shard6_replica1"
> >
> > Why would this be a concern?
> >
> > 1. Let's say we merge the indexes of adjacent shards (to reduce the
> > number of shards in the collection). In this case it will be merging
> > "core_node3: 0-2aa9ffff" & "core_node4: 2aaa0000-5554ffff". What would
> > the index of the new core_node directory be? core_node<?>
> > 2. When we copy this data over to the cluster after recreating the
> > collection with a reduced number of shards, how would the cluster infer
> > the hash range from the index data, or how does it reconcile with the
> > metadata about the shards in the local filesystem of the cluster nodes?
> > 3. How should we approach this problem to guarantee Solr picks up the
> > right key order from the merged indexes?
> >
> > *Solr 4.4*
> > *HDFS for Index Storage*
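
For anyone reading this thread later, here is a small Python sketch of point 1 above (the ranges are hex representations of signed 32-bit integers). It only does arithmetic on the range strings quoted in this thread; it does not talk to Solr, and the function names (to_signed32, shards_in_hash_order, covers_full_range, merged_range) are made up for illustration. It says nothing about how the indexes themselves should be merged.

```python
def to_signed32(hex_str):
    """Interpret a hex string as a two's-complement signed 32-bit integer."""
    value = int(hex_str, 16)
    return value - (1 << 32) if value >= (1 << 31) else value


def parse_range(range_str):
    """Turn 'lo-hi' (hex) into a (lo, hi) tuple of signed integers."""
    lo, hi = range_str.split("-")
    return to_signed32(lo), to_signed32(hi)


# Ranges copied from the cluster state quoted in this thread.
SHARD_RANGES = {
    "shard1": "80000000-aaa9ffff",
    "shard2": "aaaa0000-d554ffff",
    "shard3": "d5550000-ffffffff",
    "shard4": "0-2aa9ffff",
    "shard5": "2aaa0000-5554ffff",
    "shard6": "55550000-7fffffff",
}


def shards_in_hash_order(ranges):
    """Shard names sorted by the signed start of their hash range."""
    return sorted(ranges, key=lambda name: parse_range(ranges[name])[0])


def covers_full_range(ranges):
    """True if the ranges tile the whole signed 32-bit space without gaps."""
    ordered = [parse_range(ranges[n]) for n in shards_in_hash_order(ranges)]
    if ordered[0][0] != -(1 << 31) or ordered[-1][1] != (1 << 31) - 1:
        return False
    return all(prev[1] + 1 == nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


def merged_range(range_a, range_b):
    """Hex range spanning two adjacent ranges; raises if they are not adjacent."""
    (lo_a, hi_a), (lo_b, hi_b) = sorted([parse_range(range_a), parse_range(range_b)])
    if hi_a + 1 != lo_b:
        raise ValueError("ranges are not adjacent")
    # Mask back to unsigned before formatting so negative values print as hex.
    return format(lo_a & 0xFFFFFFFF, "x") + "-" + format(hi_b & 0xFFFFFFFF, "x")


if __name__ == "__main__":
    print(shards_in_hash_order(SHARD_RANGES))
    print(covers_full_range(SHARD_RANGES))
    print(merged_range(SHARD_RANGES["shard4"], SHARD_RANGES["shard5"]))
```

Read as signed values, shard1 (80000000) starts at the minimum of the hash space and shard6 (7fffffff) ends at the maximum, so in this particular collection the names happen to follow hash order. Per point 2, that should not be relied on.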