Very interesting, Andre.  I believe this is in line with the larger
vision: you'd use the hashing algorithm to create the initial splits in
the forwarding table, and then if you needed to add a new shard you'd
split/merge an existing range.  Creating the algorithm is probably the
easier part (maybe I'm wrong?); the harder part, to me, appears to be
splitting the index based on the new ranges and then moving that split
to a new core.  I'm aware of the index splitter contrib, which could be
used for this, but I don't know where specifically this sits on the
SolrCloud roadmap.  Does anyone else have those details?
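
To make that concrete, here is a rough sketch of the kind of forwarding
table I have in mind (the names are hypothetical, not actual SolrCloud
code).  Note how cheap the split is at the table level, which is why I
suspect moving the underlying index is where the real work lies:

    import java.util.Map;
    import java.util.TreeMap;

    // Hypothetical forwarding table: start of each hash range -> shard name.
    public class ForwardingTable {
        private final TreeMap<Integer, String> ranges = new TreeMap<>();

        // Assumes a range starting at Integer.MIN_VALUE is always
        // assigned, so that every possible hash falls into some range.
        public void assign(int rangeStart, String shard) {
            ranges.put(rangeStart, shard);
        }

        public String shardFor(int hash) {
            Map.Entry<Integer, String> e = ranges.floorEntry(hash);
            return e.getValue();
        }

        // Splitting is just inserting a new boundary: hashes below
        // splitPoint stay on the old shard, hashes at or above it are
        // forwarded to newShard.  Physically dividing the index and
        // moving one half to a new core still has to happen separately.
        public void split(int splitPoint, String newShard) {
            ranges.put(splitPoint, newShard);
        }
    }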

On Tue, Feb 28, 2012 at 5:40 AM, Andre Bois-Crettez
<andre.b...@kelkoo.com> wrote:
> Consistent hashing seems like a solution for reducing the shuffling
> of keys when adding/deleting shards:
> http://www.tomkleinpeter.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/
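
(For anyone unfamiliar with the technique, a minimal ring in Java;
illustrative only, not Solr code, and String.hashCode merely stands in
for a real hash function:)

    import java.util.Map;
    import java.util.TreeMap;

    // Each shard gets many virtual points on the ring; a key is served
    // by the first point at or after its hash, wrapping around at the
    // end.  Adding or removing one shard then only moves roughly 1/N
    // of the keys.
    public class HashRing {
        private static final int POINTS_PER_SHARD = 100;
        private final TreeMap<Integer, String> ring = new TreeMap<>();

        public void addShard(String shard) {
            for (int i = 0; i < POINTS_PER_SHARD; i++)
                ring.put((shard + "#" + i).hashCode(), shard);
        }

        public void removeShard(String shard) {
            for (int i = 0; i < POINTS_PER_SHARD; i++)
                ring.remove((shard + "#" + i).hashCode());
        }

        public String shardFor(String docId) {
            Map.Entry<Integer, String> e = ring.ceilingEntry(docId.hashCode());
            return (e != null ? e : ring.firstEntry()).getValue(); // wrap
        }
    }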
>
> Twitter describes a more flexible sharding scheme in the section
> "Gizzard handles partitioning through a forwarding table":
> https://github.com/twitter/gizzard
> An explicit mapping would make it possible to take advantage of
> heterogeneous servers, while still reducing the shuffling of
> documents when expanding/reducing the cluster.
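
(Reusing the hypothetical ForwardingTable sketch above, a heterogeneous
assignment is just uneven ranges:)

    ForwardingTable table = new ForwardingTable();
    table.assign(Integer.MIN_VALUE, "shardOnBigServer");       // ~3/4 of the space
    table.assign(Integer.MAX_VALUE / 2, "shardOnSmallServer"); // top ~1/4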
>
> Are there any ideas or progress in this direction, be it in a branch
> or in JIRA issues?
>
>
> Andre
>
>
>
> Jamie Johnson wrote:
>>
>> The case is actually any time you need to add another shard.  With
>> the current implementation, the hashing approach breaks down as soon
>> as you add a new shard.  Even with many small shards, I think you
>> still have this issue when you're adding/updating/deleting docs.  I'm
>> definitely interested in hearing other approaches that would work,
>> though, if there are any.
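
(To put a number on the breakdown: with naive modulo hashing, going
from 4 to 5 shards remaps 80% of documents.  A toy demo, not Solr
internals:)

    public class ModuloReshardDemo {
        public static void main(String[] args) {
            int moved = 0, total = 1_000_000;
            for (int hash = 0; hash < total; hash++) {
                // shard assignment before vs. after adding a fifth shard
                if (hash % 4 != hash % 5) moved++;
            }
            // prints 80.0%: nearly the whole index would have to move
            System.out.printf("%.1f%% of docs change shards%n",
                              100.0 * moved / total);
        }
    }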
>>
>> On Sat, Jan 28, 2012 at 7:53 PM, Lance Norskog <goks...@gmail.com> wrote:
>>
>>> If this is for load balancing, the usual solution is to use many
>>> small shards, so you can just move one or two without doing any
>>> surgery on indexes.
>>>
>>> On Sat, Jan 28, 2012 at 2:46 PM, Yonik Seeley
>>> <yo...@lucidimagination.com> wrote:
>>>
>>>> On Sat, Jan 28, 2012 at 3:45 PM, Jamie Johnson <jej2...@gmail.com>
>>>> wrote:
>>>>
>>>>> Second question: I know there have been discussions about storing
>>>>> the shard assignments in ZK (i.e. shard 1 is responsible for hashed
>>>>> values between 0 and 10, shard 2 for hashed values between 11 and
>>>>> 20, etc.), but this isn't done yet, right?  So currently the hashing
>>>>> is based on the number of shards, instead of the assignments being
>>>>> calculated once, the first time you start the cluster (i.e. based on
>>>>> numShards), so that they could be adjusted later, right?
>>>>>
>>>> Right.  Storing the hash range for each shard/node is something
>>>> we'll need in order to dynamically change the number of shards (as
>>>> opposed to replicas), so we'll need to start doing it sooner or
>>>> later.
>>>>
>>>> -Yonik
>>>> http://www.lucidimagination.com
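
(A sketch of how those initial assignments might be computed once from
numShards and then persisted, e.g. in ZooKeeper; hypothetical, not what
SolrCloud does today:)

    // Split the full 32-bit hash space into numShards equal ranges;
    // each starts[i] would become the stored lower bound for shard i.
    public static int[] initialRangeStarts(int numShards) {
        long slice = (1L << 32) / numShards;
        int[] starts = new int[numShards];
        for (int i = 0; i < numShards; i++)
            starts[i] = (int) (Integer.MIN_VALUE + i * slice);
        return starts;
    }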
>>>>
>>>
>>> --
>>> Lance Norskog
>>> goks...@gmail.com
>>>
>>
>>
>
> --
> André Bois-Crettez
>
> Search technology, Kelkoo
> http://www.kelkoo.com/
>
>
