In short, shard splitting does not take the routing mechanism into
account, only the hash range, so documents with the same prefix can end up
split across shards.

Here's an overview of routing in SolrCloud:
* Routing is based on a hash value.
* The hash is calculated from the multiple parts of the routing key. In
the case of A!B, the upper 16 bits come from murmurhash(A) and the lower
16 bits come from murmurhash(B). This sends the docs to the right shard.
* When querying using A!, all shards whose ranges contain hashes between
<upper 16 bits of murmurhash(A)>0000 and <upper 16 bits of
murmurhash(A)>ffff are used.
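
As a rough sketch of the bit layout described above (not Solr's actual
code): hash32 below is a stand-in built on zlib.crc32 so the example needs
only the standard library, whereas Solr uses MurmurHash3. The function
names are made up for illustration, and hashes are treated as unsigned
32-bit values for simplicity.

```python
import zlib


def hash32(s):
    # ASSUMPTION: stand-in for Solr's MurmurHash3, used only so the
    # sketch runs with the standard library.
    return zlib.crc32(s.encode("utf-8")) & 0xFFFFFFFF


def composite_hash(doc_id):
    # Plain IDs (no "!") are hashed as a whole.
    if "!" not in doc_id:
        return hash32(doc_id)
    prefix, rest = doc_id.split("!", 1)
    # Upper 16 bits from the prefix, lower 16 bits from the remainder.
    return (hash32(prefix) & 0xFFFF0000) | (hash32(rest) & 0x0000FFFF)


def prefix_range(prefix):
    # Every "prefix!..." document hashes somewhere inside this range.
    upper = hash32(prefix) & 0xFFFF0000
    return upper, upper | 0x0000FFFF


lo, hi = prefix_range("A")
print("A!B     -> %08x" % composite_hash("A!B"))
print("A! range = %08x - %08x" % (lo, hi))
```

A query for A! then only needs the shards whose hash ranges overlap the
printed range.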

When you split a shard, say for the range 00000000 - ffffffff, it is split
down the middle (by default). Over multiple splits, docs for the same A!
prefix might end up on different shards, but the request routing should
take care of that.
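
To make that concrete, here is a small sketch of repeated midpoint
splitting. The starting range, the shard names and both helper functions
are hypothetical, chosen so that the second split lands inside a single
prefix's 16-bit block; this is not Solr's split code.

```python
def split_middle(lo, hi):
    # Split an inclusive hash range into two halves at its midpoint.
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


def shards_for_range(shards, lo, hi):
    # Names of all shards whose inclusive range overlaps [lo, hi].
    return [name for name, (s_lo, s_hi) in shards.items()
            if s_lo <= hi and lo <= s_hi]


# Hypothetical shard holding exactly two prefixes' worth of hashes.
first, second = split_middle(0x12340000, 0x1235FFFF)
# Splitting the first half again cuts through a single prefix's range.
first_a, first_b = split_middle(*first)

shards = {"shard1_0_0": first_a, "shard1_0_1": first_b, "shard1_1": second}

# All docs whose prefix hashes to upper bits 0x1234 live in this range.
prefix_lo, prefix_hi = 0x12340000, 0x1234FFFF
print(shards_for_range(shards, prefix_lo, prefix_hi))
```

The prefix's documents now sit on two shards, and a query for that prefix
is sent to both of them, which is the request-routing behaviour described
above.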

You can read more about routing here:
https://lucidworks.com/blog/solr-cloud-document-routing/
http://lucidworks.com/blog/multi-level-composite-id-routing-solrcloud/

and shard splitting here:
http://lucidworks.com/blog/shard-splitting-in-solrcloud/


On Wed, Feb 4, 2015 at 12:59 AM, Gili Nachum <gilinac...@gmail.com> wrote:

> Hi, I'm also interested. When using composite the ID, the _route_
> information is not kept on the document itself, so to me it looks like it's
> not possible as the split API
> <
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3
> >
> doesn't have a relevant parameter to split correctly.
> Could report back once I try it in practice.
>
> On Mon, Nov 10, 2014 at 7:27 PM, Ian Rose <ianr...@fullstory.com> wrote:
>
> > Howdy -
> >
> > We are using composite IDs of the form <user>!<event>.  This ensures that
> > all events for a user are stored in the same shard.
> >
> > I'm assuming from the description of how composite ID routing works, that
> > if you split a shard the "split point" of the hash range for that shard
> is
> > chosen to maintain the invariant that all documents that share a routing
> > prefix (before the "!") will still map to the same (new) shard.  Is that
> > accurate?
> >
> > A naive shard-split implementation (e.g. that chose the hash range split
> > point arbitrarily) could end up with "child" shards that split a routing
> > prefix.
> >
> > Thanks,
> > Ian
> >
>



-- 
Anshum Gupta
http://about.me/anshumgupta
