Doesn't relevancy for that assume that the IDF and TF for user1 and user2 are not too different? SolrCloud still doesn't use a distributed IDF, correct?
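
To make the concern concrete, here is a minimal sketch of how the same term could get a different IDF on two shards. It assumes the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)); the shard sizes and document frequencies are made-up numbers, not measurements from any real collection.

```python
import math

def classic_idf(doc_freq, num_docs):
    """Classic Lucene-style idf (assumed here): 1 + ln(num_docs / (doc_freq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical per-shard statistics for one query term (made-up numbers).
shard_a = {"num_docs": 1_000_000, "doc_freq": 50_000}  # e.g. the shard holding user1
shard_b = {"num_docs": 1_000_000, "doc_freq": 500}     # e.g. the shard holding user2

# Without distributed IDF, each shard scores with its own local statistics.
idf_a = classic_idf(shard_a["doc_freq"], shard_a["num_docs"])
idf_b = classic_idf(shard_b["doc_freq"], shard_b["num_docs"])

# What a collection-wide (distributed) IDF would use instead.
idf_global = classic_idf(
    shard_a["doc_freq"] + shard_b["doc_freq"],
    shard_a["num_docs"] + shard_b["num_docs"],
)

print(f"shard A idf={idf_a:.3f}  shard B idf={idf_b:.3f}  global idf={idf_global:.3f}")
```

If the per-shard statistics are close, the local values stay near the global one and the scores remain comparable; the further they drift apart, the less comparable scores from different shards become.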

On Wed, Feb 4, 2015 at 7:05 PM, Gili Nachum <gilinac...@gmail.com> wrote:

> Alright. So shard splitting and composite routing play nicely together.
> Thank you Anshum.
>
> On Wed, Feb 4, 2015 at 11:24 AM, Anshum Gupta <ans...@anshumgupta.net> wrote:
>
> > In one line, shard splitting doesn't cater to or depend on the routing
> > mechanism but just the hash range, so you could have documents for the
> > same prefix split up.
> >
> > Here's an overview of routing in SolrCloud:
> > * Happens based on a hash value
> > * The hash is calculated using the multiple parts of the routing key. In
> >   case of A!B, 16 bits are obtained from murmurhash(A) and the LSB 16 bits
> >   of the routing key are obtained from murmurhash(B). This sends the docs
> >   to the right shard.
> > * When querying using A!, all shards that contain hashes from the range
> >   16 bits from murmurhash(A)-0000 to murmurhash(A)-ffff are used.
> >
> > When you split a shard, for say range 00000000 - ffffffff, it is split
> > from the middle (by default) and over multiple splits, docs for the same
> > A! prefix might end up on different shards, but the request routing should
> > take care of that.
> >
> > You can read more about routing here:
> > https://lucidworks.com/blog/solr-cloud-document-routing/
> > http://lucidworks.com/blog/multi-level-composite-id-routing-solrcloud/
> >
> > and shard splitting here:
> > http://lucidworks.com/blog/shard-splitting-in-solrcloud/
> >
> > On Wed, Feb 4, 2015 at 12:59 AM, Gili Nachum <gilinac...@gmail.com> wrote:
> >
> > > Hi, I'm also interested. When using the composite ID, the _route_
> > > information is not kept on the document itself, so to me it looks like
> > > it's not possible, as the split API
> > > <https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3>
> > > doesn't have a relevant parameter to split correctly.
> > > Could report back once I try it in practice.
> > >
> > > On Mon, Nov 10, 2014 at 7:27 PM, Ian Rose <ianr...@fullstory.com> wrote:
> > >
> > > > Howdy -
> > > >
> > > > We are using composite IDs of the form <user>!<event>. This ensures that
> > > > all events for a user are stored in the same shard.
> > > >
> > > > I'm assuming from the description of how composite ID routing works, that
> > > > if you split a shard the "split point" of the hash range for that shard
> > > > is chosen to maintain the invariant that all documents that share a
> > > > routing prefix (before the "!") will still map to the same (new) shard.
> > > > Is that accurate?
> > > >
> > > > A naive shard-split implementation (e.g. that chose the hash range split
> > > > point arbitrarily) could end up with "child" shards that split a routing
> > > > prefix.
> > > >
> > > > Thanks,
> > > > Ian
> >
> > --
> > Anshum Gupta
> > http://about.me/anshumgupta
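
The routing overview quoted above can be sketched roughly as follows. This is an illustration only, not Solr's code: `hash32` is a stand-in based on CRC32 (Solr uses MurmurHash3, so the actual hash values will differ), hash ranges are treated as unsigned 32-bit values as in the 00000000 - ffffffff example, and all function names are hypothetical.

```python
import zlib

def hash32(part):
    """Stand-in 32-bit hash (CRC32). Assumption: Solr really uses MurmurHash3."""
    return zlib.crc32(part.encode("utf-8")) & 0xFFFFFFFF

def composite_hash(route_key):
    """For 'A!B': upper 16 bits come from hash(A), lower 16 bits from hash(B)."""
    prefix, _, rest = route_key.partition("!")
    return (hash32(prefix) & 0xFFFF0000) | (hash32(rest) & 0x0000FFFF)

def prefix_range(prefix):
    """Inclusive hash range hash(A)-0000 .. hash(A)-ffff queried for 'A!'."""
    upper = hash32(prefix) & 0xFFFF0000
    return upper, upper | 0x0000FFFF

def split_middle(lo, hi):
    """Split an inclusive hash range into two child ranges at its midpoint."""
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)

def shards_for_prefix(prefix, shards):
    """Return every shard range that overlaps the prefix's hash range."""
    lo, hi = prefix_range(prefix)
    return [(s_lo, s_hi) for (s_lo, s_hi) in shards if s_lo <= hi and lo <= s_hi]

# Every document for user1 hashes inside user1's prefix range.
lo, hi = prefix_range("user1")
doc_hash = composite_hash("user1!event42")
print(lo <= doc_hash <= hi)

# Extreme case: a shard whose range is exactly the prefix range gets split.
# The split ignores routing keys, so both children now hold part of 'user1!'.
children = list(split_middle(lo, hi))
print(len(shards_for_prefix("user1", children)))
```

The last step is the situation described in the thread: the split point is chosen from the hash range alone, so a prefix can end up on more than one child shard, and a query for `user1!` then has to be sent to every shard whose range overlaps the prefix range.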