Solr 5.0 has support for distributed IDF. Also, users having the same IDF is orthogonal to the original question.
In general, the document frequency is computed per shard only. If, for some reason, a single user's documents are split across shards, the IDF used would differ for docs on different shards.

On Wed, Feb 4, 2015 at 9:06 PM, Dan Davis <dansm...@gmail.com> wrote:

> Doesn't relevancy for that assume that the IDF and TF for user1 and user2
> are not too different? SolrCloud still doesn't use a distributed IDF,
> correct?
>
> On Wed, Feb 4, 2015 at 7:05 PM, Gili Nachum <gilinac...@gmail.com> wrote:
>
> > Alright. So shard splitting and composite routing play nicely together.
> > Thank you Anshum.
> >
> > On Wed, Feb 4, 2015 at 11:24 AM, Anshum Gupta <ans...@anshumgupta.net>
> > wrote:
> >
> > > In one line, shard splitting doesn't depend on the routing
> > > mechanism but just on the hash range, so you could have documents for
> > > the same prefix split up.
> > >
> > > Here's an overview of routing in SolrCloud:
> > > * Happens based on a hash value
> > > * The hash is calculated using the multiple parts of the routing key.
> > >   In case of A!B, the upper 16 bits are obtained from murmurhash(A) and
> > >   the lower (LSB) 16 bits are obtained from murmurhash(B). This sends
> > >   the docs to the right shard.
> > > * When querying using A!, all shards that contain hashes from the range
> > >   murmurhash(A)-0000 to murmurhash(A)-ffff (the upper 16 bits taken
> > >   from murmurhash(A)) are used.
> > >
> > > When you split a shard, for say range 00000000 - ffffffff, it is split
> > > from the middle (by default) and over multiple splits, docs for the
> > > same A! prefix might end up on different shards, but the request
> > > routing should take care of that.
> > >
> > > You can read more about routing here:
> > > https://lucidworks.com/blog/solr-cloud-document-routing/
> > > http://lucidworks.com/blog/multi-level-composite-id-routing-solrcloud/
> > >
> > > and shard splitting here:
> > > http://lucidworks.com/blog/shard-splitting-in-solrcloud/
> > >
> > >
> > > On Wed, Feb 4, 2015 at 12:59 AM, Gili Nachum <gilinac...@gmail.com>
> > > wrote:
> > >
> > > > Hi, I'm also interested. When using the composite ID, the _route_
> > > > information is not kept on the document itself, so to me it looks
> > > > like it's not possible, as the split API
> > > > <https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3>
> > > > doesn't have a relevant parameter to split correctly.
> > > > Could report back once I try it in practice.
> > > >
> > > > On Mon, Nov 10, 2014 at 7:27 PM, Ian Rose <ianr...@fullstory.com>
> > > > wrote:
> > > >
> > > > > Howdy -
> > > > >
> > > > > We are using composite IDs of the form <user>!<event>. This
> > > > > ensures that all events for a user are stored in the same shard.
> > > > >
> > > > > I'm assuming from the description of how composite ID routing
> > > > > works, that if you split a shard the "split point" of the hash
> > > > > range for that shard is chosen to maintain the invariant that all
> > > > > documents that share a routing prefix (before the "!") will still
> > > > > map to the same (new) shard. Is that accurate?
> > > > >
> > > > > A naive shard-split implementation (e.g. that chose the hash range
> > > > > split point arbitrarily) could end up with "child" shards that
> > > > > split a routing prefix.
> > > > >
> > > > > Thanks,
> > > > > Ian
> > > > >
> > >
> > > --
> > > Anshum Gupta
> > > http://about.me/anshumgupta
> >
>

--
Anshum Gupta
http://about.me/anshumgupta
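
For anyone who wants to play with the routing overview quoted above, here is a rough sketch in Python. It is only an illustration, not Solr's code: zlib.crc32 is a stand-in for the murmurhash call so that the example needs only the standard library, hash values are treated as unsigned 32-bit numbers as in the 00000000 - ffffffff example, and every function name here is made up for this sketch. The split is the plain "from the middle" split described in the thread.

```python
import zlib
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (low, high) hash range of one shard


def h32(text: str) -> int:
    """Stand-in 32-bit hash (CRC32). Assumption: Solr uses murmurhash here."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def composite_hash(route_key: str) -> int:
    """For A!B: upper 16 bits from hash(A), lower 16 bits from hash(B)."""
    prefix, _, rest = route_key.partition("!")
    return (h32(prefix) & 0xFFFF0000) | (h32(rest) & 0x0000FFFF)


def prefix_range(prefix: str) -> Range:
    """Range covered by A!: hash(A)-0000 through hash(A)-ffff."""
    upper = h32(prefix) & 0xFFFF0000
    return upper, upper | 0xFFFF


def split_in_middle(low: int, high: int) -> Tuple[Range, Range]:
    """Naive midpoint split of one shard's hash range."""
    mid = low + (high - low) // 2
    return (low, mid), (mid + 1, high)


def split_shard_containing(hash_value: int, ranges: List[Range]) -> List[Range]:
    """Split whichever shard currently holds hash_value; keep the others."""
    result: List[Range] = []
    for low, high in ranges:
        if low <= hash_value <= high:
            result.extend(split_in_middle(low, high))
        else:
            result.append((low, high))
    return result


def shards_for_prefix(prefix: str, ranges: List[Range]) -> List[Range]:
    """Shards a query for A! has to visit: those overlapping the prefix range."""
    low, high = prefix_range(prefix)
    return [r for r in ranges if r[0] <= high and low <= r[1]]


# Keep splitting the shard that holds one of user1's documents.
ranges: List[Range] = [(0x00000000, 0xFFFFFFFF)]
doc_hash = composite_hash("user1!event42")
for _ in range(17):
    ranges = split_shard_containing(doc_hash, ranges)

# Once a shard is narrower than 2**16 hashes, the prefix spans several shards.
for low, high in shards_for_prefix("user1", ranges):
    print(f"{low:08x} - {high:08x}")
```

With midpoint splits of the full range, the 17th split is the first one that can cut through a single prefix's 65536-value range, which is the "docs for the same A! prefix might end up on different shards" case. The query side still works because it selects every shard overlapping the prefix range.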