Thanks, Anshum - I should never have posted so late. It is true that
different users will have different word frequencies, but an application
that exploits that for better relevancy would be going to great lengths
for the relevancy of an individual user's results.
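
To make the per-shard document frequency point from the quoted reply below
concrete, here is a minimal sketch. It assumes the classic Lucene TF-IDF
formula, idf = 1 + ln(numDocs / (docFreq + 1)); the shard sizes and term
counts are made-up numbers for illustration, not measurements.

```python
import math

def classic_idf(doc_freq, num_docs):
    # Classic Lucene TF-IDF style IDF (assumed similarity for this sketch).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical statistics for one term on two shards of the same collection.
shard1 = {"num_docs": 1000, "doc_freq": 10}   # term is rare on this shard
shard2 = {"num_docs": 1000, "doc_freq": 400}  # term is common on this shard

# Without distributed IDF, each shard scores with its own local statistics.
idf_shard1 = classic_idf(shard1["doc_freq"], shard1["num_docs"])
idf_shard2 = classic_idf(shard2["doc_freq"], shard2["num_docs"])

# With distributed IDF, the statistics are combined across shards first.
idf_global = classic_idf(
    shard1["doc_freq"] + shard2["doc_freq"],
    shard1["num_docs"] + shard2["num_docs"],
)

print(idf_shard1, idf_shard2, idf_global)
```

The same term gets a noticeably different weight depending on which shard
scores the document, which is why documents of one user that sit on
different shards are not scored on the same scale.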

On Thu, Feb 5, 2015 at 12:41 AM, Anshum Gupta <ans...@anshumgupta.net>
wrote:

> Solr 5.0 has support for distributed IDF. Also, users having the same IDF
> is orthogonal to the original question.
>
> In general, the Doc Freq. is only per-shard. If for some reason, a single
> user has documents split across shards, the IDF used would be different for
> docs on different shards.
>
> On Wed, Feb 4, 2015 at 9:06 PM, Dan Davis <dansm...@gmail.com> wrote:
>
>> Doesn't relevancy for that assume that the IDF and TF for user1 and user2
>> are not too different?    SolrCloud still doesn't use a distributed IDF,
>> correct?
>>
>> On Wed, Feb 4, 2015 at 7:05 PM, Gili Nachum <gilinac...@gmail.com> wrote:
>>
>> > Alright. So shard splitting and composite routing play nicely together.
>> > Thank you Anshum.
>> >
>> > On Wed, Feb 4, 2015 at 11:24 AM, Anshum Gupta <ans...@anshumgupta.net>
>> > wrote:
>> >
>> > > In one line, shard splitting doesn't depend on the routing mechanism,
>> > > only on the hash range, so you could have documents for the same
>> > > prefix split up.
>> > >
>> > > Here's an overview of routing in SolrCloud:
>> > > * Happens based on a hash value
>> > > * The hash is calculated using the multiple parts of the routing key.
>> > > In case of A!B, the upper 16 bits of the hash are obtained from
>> > > murmurhash(A) and the lower (LSB) 16 bits are obtained from
>> > > murmurhash(B). This sends the docs to the right shard.
>> > > * When querying using A!, all shards that contain hashes in the range
>> > > from [16 bits from murmurhash(A)]0000 to [16 bits from
>> > > murmurhash(A)]ffff are used.
>> > >
>> > > When you split a shard, say for range 00000000 - ffffffff, it is split
>> > > from the middle (by default), and over multiple splits, docs for the
>> > > same A! prefix might end up on different shards, but the request
>> > > routing should take care of that.
>> > >
>> > > You can read more about routing here:
>> > > https://lucidworks.com/blog/solr-cloud-document-routing/
>> > > http://lucidworks.com/blog/multi-level-composite-id-routing-solrcloud/
>> > >
>> > > and shard splitting here:
>> > > http://lucidworks.com/blog/shard-splitting-in-solrcloud/
>> > >
>> > >
>> > > On Wed, Feb 4, 2015 at 12:59 AM, Gili Nachum <gilinac...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi, I'm also interested. When using the composite ID, the _route_
>> > > > information is not kept on the document itself, so to me it looks
>> > > > like it's not possible, as the split API
>> > > > <https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3>
>> > > > doesn't have a relevant parameter to split correctly.
>> > > > I could report back once I try it in practice.
>> > > >
>> > > > On Mon, Nov 10, 2014 at 7:27 PM, Ian Rose <ianr...@fullstory.com>
>> > > > wrote:
>> > > >
>> > > > > Howdy -
>> > > > >
>> > > > > We are using composite IDs of the form <user>!<event>.  This ensures
>> > > > > that all events for a user are stored in the same shard.
>> > > > >
>> > > > > I'm assuming from the description of how composite ID routing works
>> > > > > that if you split a shard, the "split point" of the hash range for
>> > > > > that shard is chosen to maintain the invariant that all documents
>> > > > > that share a routing prefix (before the "!") will still map to the
>> > > > > same (new) shard. Is that accurate?
>> > > > >
>> > > > > A naive shard-split implementation (e.g. that chose the hash range
>> > > > > split point arbitrarily) could end up with "child" shards that split
>> > > > > a routing prefix.
>> > > > >
>> > > > > Thanks,
>> > > > > Ian
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Anshum Gupta
>> > > http://about.me/anshumgupta
>> > >
>> >
>>
>
>
>
> --
> Anshum Gupta
> http://about.me/anshumgupta
>

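The composite ID routing and midpoint split described in the thread above
can be sketched roughly as follows. This is only an illustration under
stated assumptions: zlib.crc32 stands in for Solr's murmurhash (so the
actual hash values differ from what Solr computes), hash ranges are treated
as unsigned 32-bit values as in the 00000000 - ffffffff example, and all
function names are made up for this sketch rather than taken from Solr.

```python
import zlib

def hash32(s):
    # Stand-in 32-bit hash (CRC32). Solr uses murmurhash; values will differ.
    return zlib.crc32(s.encode("utf-8")) & 0xFFFFFFFF

def route_hash(doc_id):
    """Hash for an id of the form A!B: upper 16 bits from A, lower 16 from B."""
    prefix, sep, rest = doc_id.partition("!")
    if not sep:
        return hash32(doc_id)  # plain id, no routing prefix
    return (hash32(prefix) & 0xFFFF0000) | (hash32(rest) & 0x0000FFFF)

def prefix_range(prefix):
    """Hash range queried for 'A!': [upper16(A)]0000 .. [upper16(A)]ffff."""
    upper = hash32(prefix) & 0xFFFF0000
    return upper, upper | 0xFFFF

def split_range(lo, hi):
    """Split an inclusive hash range in the middle, ignoring routing prefixes."""
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)

def shards_for_prefix(prefix, shards):
    """Names of shards whose hash range overlaps the range of the prefix."""
    lo, hi = prefix_range(prefix)
    return [name for name, (s_lo, s_hi) in shards.items()
            if s_lo <= hi and lo <= s_hi]

# Every document of user1 falls inside the 64K-wide block for that prefix.
doc_hash = route_hash("user1!event42")
block_lo, block_hi = prefix_range("user1")
print(block_lo <= doc_hash <= block_hi)

# Once repeated splits reach a shard no wider than that block, the next
# midpoint split cuts through the prefix: both children must be queried.
children = dict(zip(("child_0", "child_1"), split_range(block_lo, block_hi)))
print(shards_for_prefix("user1", children))
```

The first midpoint split of the full range lands on a 16-bit boundary, so a
prefix only gets divided after many splits; the sketch jumps straight to
that late stage to show why request routing has to consult every shard
overlapping the prefix range.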