First, if Upayavira's intuition is correct (and I'm guessing it is),
then the behavior you're seeing is probably an accident of
coding rather than intentional. I think the algorithm is something
like this:

Node1 gets the original query.
Node1 sends sub-queries out to each shard.
As the results come back, they're sorted one by one into a final
list.

For simplicity, let's claim _all_ the docs have the exact same score.
The _first_ shard's response will completely fill up the final list.
The rest will be thrown on the floor, as none of the docs from the
other 6 shards will have a higher score than any doc currently in
the list.

Here's the important part. The order in which the sub-requests come back
varies due to a zillion possible causes: network latency, a minor GC pause
on one of the shards, whether all the caches are loaded, whatever. So
subsequent calls will happen to get some _other_ shard's docs in the list
first.
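
To make that concrete, here is a toy model of the merge I'm guessing at.
It is only a sketch of the idea, not Solr's actual code: merge(), the doc
tuples and the shard names are all made up for illustration, and the
tie-break on id stands in for adding a secondary sort on the <uniqueKey>
field.

```python
from typing import List, Tuple

# Hypothetical doc shape for this sketch: (doc id, sort value, shard name).
Doc = Tuple[str, float, str]


def merge(responses: List[List[Doc]], rows: int, tie_break: bool = False) -> List[Doc]:
    """Merge shard responses, in arrival order, into a top-`rows` list.

    Without a tie-break, a doc only displaces the current last entry when
    it sorts strictly ahead of it, so equal-valued docs from shards that
    respond later are dropped.
    """
    def key(doc: Doc):
        doc_id, value, _shard = doc
        # Higher value sorts first; the id is only consulted on ties.
        return (-value, doc_id) if tie_break else (-value,)

    final: List[Doc] = []
    for response in responses:          # order == order of arrival
        for doc in response:
            if len(final) < rows:
                final.append(doc)
            elif key(doc) < key(final[-1]):
                final[-1] = doc         # displace the current worst entry
            final.sort(key=key)         # stable, so ties keep arrival order
    return final


# Every doc has the same sort value, as in the simplified case above.
shard1 = [("doc1", 1.0, "shard1"), ("doc4", 1.0, "shard1"), ("doc5", 1.0, "shard1")]
shard2 = [("doc2", 1.0, "shard2"), ("doc3", 1.0, "shard2"), ("doc6", 1.0, "shard2")]

shard1_first = merge([shard1, shard2], rows=3)
shard2_first = merge([shard2, shard1], rows=3)
stable_a = merge([shard1, shard2], rows=3, tie_break=True)
stable_b = merge([shard2, shard1], rows=3, tie_break=True)

print([d[2] for d in shard1_first])  # shards seen when shard1 answers first
print([d[2] for d in shard2_first])  # shards seen when shard2 answers first
print([d[0] for d in stable_a], [d[0] for d in stable_b])
```

In this model, whichever shard answers first supplies the whole page when
there is no tie-break, while the tie-break version returns the same docs
(drawn from both shards) regardless of arrival order.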

Does that make sense?

On Thu, Sep 10, 2015 at 4:48 AM, Modassar Ather <modather1...@gmail.com> wrote:
> If two documents come back from different
> shards with the same score, the order would not be predictable
>
> This is fine.
>
> What I am not able to understand is that when I do not give a secondary
> field for sort I am getting the result from one shard which changes to
> other shard in other hits. Here the results are always from one shard.
> E.g In first hit all the results are from shard1 and in next hit all the
> results are from shard2.
>
> But when I add the secondary sort field I see the results from multiple
> shards. E.g It has results from shard1 and shard2 both. This does not
> change in multiple hits.
>
> So please help me understand why the similar result merge and aggregation
> is not happening when a single sort field is given?
>
> Regards,
> Modassar
>
>
>
> On Thu, Sep 10, 2015 at 5:03 PM, Upayavira <u...@odoko.co.uk> wrote:
>
>> What scores are you getting? If two documents come back from different
>> shards with the same score, the order would not be predictable -
>> probably down to which shard responds first.
>>
>> Fix it with something like sort=score,timestamp or some other time
>> related field.
>>
>> Upayavira
>>
>> On Thu, Sep 10, 2015, at 11:01 AM, Modassar Ather wrote:
>> > To add to my previous observation I saw the response having results from
>> > multiple shards when the secondary sort field is added and they remain
>> > same
>> > across hits.
>> > Kindly help me understand this behavior. Why the results are changing as
>> > I
>> > understand that the result should be first clubbed together from all
>> > shard
>> > and then based on their score it should be sorted.
>> > But here I see that every time I hit the sort query I am getting results
>> > from different shard which has different scores.
>> >
>> > Thanks,
>> > Modassar
>> >
>> > On Thu, Sep 10, 2015 at 2:59 PM, Modassar Ather <modather1...@gmail.com>
>> > wrote:
>> >
>> > > Upayavira! I add the fl=id,score,[shard] and saw the shards changing in
>> > > the response every time and for different shards the response changes
>> but
>> > > for the same shard result is same on multiple hits.
>> > > When I add secondary sort field e.g. score the shard remains same
>> across
>> > > hits.
>> > >
>> > > On Thu, Sep 10, 2015 at 12:52 PM, Upayavira <u...@odoko.co.uk> wrote:
>> > >
>> > >> Add fl=id,score,[shard] to your query, and show us the results of two
>> > >> differing executions.
>> > >>
>> > >> Perhaps we will be able to see the cause of the difference.
>> > >>
>> > >> Upayavira
>> > >>
>> > >> On Thu, Sep 10, 2015, at 05:35 AM, Modassar Ather wrote:
>> > >> > Thanks Erick. There are no replicas on my cluster and the indexing
>> is
>> > >> one
>> > >> > time. No updates or additions are done to the index and the
>> segments are
>> > >> > optimized at the end of indexing.
>> > >> > So adding a secondary sort criteria is the only solution for such
>> issue
>> > >> > in
>> > >> > sort?
>> > >> >
>> > >> > Regards,
>> > >> > Modassar
>> > >> >
>> > >> > On Wed, Sep 9, 2015 at 8:21 PM, Erick Erickson <
>> erickerick...@gmail.com
>> > >> >
>> > >> > wrote:
>> > >> >
>> > >> > > When the primary sort criteria is identical for two documents,
>> > >> > > then the _internal_ Lucene document ID is used to break the
>> > >> > > tie. The internal ID for two docs can be not only different, but
>> > >> > > in different _order_ on two separate shards. I'm assuming here
>> > >> > > that  each of your shards has multiple replicas and/or you're
>> > >> > > continuing to index to your cluster.
>> > >> > >
>> > >> > > The relative internal doc IDs may change even relative to
>> > >> > > each other when segments get merged.
>> > >> > >
>> > >> > > So yes, if you are sorting by something that can be identical
>> > >> > > in documents, it's always best to specify a secondary sort
>> > >> > > criteria. It's not referenced unless there's a tie so it's
>> > >> > > not that expensive. People often use whatever field
>> > >> > > is defined for <uniqueKey> since that's _guaranteed_ to
>> > >> > > never be the same for two docs.
>> > >> > >
>> > >> > > Best,
>> > >> > > Erick
>> > >> > >
>> > >> > > On Wed, Sep 9, 2015 at 1:45 AM, Modassar Ather <
>> > >> modather1...@gmail.com>
>> > >> > > wrote:
>> > >> > > > Hi,
>> > >> > > >
>> > >> > > > Search results are changed every time the following query is
>> hit.
>> > >> Please
>> > >> > > > note that it is 7 shard cluster of Solr-5.2.1.
>> > >> > > >
>> > >> > > > Query: q=network&start=50&rows=50&sort=f_sort
>> > >> > > asc&group=true&group.field=id
>> > >> > > >
>> > >> > > > Following are the fields and their types in my schema.xml.
>> > >> > > >
>> > >> > > > <fieldType name="string" class="solr.StrField"
>> > >> sortMissingLast="true"
>> > >> > > > stored="false" omitNorms="true"/>
>> > >> > > > <fieldType name="string_dv" class="solr.StrField"
>> > >> sortMissingLast="true"
>> > >> > > > stored="false" indexed="true" docValues="true"/>
>> > >> > > >
>> > >> > > > <field name="id" type="string" stored="true"/>
>> > >> > > > <dynamicField name="*_sort" type="string_dv"/>
>> > >> > > >
>> > >> > > > As per my understanding it seems to be the issue of tie among
>> the
>> > >> > > document
>> > >> > > > as when I added a new sort field like below the result never
>> changed
>> > >> > > across
>> > >> > > > multiple hits.
>> > >> > > > q=network&start=50&rows=50&sort=f_sort asc, score
>> > >> > > > asc&group=true&group.field=id
>> > >> > > >
>> > >> > > > Kindly let me know if this is an issue or how this can be fixed.
>> > >> > > >
>> > >> > > > Thanks,
>> > >> > > > Modassar
>> > >> > >
>> > >>
>> > >
>> > >
>>
