First, if Upayavira's intuition is correct (and I'm guessing it is), then the behavior you're seeing is probably an accident of coding rather than intentional. I think the algorithm is something like this:
Node1 gets the original query.

Node1 sends sub-queries out to each shard.

As the results come back, they're sorted one by one into a final list.

For simplicity, let's claim _all_ the docs have the exact same score.

The _first_ shard's response will completely fill up the final list. The rest will be thrown on the floor, as none of the docs from the other 6 shards will have a higher score than any doc currently in the list.

Here's the important part. The order in which the sub-requests come back varies due to a zillion possible causes: network latency, a minor GC pause on one of the shards, whether all the caches are loaded, whatever.

So subsequent calls will happen to get some _other_ shard's docs in the list first.

Does that make sense?

On Thu, Sep 10, 2015 at 4:48 AM, Modassar Ather <modather1...@gmail.com> wrote:

> If two documents come back from different
> shards with the same score, the order would not be predictable
>
> This is fine.
>
> What I am not able to understand is that when I do not give a secondary
> field for sort I am getting the result from one shard which changes to
> other shard in other hits. Here the results are always from one shard.
> E.g In first hit all the results are from shard1 and in next hit all the
> results are from shard2.
>
> But when I add the secondary sort field I see the results from multiple
> shards. E.g It has results from shard1 and shard2 both. This does not
> change in multiple hits.
>
> So please help me understand why the similar result merge and aggregation
> in not happening in when a single sort field is given?
>
> Regards,
> Modassar
>
> On Thu, Sep 10, 2015 at 5:03 PM, Upayavira <u...@odoko.co.uk> wrote:
>
>> What scores are you getting? If two documents come back from different
>> shards with the same score, the order would not be predictable -
>> probably down to which shard responds first.
>>
>> Fix it with something like sort=score,timestamp or some other time
>> related field.
>>
>> Upayavira
>>
>> On Thu, Sep 10, 2015, at 11:01 AM, Modassar Ather wrote:
>> > To add to my previous observation I saw the response having results from
>> > multiple shards when the secondary sort field is added and they remain
>> > same across hits.
>> > Kindly help me understand this behavior. Why the results are changing as
>> > I understand that the result should be first clubbed together from all
>> > shard and then based on their score it should be sorted.
>> > But here I see that every time I hit the sort query I am getting results
>> > from different shard which has different scores.
>> >
>> > Thanks,
>> > Modassar
>> >
>> > On Thu, Sep 10, 2015 at 2:59 PM, Modassar Ather <modather1...@gmail.com>
>> > wrote:
>> >
>> > > Upayavira! I add the fl=id,score,[shard] and saw the shards changing in
>> > > the response every time and for different shards the response changes but
>> > > for the same shard result is same on multiple hits.
>> > > When I add secondary sort field e.g. score the shard remains same across
>> > > hits.
>> > >
>> > > On Thu, Sep 10, 2015 at 12:52 PM, Upayavira <u...@odoko.co.uk> wrote:
>> > >
>> > >> Add fl=id,score,[shard] to your query, and show us the results of two
>> > >> differing executions.
>> > >>
>> > >> Perhaps we will be able to see the cause of the difference.
>> > >>
>> > >> Upayavira
>> > >>
>> > >> On Thu, Sep 10, 2015, at 05:35 AM, Modassar Ather wrote:
>> > >> > Thanks Erick. There are no replicas on my cluster and the indexing is
>> > >> > one time. No updates or additions are done to the index and the
>> > >> > segments are optimized at the end of indexing.
>> > >> > So adding a secondary sort criteria is the only solution for such issue
>> > >> > in sort?
>> > >> >
>> > >> > Regards,
>> > >> > Modassar
>> > >> >
>> > >> > On Wed, Sep 9, 2015 at 8:21 PM, Erick Erickson <erickerick...@gmail.com>
>> > >> > wrote:
>> > >> >
>> > >> > > When the primary sort criteria is identical for two documents,
>> > >> > > then the _internal_ Lucene document ID is used to break the
>> > >> > > tie. The internal ID for two docs can be not only different, but
>> > >> > > in different _order_ on two separate shards. I'm assuming here
>> > >> > > that each of your shards has multiple replicas and/or you're
>> > >> > > continuing to index to your cluster.
>> > >> > >
>> > >> > > The relative internal doc IDs for may change even relative to
>> > >> > > each other when segments get merged.
>> > >> > >
>> > >> > > So yes, if you are sorting by something that can be identical
>> > >> > > in documents, it's always best to specify a secondary sort
>> > >> > > criteria. It's not referenced unless there's a tie so it's
>> > >> > > not that expensive. People often use whatever field
>> > >> > > is defined for <uniqueKey> since that's _guaranteed_ to
>> > >> > > never be the same for two docs.
>> > >> > >
>> > >> > > Best,
>> > >> > > Erick
>> > >> > >
>> > >> > > On Wed, Sep 9, 2015 at 1:45 AM, Modassar Ather <modather1...@gmail.com>
>> > >> > > wrote:
>> > >> > > > Hi,
>> > >> > > >
>> > >> > > > Search results are changed every time the following query is hit.
>> > >> > > > Please note that it is 7 shard cluster of Solr-5.2.1.
>> > >> > > >
>> > >> > > > Query: q=network&start=50&rows=50&sort=f_sort asc&group=true&group.field=id
>> > >> > > >
>> > >> > > > Following are the fields and their types in my schema.xml.
>> > >> > > >
>> > >> > > > <fieldType name="string" class="solr.StrField" sortMissingLast="true"
>> > >> > > > stored="false" omitNorms="true"/>
>> > >> > > > <fieldType name="string_dv" class="solr.StrField" sortMissingLast="true"
>> > >> > > > stored="false" indexed="true" docValues="true"/>
>> > >> > > >
>> > >> > > > <field name="id" type="string" stored="true"/>
>> > >> > > > <dynamicField name="*_sort" type="string_dv"/>
>> > >> > > >
>> > >> > > > As per my understanding it seems to be the issue of tie among the
>> > >> > > > document as when I added a new sort field like below the result never
>> > >> > > > changed across multiple hits.
>> > >> > > > q=network&start=50&rows=50&sort=f_sort asc, score asc&group=true&group.field=id
>> > >> > > >
>> > >> > > > Kindly let me know if this is an issue or how this can be fixed.
>> > >> > > >
>> > >> > > > Thanks,
>> > >> > > > Modassar
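
P.S. To make the merge behavior I guessed at at the top of this mail a bit more concrete, here is a toy Python sketch. It is NOT Solr's actual code, just a model of my guess: responses are merged in arrival order, and a doc only displaces one already in the list if it sorts strictly ahead of it. The three shards, the doc ids, the helper names (merge_top_n, primary_only, with_tiebreak) and the rows=3 page size are all made up for illustration.

```python
import itertools

SHARDS = ["shard1", "shard2", "shard3"]  # hypothetical 3-shard cluster


def make_shard(k):
    # Every doc has the same f_sort value, so the primary sort always ties.
    # Ids are interleaved across shards: shard1 holds doc00, doc03, doc06, etc.
    return [
        {"id": f"doc{i * len(SHARDS) + k:02d}", "shard": SHARDS[k], "f_sort": "a"}
        for i in range(3)
    ]


RESPONSES = {SHARDS[k]: make_shard(k) for k in range(len(SHARDS))}


def merge_top_n(shard_responses, n, key):
    """Merge shard responses in arrival order, keeping the n best by `key`.

    Assumption of this sketch: a doc replaces the current worst entry only
    when its key is strictly smaller, so on a tie the earlier arrival stays.
    """
    top = []
    for docs in shard_responses:  # list order == arrival order
        for doc in docs:
            if len(top) < n:
                top.append(doc)
                top.sort(key=key)  # stable sort keeps earlier arrivals first
            elif key(doc) < key(top[-1]):
                top[-1] = doc
                top.sort(key=key)
    return top


def primary_only(doc):
    # Models sort=f_sort asc
    return doc["f_sort"]


def with_tiebreak(doc):
    # Models sort=f_sort asc, id asc (unique secondary key)
    return (doc["f_sort"], doc["id"])


# Try every possible arrival order and show which shards end up on the page.
for order in itertools.permutations(SHARDS):
    arrived = [RESPONSES[s] for s in order]
    single = merge_top_n(arrived, 3, primary_only)
    tied = merge_top_n(arrived, 3, with_tiebreak)
    print(order[0], "answered first:",
          sorted({d["shard"] for d in single}),
          [d["id"] for d in tied])
```

In this model, sorting on f_sort alone returns a page made up entirely of whichever shard answered first, while adding the unique id as a secondary key gives the same page, drawn from several shards, no matter which shard answers first. That matches what you reported, but again, it only illustrates the guess and says nothing definite about the real merge code.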