Mostly I'm just trying to understand. For the moment I'm putting together a design for distributed Lux (XQuery backed by Solr Cloud). My motivation is that I am feeding results into its separate XQuery system, and that requires a consistent global document ordering. The ordering can be arbitrary; it just has to be stable for the duration of a single query (which could span multiple Lucene/Solr queries). In the non-distributed version, I just use the docid directly, which is convenient. In the distributed case, I'd like to understand how the ordering is defined so that I can compute an integer that sorts the same way, for example (shard id << 24) | docid, or something like that.
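A minimal sketch of such a packed key, assuming ties really are broken by shard and then by docid (an observed behavior, not a published contract) and that each shard has a stable, small, non-negative position you assign yourself (`shard_index` is hypothetical). Note that a 24-bit shift leaves room only for docids below 2^24 (16,777,216) per shard; Lucene docids are non-negative Java ints, so this sketch shifts by 32 bits to avoid overlap.

```python
DOCID_BITS = 32  # Lucene docids are non-negative Java ints, so they fit in 31 bits


def global_order_key(shard_index: int, docid: int) -> int:
    """Pack (shard, docid) into one integer that sorts shard-major, docid-minor.

    Assumption: shard_index is a stable position you assign to each shard.
    """
    if shard_index < 0 or not (0 <= docid < 2**31):
        raise ValueError("shard_index must be >= 0 and docid a non-negative 31-bit int")
    return (shard_index << DOCID_BITS) | docid
```

Sorting by this key gives the same order as sorting by the `(shard_index, docid)` pair, so every docid in shard 0 sorts before every docid in shard 1.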

I can see that there might be perturbations in the ordering if there are updates (Lucene can reassign docids, etc.). With Lucene I'm able to control this by keeping a Searcher/Reader open for the duration of the query. It seems that in Solr (cloud or not), I can't really get this kind of guarantee. I guess I'm willing to live with this since the time window is very small and the likelihood of a problem is low (most XQueries use only a single underlying Solr query anyway, so this whole concern is a little bit pathological). I've been considering using a global ordering based on my unique id (the document uri), although of course an update can still happen and mess things up mid-query, so ultimately that's not a bulletproof solution either.
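To illustrate why a unique-key ordering sidesteps the shard question, here is a minimal sketch of merging per-shard results that are each already sorted by the document uri (for example via a request sorted on a `uri` field; the field name and the dict-shaped hits are assumptions). This models the merge step only; it is not Solr's actual implementation.

```python
import heapq


def merge_by_uri(shard_results):
    """Merge per-shard lists (each already sorted by "uri") into one global order.

    Because the uri is unique, no tie-break on shard or docid is needed, so the
    merged order does not depend on which shard or replica served each document.
    """
    return list(heapq.merge(*shard_results, key=lambda doc: doc["uri"]))
```

For example, merging `[{"uri": "/a"}, {"uri": "/c"}]` with `[{"uri": "/b"}]` yields `/a`, `/b`, `/c` regardless of the order in which the shard lists are passed in.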

Thanks, Jack

-Mike

On 9/2/2013 8:26 PM, Jack Krupansky wrote:
"*:*" is a constant score query: every document has the same score, so the concept of relevancy has no relevance.

But, in theory, you could apply boost queries and function queries to scale or offset those constant scores. If so, then you should see relevancy sorting; otherwise, the concept of relevancy does not apply.

I don't think Solr offers any "contract" as to the ordering of constant-score documents or the merging of same-score documents across shards. At least I have never seen such a contract published. So, if you are merely observing the actual behavior of Solr, fine, but if you are expecting that such behavior will persist in future releases, there can be no such guarantee.

I don't think Solr will necessarily guarantee that Lucene doc IDs will be the same between replicas (they depend on the order in which each replica receives the distributed updates), so there is no guarantee that the behavior you see from one round-robin iteration will be the same on a repeat of the same distributed query.
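A toy model of that point, assuming each replica simply numbers documents in the order it indexes them (real Lucene docids also shift with deletions and segment merges, so this is a simplification). Two replicas that receive the same updates in a different order end up breaking score ties differently.

```python
def assign_docids(arrival_order):
    """Toy model: a replica numbers documents in the order it indexed them."""
    return {uri: docid for docid, uri in enumerate(arrival_order)}


def tie_order(docids):
    """Order equal-score documents by docid, as a single replica would."""
    return sorted(docids, key=docids.get)
```

With replica A receiving `/x` then `/y` and replica B receiving `/y` then `/x`, `tie_order` returns `["/x", "/y"]` for A and `["/y", "/x"]` for B, so a round-robin query can see either order.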

The bottom line is: What exactly are you after, simply an explanation for what you are seeing, or a guarantee that you will always see that behavior?

-- Jack Krupansky

-----Original Message-----
From: Michael Sokolov
Sent: Monday, September 02, 2013 7:42 PM
To: solr-user@lucene.apache.org
Subject: distributed query result order tie break question

My question is about how query results are ordered in a distributed
query when sorting by "relevance" and all the documents have the same
score, for example, when querying for "*:*".

It looks to me as if score ties are broken by shard and then within each
shard, by docid.  So for example, if I were to iterate over all the
documents using such a query, I would expect to get all the documents
from one shard first, then all the documents from another shard, etc.
Is that right?
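The tie-break described in the question can be modeled as a sort on (descending score, shard position, docid). This is a minimal sketch of that hypothesis, not Solr's merge code; the stable shard position is an assumption, and as noted in the reply, Solr publishes no contract for this order.

```python
def merge_equal_scores(shard_hits):
    """Merge per-shard hits using the tie-break observed in the question.

    shard_hits[i] holds (score, docid) pairs from the shard at position i.
    Returns (score, shard, docid) triples: highest score first, then lower
    shard position, then lower docid.
    """
    flat = [
        (score, shard, docid)
        for shard, hits in enumerate(shard_hits)
        for score, docid in hits
    ]
    return sorted(flat, key=lambda hit: (-hit[0], hit[1], hit[2]))
```

When every score is equal, as with "*:*", all of shard 0's documents come out first, followed by all of shard 1's.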

Thanks

-Mike Sokolov
