Re: distributed query result order tie break question

Michael Sokolov Mon, 02 Sep 2013 17:44:56 -0700

Mostly I'm just trying to understand. For the moment I'm putting together a design for distributed Lux (XQuery backed by Solr Cloud). My motivation is that I am feeding results into its separate XQuery system, and that requires a consistent global document ordering. The ordering can be arbitrary, it just has to be stable for the duration of a single query (but this could span multiple lucene/solr queries). In the non-distributed version of this, I just use the docid directly, which is convenient. In the distributed case, I'd like to understand how the ordering is defined so that I can compute an integer that is sorted in the same way. For example (shard "id" << 24) | docid or something like that.
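To make the "(shard id << 24) | docid" idea concrete, here is a minimal sketch in Python. It is an illustration only, not anything Solr or Lux provides: the function name global_order_key, the integer shard_index, and the 24-bit docid width are all assumptions. A 24-bit width caps each shard at 16,777,216 documents, and if the result has to fit a signed 32-bit int, that leaves only 7 bits for the shard number.

```python
DOCID_BITS = 24  # assumed width for the per-shard docid; not a Solr constant


def global_order_key(shard_index: int, docid: int, docid_bits: int = DOCID_BITS) -> int:
    """Pack a (hypothetical) shard index and a per-shard docid into one integer.

    Keys sort by shard first, then by docid within the shard.
    """
    if shard_index < 0 or docid < 0:
        raise ValueError("shard_index and docid must be non-negative")
    if docid >= (1 << docid_bits):
        # A larger docid would spill into the shard bits and break the ordering.
        raise ValueError("docid does not fit in the assumed docid_bits")
    return (shard_index << docid_bits) | docid


# Hits as (shard_index, docid) pairs, in arbitrary arrival order.
hits = [(1, 5), (0, 42), (1, 0), (0, 7)]
ordered = sorted(hits, key=lambda hit: global_order_key(*hit))
print(ordered)
```

Sorting by the packed key gives the same order as sorting by the pair (shard_index, docid), which is the "by shard, then by docid" order described in the original question. It only stays meaningful for as long as the docids themselves are stable.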

I can see that there might be perturbations in the ordering if there are updates (Lucene can reassign docids, etc). With Lucene I'm able to control this by keeping a Searcher/Reader open for the duration of the query. It seems that in Solr (cloud or not), I can't really get this kind of guarantee. I guess I'm willing to live with this since the time window is very small and the likelihood of a problem is small (most XQueries only use a single underlying Solr query anyway, so this whole concern is a little bit pathological). I've been considering using a global ordering based on my unique id (document uri), although of course an update can still happen and mess things up mid-query, so ultimately it's not a bulletproof solution either.
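The alternative of ordering by unique id could look roughly like the sketch below. It assumes each shard can return its hits already sorted by document uri; the function name merge_by_uri and the sample uris are made up for illustration. As noted above, an update landing mid-query can still change the set of uris, so this is no more bulletproof than the docid approach.

```python
import heapq
from typing import Iterable, List


def merge_by_uri(per_shard_uris: Iterable[List[str]]) -> List[str]:
    """Merge per-shard uri lists, each assumed to be sorted already,
    into one globally sorted list."""
    return list(heapq.merge(*per_shard_uris))


# Hypothetical per-shard results, each sorted by uri.
shard_a = ["/docs/a.xml", "/docs/c.xml"]
shard_b = ["/docs/b.xml", "/docs/d.xml"]

merged = merge_by_uri([shard_a, shard_b])

# Assign each document an integer position in the global order.
ordinal = {uri: position for position, uri in enumerate(merged)}
print(merged)
```

The resulting order does not depend on which shard or replica served a document, only on the uris themselves.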


Thanks, Jack

-Mike

On 9/2/2013 8:26 PM, Jack Krupansky wrote:

"*:*" is a constant score query - every document has the same score, so the concept of relevancy has no relevance.
But, in theory, you could apply boost queries and function queries to scale or offset those constant scores. If so, then you should see relevancy sorting, otherwise the concept of relevancy does not apply.
I don't think Solr offers any "contract" as to ordering of constant score documents or merging of same score documents across shards. At least I have never seen such a contract published. So, if you are merely observing the actual behavior of Solr, fine, but if you are expecting that such behavior will persist in future releases, there can be no such guarantee.
I don't think Solr will necessarily guarantee that Lucene doc IDs will be the same between replicas (the order in which distributed updates are received), so there is no guarantee that behavior you see from one round-robin iteration will necessarily be the same on a repeat of the same distributed query.
The bottom line is: What exactly are you after, simply an explanation for what you are seeing, or a guarantee that you will always see that behavior?
-- Jack Krupansky

-----Original Message----- From: Michael Sokolov
Sent: Monday, September 02, 2013 7:42 PM
To: solr-user@lucene.apache.org
Subject: distributed query result order tie break question

My question is about how query results are ordered in a distributed
query when sorting by "relevance" and all the documents have the same
score, for example, when querying for "*:*".

It looks to me as if score ties are broken by shard and then within each
shard, by docid.  So for example, if I were to iterate over all the
documents using such a query, I would expect to get all the documents
from one shard first, then all the documents from another shard, etc.
Is that right?

Thanks

-Mike Sokolov
