Mostly I'm just trying to understand. For the moment I'm putting
together a design for distributed Lux (XQuery backed by SolrCloud). My
motivation is that I am feeding results into Lux's separate XQuery system,
and that requires a consistent global document ordering. The ordering
can be arbitrary, it just has to be stable for the duration of a single
query (but this could span multiple Lucene/Solr queries). In the
non-distributed version of this, I just use the docid directly, which is
convenient. In the distributed case, I'd like to understand how the
ordering is defined so that I can compute an integer that sorts in
the same way: for example, (shardId << 24) | docid, or something like
that.
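A minimal sketch of the packing idea in Java, assuming the caller assigns each shard a stable, non-negative ordinal for the duration of one query (for example by sorting the shard names). `GlobalOrderKey` and `shardOrdinal` are hypothetical names, not Solr or Lucene API. Lucene docids are non-negative Java `int`s, so the sketch reserves 31 bits for the docid; the `<< 24` variant only works while every docid on a shard stays below 2^24 (about 16.7 million).

```java
// Hypothetical sketch: pack a shard ordinal and a per-shard Lucene docid
// into one long so that numeric order equals (shard, docid) order.
final class GlobalOrderKey {

    // Lucene docids are non-negative ints, so 31 bits always suffice.
    // (A 24-bit field, as in "shardId << 24", only holds docids < 2^24.)
    private static final int DOC_BITS = 31;
    private static final long DOC_MASK = (1L << DOC_BITS) - 1;

    private GlobalOrderKey() {}

    /** Combines a caller-assigned shard ordinal with a per-shard docid. */
    static long key(int shardOrdinal, int docid) {
        if (shardOrdinal < 0 || docid < 0) {
            throw new IllegalArgumentException("shard ordinal and docid must be non-negative");
        }
        return ((long) shardOrdinal << DOC_BITS) | docid;
    }

    /** Recovers the shard ordinal from the high bits. */
    static int shardOf(long key) {
        return (int) (key >>> DOC_BITS);
    }

    /** Recovers the docid from the low 31 bits. */
    static int docidOf(long key) {
        return (int) (key & DOC_MASK);
    }

    public static void main(String[] args) {
        long a = key(0, 1_000_000_000); // a late doc on shard 0
        long b = key(1, 0);             // the first doc on shard 1
        System.out.println(a < b);                          // true: shard dominates
        System.out.println(shardOf(b) + " " + docidOf(a));  // 1 1000000000
    }
}
```

Because the shard occupies the high bits, comparing two keys as plain `long`s yields shard-then-docid order, which is the ordering this message is trying to reproduce.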
I can see that there might be perturbations in the ordering if there are
updates (Lucene can reassign docids, etc.). With Lucene I'm able to
control this by keeping a Searcher/Reader open for the duration of the
query. It seems that in Solr (SolrCloud or not), I can't really get this
kind of guarantee. I guess I'm willing to live with this since the time
window is very small and the likelihood of a problem is small (most
XQueries only use a single underlying Solr query anyway, so this whole
concern is a little bit pathological). I've been considering using a
global ordering based on my unique id (document uri), although of course
an update can still happen and mess things up mid-query, so ultimately
it's not a bulletproof solution either.
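A minimal sketch of ordering by the unique key instead, assuming each hit carries its document URI. The `UriOrder` and `Hit` classes and their fields are hypothetical stand-ins, not Solr API. Because the URI is unique, string comparison gives a total order that does not depend on shard layout or docids; an update mid-query can still change which hits are present, as noted above. In Solr terms, a similar effect might come from adding the key field as a secondary sort clause (e.g. `sort=score desc,uri asc`, where `uri` is a hypothetical single-valued, sortable field).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: order hits by their unique key (document URI) so the
// global order does not depend on shard layout or Lucene docids.
final class UriOrder {

    /** Minimal stand-in for a search hit; field names are illustrative. */
    static final class Hit {
        final String uri;
        final int shard;
        final int docid;

        Hit(String uri, int shard, int docid) {
            this.uri = uri;
            this.shard = shard;
            this.docid = docid;
        }

        @Override
        public String toString() {
            return uri;
        }
    }

    // String.compareTo is a total order on URIs, so two hits compare equal
    // only if they share a URI (i.e. the unique key was violated).
    static final Comparator<Hit> BY_URI = Comparator.comparing(h -> h.uri);

    /** Returns a copy of the hits sorted by URI; the input is not modified. */
    static List<Hit> sorted(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(BY_URI);
        return copy;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("/docs/c.xml", 0, 7),
                new Hit("/docs/a.xml", 1, 3),
                new Hit("/docs/b.xml", 0, 2));
        System.out.println(sorted(hits)); // [/docs/a.xml, /docs/b.xml, /docs/c.xml]
    }
}
```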
Thanks, Jack
-Mike
On 9/2/2013 8:26 PM, Jack Krupansky wrote:
"*:*" is a constant score query - every document has the same score,
so the concept of relevancy has no relevance.
But, in theory, you could apply boost queries and function queries to
scale or offset those constant scores. If you do, then you should see
relevancy sorting; otherwise, the concept of relevancy does not apply.
I don't think Solr offers any "contract" as to ordering of constant
score documents or merging of same score documents across shards. At
least I have never seen such a contract published. So, if you are
merely observing the actual behavior of Solr, fine, but if you are
expecting that such behavior will persist in future releases, there
can be no such guarantee.
I don't think Solr will necessarily guarantee that Lucene doc IDs will
be the same between replicas (they depend on the order in which each
replica receives distributed updates), so there is no guarantee that the
behavior you see from one round-robin iteration will be the same on a
repeat of the same distributed query.
The bottom line is: What exactly are you after, simply an explanation
for what you are seeing, or a guarantee that you will always see that
behavior?
-- Jack Krupansky
-----Original Message-----
From: Michael Sokolov
Sent: Monday, September 02, 2013 7:42 PM
To: solr-user@lucene.apache.org
Subject: distributed query result order tie break question
My question is about how query results are ordered in a distributed
query when sorting by "relevance" and all the documents have the same
score, for example, when querying for "*:*".
It looks to me as if score ties are broken first by shard and then,
within each shard, by docid. So, for example, if I were to iterate over
all the documents using such a query, I would expect to get all the
documents from one shard first, then all the documents from another
shard, etc.
Is that right?
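A small Java model of the tie-break described in the question: score descending, then shard, then per-shard docid. It only encodes the observed behavior under those assumptions; as Jack's reply above points out, Solr publishes no such contract, and docids may differ between replicas. `TieBreakOrder`, `ShardHit`, and `OBSERVED_ORDER` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the observed tie-break: score descending, then
// shard, then per-shard docid. This is NOT a documented Solr contract.
final class TieBreakOrder {

    /** Minimal stand-in for a hit returned by one shard. */
    static final class ShardHit {
        final float score;
        final int shard;
        final int docid;

        ShardHit(float score, int shard, int docid) {
            this.score = score;
            this.shard = shard;
            this.docid = docid;
        }

        @Override
        public String toString() {
            return "s" + shard + ":d" + docid;
        }
    }

    // Higher scores first; equal scores fall through to shard, then docid.
    static final Comparator<ShardHit> OBSERVED_ORDER =
            Comparator.comparingDouble((ShardHit h) -> h.score).reversed()
                    .thenComparingInt(h -> h.shard)
                    .thenComparingInt(h -> h.docid);

    /** Returns a merged copy of the hits in the modeled order. */
    static List<ShardHit> merge(List<ShardHit> hits) {
        List<ShardHit> copy = new ArrayList<>(hits);
        copy.sort(OBSERVED_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        // A constant-score query such as *:* gives every hit the same score.
        List<ShardHit> hits = List.of(
                new ShardHit(1.0f, 1, 0),
                new ShardHit(1.0f, 0, 5),
                new ShardHit(1.0f, 1, 2),
                new ShardHit(1.0f, 0, 1));
        System.out.println(merge(hits)); // [s0:d1, s0:d5, s1:d0, s1:d2]
    }
}
```

With all scores tied, every shard-0 hit precedes every shard-1 hit, which is the "one shard first, then the next" iteration order the question describes.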
Thanks
-Mike Sokolov