Good morning. So, based on your answer, there is no guarantee that the results
will be in the same order from one replica to the other.
I ran the queries in debug mode and I see...
MASTER
"321240": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 20206)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 20206, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 =
fieldNorm(doc=20206)\n",
"432633": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 20457)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 20457, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 =
fieldNorm(doc=20457)\n",
"321166": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 23414)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 23414, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 =
fieldNorm(doc=23414)\n",
"362806": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 25531)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 25531, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 =
fieldNorm(doc=25531)\n",
"684662": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 27656)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 27656, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 =
fieldNorm(doc=27656)\n",
"425926": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 28662)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 28662, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 =
fieldNorm(doc=28662)\n",
"718098": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 44509)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 44509, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 =
fieldNorm(doc=44509)\n",
"527929": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 53653)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 53653, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 =
fieldNorm(doc=53653)\n",
"138537": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 56137)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 56137, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 =
fieldNorm(doc=56137)\n",
"633800": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 67368)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 67368, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=67368)\n"
REPLICA
"111294": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 4803)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 4803, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=4803)\n",
"164137": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 4878)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 4878, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=4878)\n",
"553503": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 6907)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 6907, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=6907)\n",
"684621": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 12453)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 12453, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 =
fieldNorm(doc=12453)\n",
"674028": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 15029)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 15029, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 =
fieldNorm(doc=15029)\n",
"563023": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 15698)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 15698, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 =
fieldNorm(doc=15698)\n",
"894824": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 19256)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 19256, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 =
fieldNorm(doc=19256)\n",
"540476": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 20843)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 20843, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 =
fieldNorm(doc=20843)\n",
"671271": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 23778)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 23778, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 =
fieldNorm(doc=23778)\n",
"527929": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 25053)
[DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 25053, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=25053)\n"
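As a sanity check on the explain output above, the score can be recomputed by hand. This is only a minimal sketch, assuming the classic Lucene DefaultSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function name is made up for illustration, and Lucene computes in 32-bit floats, so the last digits can differ slightly.

```python
import math

def field_weight(freq, doc_freq, max_docs, field_norm):
    """Recompute fieldWeight = tf * idf * fieldNorm (assumed DefaultSimilarity formulas)."""
    tf = math.sqrt(freq)                              # tf(freq=1.0) -> 1.0
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf(docFreq=374, maxDocs=126169)
    return tf * idf * field_norm

# Values copied from the explain output; they are identical for every hit listed.
score = field_weight(freq=1.0, doc_freq=374, max_docs=126169, field_norm=0.25)
print(score)
```

Since freq, docFreq, maxDocs and fieldNorm are the same for every document shown, every document necessarily gets the same score.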
Just to make sure I interpret the results correctly:
- they all have a score of 1.7046129
- the order they are presented in is therefore not related to the score; it is
just the order in which the data is stored internally (like an SQL SELECT
statement without an ORDER BY clause)
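To illustrate that reading, here is a small simulation of tied scores being ordered by internal document ID. It is a sketch under stated assumptions, not Lucene's actual code: only the internal IDs for document 527929 (53653 on the master, 25053 on the replica) come from the output above; the keys 900001 and 900002 and their internal IDs are hypothetical.

```python
SCORE = 1.7046129

# (unique key, score, internal doc ID). Only the 527929 rows use real values
# from the explain output; the 900001/900002 rows are hypothetical.
master = [("527929", SCORE, 53653), ("900001", SCORE, 10), ("900002", SCORE, 900)]
replica = [("527929", SCORE, 25053), ("900001", SCORE, 70000), ("900002", SCORE, 30)]

def default_order(hits):
    # Score descending, ties broken by the per-machine internal doc ID.
    return [key for key, _, _ in sorted(hits, key=lambda h: (-h[1], h[2]))]

def stable_order(hits):
    # Score descending, ties broken by the unique key, which is the same everywhere.
    return [key for key, _, _ in sorted(hits, key=lambda h: (-h[1], h[0]))]

print(default_order(master), default_order(replica))
print(stable_order(master), stable_order(replica))
```

With the internal ID as tie-breaker the two machines disagree; with a tie-breaker that is identical on both machines they agree.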
Follow-up question:
- If I want to force a sort operation, I should add a sort parameter to the
query. Will the first sort be done by score, and documents with the same
score then be sorted by my sort=?? parameter?
- Or will my sort parameter override the score sorting?
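For reference, Solr's sort parameter replaces the default "score desc" ordering rather than acting as a tie-breaker on top of it, so to keep relevance first the score has to be listed explicitly, followed by a second field. A minimal sketch of building such a request follows; the host, the collection name "collection1" and the "id" field (assumed to be the uniqueKey) are assumptions, not values taken from this thread.

```python
from urllib.parse import urlencode

def build_select_url(base_url, query, tie_break_field="id"):
    """Build a /select URL that sorts by score first, then by a stable field."""
    params = {
        "q": query,
        # Relevance first; the second clause only decides between equal scores.
        "sort": f"score desc,{tie_break_field} asc",
    }
    return f"{base_url}/select?{urlencode(params)}"

# Hypothetical host and collection name.
url = build_select_url("http://localhost:8983/solr/collection1", "prod_doc:tylenol")
print(url)
```

The tie-break field should hold the same value on every replica (the uniqueKey is the usual candidate), so both instances return tied documents in the same order.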
Thank you again for your help,
Nic.
--------------------------------------------
On Mon, 2/3/14, Erick Erickson <[email protected]> wrote:
Subject: Re: SolrCloud query results order master vs replica
To: [email protected]
Received: Monday, February 3, 2014, 2:19 PM
This should only be happening if the scores are _exactly_ the same,
which is actually quite rare.

In that case, the tied scores are broken by the internal Lucene document
ID, and the relative order of the docs on the two machines isn't
guaranteed to be the same; the internal ID can change during segment
merging, which is NOT the same on both machines.

But this should be relatively rare. If you're doing *:* queries or
other such, then they aren't scored (see ConstantScoreQuery). So
in practical terms, I suspect you're seeing some kind of test artifact.
Try adding &debug=all to the query and you'll see how documents are
scored.
Best,
Erick
On Mon, Feb 3, 2014 at 6:57 AM, M. Flatterie <[email protected]> wrote:
> Greetings,
>
> My setup is:
> - SolrCloud V4.3
> - one collection
> - one shard
> - 1 master, 1 replica
>
> so each instance contains the entire index. The index is rather small
> and the replica is used for robustness. There is no need (IMHO) to
> shard the index (yet, until the index gets bigger).
>
> My question:
> - if I do a query on a product name (that is what the index is about)
>   on the master I get a certain number of results and the documents.
> - if I do the same query on the replica, I get the same number of
>   results but the docs are in a different order.
> - I do not specify a sort parameter in my query, simply a
>   q=<product name>.
> - obviously if I force a sort order, everything is ok, same results,
>   same order from both instances.
> - am I wrong in expecting the same results, in the SAME order?
>
> Follow up question if the order is not guaranteed:
> - should I force the dev. to use an explicit sort order?
> - if we force the sort, we then bypass the ranking / score order, do
>   we not?
> - should I force all queries to go to the master and fall back on the
>   replica only in the context of a total loss of the master?
>
> Other useful information:
> - the admin page shows the same number of documents in both instances.
> - logs are clean; load, replication and queries worked ok.
> - the web application that queries SOLR round robins between the two
>   instances, so getting results in a different order is bad for
>   consistency.
>
> Thank you for your help!
>
> Nic