The problem is that the cursor mark query returns different numbers of
documents each time it is called when the collection has multiple replicas
per shard.

I meant collection. The same collection exists in different clouds. In
cloud #1 the collection has 2 shards with 1 replica per shard. In cloud #2
the collection has 2 shards with 2 replicas per shard.

The same cursorMark query run against cloud #2 returns different
numbers of documents. It appears that each replica returns a slightly
different number of documents. When run against cloud #1, it always returns
the same documents.
Here is a little bit from my debug statements.
count is the number found, and solr_returned is a counter of all the
documents actually returned over all the calls with the cursor mark. Why are
they different?
Each of these represents a search against our collection.

    "count": 1382,
    "solr_returned": 1281,

    "count": 1382,
    "solr_returned": 1366,

    "count": 1382,
    "solr_returned": 1225,

    "count": 1382,
    "solr_returned": 1397,
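For reference, the counting described above can be sketched as a cursorMark loop. This is a minimal sketch, assuming a hypothetical `fetch_page(params)` callable that sends the parameters to the collection's select handler and returns the parsed JSON response. `cursorMark=*` starts the walk, `numFound` and `nextCursorMark` are the standard response fields, paging stops when Solr hands back the same cursorMark that was sent, and Solr requires the sort to include the uniqueKey field as a tie-breaker.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical: sends Solr request params, returns the parsed JSON body.
FetchPage = Callable[[Dict[str, Any]], Dict[str, Any]]


def page_with_cursor(fetch_page: FetchPage, query: str, sort: str,
                     rows: int = 100) -> Tuple[int, List[Dict[str, Any]]]:
    """Walk a cursorMark result set; return (numFound, every doc returned)."""
    cursor = "*"  # "*" asks Solr for the first page
    num_found = 0
    docs: List[Dict[str, Any]] = []
    while True:
        params = {"q": query, "sort": sort, "rows": rows, "cursorMark": cursor}
        body = fetch_page(params)
        num_found = body["response"]["numFound"]
        docs.extend(body["response"]["docs"])
        next_cursor = body["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor
    return num_found, docs
```

Comparing `len(docs)` with `num_found` after the loop is the same check as `solr_returned` versus `count` in the debug output.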


With score taken out of the sort, cloud #2 returns consistent result sets.
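To see why score sorting misbehaves with more than one replica, here is a toy simulation, not Solr code. It assumes a single shard with two replicas that hold the same documents but compute slightly different scores (for example because their deleted-document counts differ, as the quoted reply explains), and it alternates replicas per request as a stand-in for load balancing. A cursor carries the sort values (score, id) of the last document returned, so when the next page is computed on a replica whose scores have drifted, documents can be skipped or returned twice. `page_across_replicas` and the score values are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[float, str]  # (score, id): the sort values a cursor remembers


def after(doc_key: Key, cursor: Key) -> bool:
    """True if doc_key sorts after cursor under 'score desc, id asc'."""
    score, doc_id = doc_key
    c_score, c_id = cursor
    return score < c_score or (score == c_score and doc_id > c_id)


def page_across_replicas(replicas: List[Dict[str, float]], rows: int) -> List[str]:
    """Cursor-page a result set, sending each request to the next replica in turn."""
    returned: List[str] = []
    cursor: Optional[Key] = None  # None plays the role of cursorMark=*
    request = 0
    while True:
        scores = replicas[request % len(replicas)]  # round-robin "load balancer"
        request += 1
        ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        candidates = [(doc_id, s) for doc_id, s in ordered
                      if cursor is None or after((s, doc_id), cursor)]
        page = candidates[:rows]
        if not page:
            return returned
        returned.extend(doc_id for doc_id, _ in page)
        last_id, last_score = page[-1]
        cursor = (last_score, last_id)


replica_a = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 2.0, "e": 1.0, "f": 0.5}
replica_b = {"a": 5.0, "b": 4.5, "c": 4.2, "d": 2.5, "e": 1.0, "f": 0.5}
print(len(page_across_replicas([replica_a], rows=2)))             # 6
print(len(page_across_replicas([replica_a, replica_b], rows=2)))  # 5
```

With these sample scores, paging across both replicas misses document `c` because replica B ranks it above the cursor taken from replica A, while paging on replica A alone returns all 6. If a replica instead scores an already-returned document lower, it reappears on a later page, which is how the total can exceed numFound. Sorting on the unique id alone avoids both cases because the id is identical on every replica.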



On Mon, Jan 15, 2018 at 1:28 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 1/15/2018 11:56 AM, Webster Homer wrote:
>
>> I have noticed strange behavior using cursorMark for deep paging in an
>> application. We use solrcloud for searching. We have several clouds for
>> development. For our development systems we have two different clouds. One
>> cloud has 2 shards with 1 replica per shard. All of our other clouds are
>> set up with 2 shards and 2 replicas per shard.
>>
>
> A cloud doesn't get set up with shards and replicas.  A collection does.
> One SolrCloud cluster can contain many collections.
>
> When you say "cloud" are you referring to a collection, or are you
> referring to a set of servers running ZooKeeper and Solr? The latter is
> what I would expect cloud to mean.
>
>> When I run against the first cloud, I always get consistent results for the
>> same query. That is not the case with the second cloud. Some queries return
>> different numbers of results each time they're called. In the code I return
>> the number found from solr, and I count the number of results for all
>> iterations against the cursor mark. Sometimes it returns more rows than the
>> numFound and sometimes less.
>>
>> I figured that the problem was in my code or in the data. To make it easier
>> to find the problem, I changed the sort to just be the unique id from the
>> schema. The problem went away.
>>
>> 1. The Number Found from solr was always the same
>> 2. It worked when there was only 1 replica per shard
>> 3. From debug statements it appears to return different total counts from
>> different replicas. When there were 2 replicas per shard I saw 4 different
>> values being returned.
>> 4. Not sorting on score, and only on the unique id provides consistent
>> results.
>>
>
> When you have multiple replicas, each replica may have different numbers
> of deleted documents.  Deleted documents will almost always affect
> scoring.  Because SolrCloud load balances across replicas, one page of your
> cursorMark query can be served by a different replica than the next one, so
> the order of results can differ.
>
> When sorting by unique ID, deleted documents will not affect sort order.
> When there is only one replica, then sorting by score will always produce
> the same order, unless the index gets modified.
>
> Thanks,
> Shawn
>
>
