This is expected for NRT replicas. With NRT, segments are _not_ the unit of update for the replicas; documents are. So the process is:
- leader gets documents to index
- leader indexes them locally and forwards the raw documents to the replicas

Each replica's autocommit timer starts when the first document since the last commit reaches that replica. Due to network uncertainties and the like, the autocommit timers do not expire at the same wall-clock time on all replicas. Plus, at the instant a timer expires, docs may have been received/indexed by one replica but not yet by another. So segments on different replicas will contain different docs.

But wait! There’s more! Segments are merged, and because the segments on each replica contain different docs, different decisions will be made about which segments to combine. So the segments on different replicas of the same shard end up with different docs in them.

Additionally, if docs are updated/deleted, the same doc can get a different score on each replica: the TF/IDF stats still count deleted docs until the segments holding them are merged away, and each replica has merged away a different set of them.

And just for your continued delectation... even if the scores are identical for two documents on multiple replicas, the final order may be different depending on the replica. This is because the tiebreaker for identical scores is the _internal_ Lucene document id, which changes when segments are merged; a merge can even change the relative order of the same two docs.

So, you can try enabling the stats cache, see: https://lucene.apache.org/solr/guide/7_7/distributed-requests.html

None of the above applies to TLOG/PULL setups, because in those situations segments _are_ the unit of update: they’re copied from the leader as-is. However, there are still situations where the order will be (temporarily) different. To wit: followers periodically poll the leader for changed segments. Again, due to network vagaries, a given segment may or may not have been replicated to a given follower at any time T. So if you happen to query replica1 and replica2 when a segment has been copied to one but not the other, the stats used to compute the score may be slightly different.
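The NRT tie-break point can be illustrated with a toy Python model. This is only a sketch under loudly stated assumptions, not Lucene or Solr code: the doc keys, scores, segment layouts and the `merge`, `internal_ids` and `ranked` helpers are all hypothetical, a doc's internal id is modeled as its position when the segments are read in order, and a merge is modeled as putting the merged segment where its first input segment was.

```python
"""Toy model: why docs with identical scores can come back in different
orders from two replicas of the same shard.

Hypothetical simplifications (not Lucene's actual code):
- a replica is a list of segments; each segment is a list of doc keys;
- a doc's internal id is its position when the segments are read in order;
- a merge puts the merged segment where the first input segment was;
- hits are sorted by score descending, ties broken by internal id ascending.
"""


def merge(segments, picks):
    """Merge the segments at the given indexes into one new segment."""
    picks = sorted(picks)
    merged = [key for i in picks for key in segments[i]]
    result = []
    for i, segment in enumerate(segments):
        if i == picks[0]:
            result.append(merged)  # merged segment takes the first slot
        elif i not in picks:
            result.append(segment)  # untouched segments keep their place
    return result


def internal_ids(segments):
    """Assign each doc key its toy internal id (global position)."""
    ids = {}
    for segment in segments:
        for key in segment:
            ids[key] = len(ids)
    return ids


def ranked(segments, scores):
    """Sort doc keys by score desc, then internal id asc (the tie-break)."""
    ids = internal_ids(segments)
    return sorted(ids, key=lambda key: (-scores[key], ids[key]))


# Identical scores for b, c and d on both replicas (hypothetical values).
scores = {"a": 2.0, "b": 1.0, "c": 1.0, "d": 1.0}

# Same docs, but the autocommit timers fired at different moments,
# so each replica flushed them into differently shaped segments...
replica1 = [["a", "b"], ["c"], ["d"]]
replica2 = [["a"], ["b", "c"], ["d"]]

# ...and the merge policy then picked different segments to combine.
replica1 = merge(replica1, [1, 2])  # -> [["a", "b"], ["c", "d"]]
replica2 = merge(replica2, [0, 2])  # -> [["a", "d"], ["b", "c"]]

print(ranked(replica1, scores))  # ['a', 'b', 'c', 'd']
print(ranked(replica2, scores))  # ['a', 'd', 'b', 'c']
```

Both replicas return the same four docs and agree on the top hit; only the order of the tied docs b, c and d differs, because each replica numbered them differently after its own merge.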
With TLOG/PULL, this should only be the case while documents are being ingested. Once indexing has stopped and all followers have polled the leader and replicated the segments, things should be identical.

Best,
Erick

> On Jan 14, 2020, at 8:25 AM, Nicolas Franck <[email protected]> wrote:
>
> I noticed a - in my opinion - strange behavior in Solr Cloud.
>
> I have a collection that has 1 shard and two replicas.
>
> When I look at the directory structure, both have the same file names
> in "data/index" ..
>
> BUT the contents of those files are different.
>
> So when I query this collection, and sort on "score",
> and the score is the same for a lot of documents,
> then the order is different depending on the node that
> was queried. The results are the same, just the returned order.
>
> I guess the segments are not sent "as is" from leaders to the other replicas?
> Or something else could be wrong?
>
> Thanks in advance
