On 5/25/2018 7:28 AM, SOLR4189 wrote:
> I use SOLR-6.5.1 and I want to start to use replicas.
>
> For it I want to understand something:
>
> 1) Can asynchronous forwarding document from leader to all replicas or some
> another reasons cause that replica A may see update X then Y, and replica B
> may see update Y then X?
> If yes, thus a particular document in replicaA might sort differently
> relative to a document from replicaB if they have the same score (in the
> same order as they were stored in the index). Is it an edge case?

I can't speak about whether it's possible to have updates re-ordered. It probably is possible. But whether it's possible or not, there's absolutely no guarantee that Lucene document ordering will be identical between NRT replicas. NRT is the only replica type that Solr 6.x has, and is the default type on Solr 7.x.

One replica can have different numbers of deleted documents than another replica, and may not merge segments in exactly the same way as another replica. Because deleted documents can affect score calculation, and one replica may have different deleted documents than another replica, the default sort order (relevancy ranking) can differ between replicas.

A workaround to these issues is to always use an explicit field-based sort. Deleted documents and the Lucene document order do not affect that kind of sort.

> 2) What does it mean Custom update chain post-processors may never be
> invoked on a recovering replica
> <https://lucene.apache.org/solr/guide/7_2/update-request-processors.html>

The name of the update chain that was originally used during the indexing is not stored in the transaction log, so when the transaction log is replayed, the update chain is not called.

> if all my UpdateProcessors are post-processors (i.e. are after
> DistributedUpdateProcessor)? Will all buffered update requests in recovery
> be indexed in replica without my features?

General advice: In most cases, a post-processor is NOT a good idea.

Changes made to the input document by update processors placed *before* DistributedUpdateProcessor will be recorded in the transaction log, and will be identical on all replicas. Because the transaction log DOES have the results of the processor, and all replicas are guaranteed to be the same, this is almost always what you want.

Placing an update processor before DistributedUpdateProcessor ensures that it is only run once for every document.
If it is placed after DistributedUpdateProcessor, it will execute once for every replica on every document. That can be a big problem if the update processor runs slowly or consumes a lot of memory/CPU resources.

Because post-processors run independently on every replica, they can result in different data on each replica. For instance, if you use the UUID processor after DistributedUpdateProcessor, every replica will end up with a different UUID for the same document. Similarly, the timestamp processor can record a different timestamp on every replica for the same document, because each replica might do its indexing at a slightly different time. Timestamps in a Solr index have millisecond precision.

If you actually do intend to have different data in a field on different replicas, then you might want a post-processor. But this requirement is VERY rare.

Thanks,
Shawn
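
To make the UUID example concrete, here is a toy Python model of the two placements. It is only a sketch and not Solr code: `add_uuid`, `index_with_pre_processor`, and `index_with_post_processor` are hypothetical names invented for illustration, and a "replica" is just a dict. The point is that a processor run once before distribution yields one value shared by all replicas, while a processor run after distribution yields one value per replica.

```python
import uuid


def add_uuid(doc):
    """Hypothetical stand-in for a UUID update processor.

    Returns a copy of the document with a 'uuid' field filled in
    if the document does not already have one.
    """
    out = dict(doc)
    if "uuid" not in out:
        out["uuid"] = str(uuid.uuid4())
    return out


def index_with_pre_processor(doc, replica_count):
    # Processor placed BEFORE distribution: it runs exactly once,
    # and every replica receives a copy of the same processed document.
    processed = add_uuid(doc)
    return [dict(processed) for _ in range(replica_count)]


def index_with_post_processor(doc, replica_count):
    # Processor placed AFTER distribution: each replica runs it
    # independently on its own copy of the raw document.
    return [add_uuid(doc) for _ in range(replica_count)]


pre = index_with_pre_processor({"id": "doc1"}, 3)
post = index_with_post_processor({"id": "doc1"}, 3)

print(len({d["uuid"] for d in pre}))   # 1 distinct UUID across replicas
print(len({d["uuid"] for d in post}))  # 3 distinct UUIDs across replicas
```

The same reasoning applies to the timestamp case: anything computed per replica can differ per replica, and the per-replica work is repeated once for each copy of the document.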