Hi Group, Hi Chris,

--- We found the Issue and a Workaround ---

We traced the problem down to DistributedUpdateProcessor.doLocalCommit()
<https://github.com/apache/lucene-solr/blob/3ae820424809baa4e5a85d8bbdb0294cf6fe5b9b/solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java#L1081>
which *silently* drops all commits while the replica is inactive and replaying: it returns
immediately and still reports status=0. This behaviour is inconsistent with the documented
behaviour of a commit's "waitSearcher" parameter, which "[..] Blocks until a new searcher is
opened and registered as the main query searcher, making the changes visible."
https://lucene.apache.org/solr/guide/7_7/uploading-data-with-index-handlers.html#xml-update-commands
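For illustration, here is a minimal SolrJ sketch of what our indexing client effectively does
(the collection name and the CloudSolrClient setup are placeholders, not our actual code):

UpdateRequest commit = new UpdateRequest();
commit.setAction(ACTION.COMMIT, true /*waitFlush*/, true /*waitSearcher*/);
// On a replica that is inactive and replaying, doLocalCommit() returns without committing,
// yet the overall response still reports status=0, so the client cannot tell that the
// commit was dropped on that replica.
UpdateResponse rsp = commit.process(cloudSolrClient, "test-collection");
assert rsp.getStatus() == 0; // looks successful even in the degraded case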
The issue we have is the "silent" part. If, upon receiving a commit request, the replica
- would either wait until it becomes healthy and then commit and return, honoring waitSearcher=true (which is what we expected from reading the documentation)
- or at least behave consistently with all other UpdateRequests and report back the achieved replication factor in the "rf" response parameter,
we could easily detect the degraded cluster state in the client and keep retrying the commit until "rf" matches the number of replicas.

We think this is a bug (silently dropping commits even if the client requested waitSearcher), or at least a missing feature (commits being the only UpdateRequests not reporting the achieved RF), which should be worth a JIRA ticket. While I personally would prefer not dropping commits, we are fine as long as the client can detect this behaviour, e.g. with an "rf" response parameter in the CommitResponse.

Client Side Workaround
==================
Define a custom updateRequestProcessorChain only for commits, bypassing the DistributingUpdateProcessor:

<updateRequestProcessorChain name="custom-force-commit-chain">
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.NoOpDistributingUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

..and manually send the following commit to each replica of the target collection using the custom chain:

UpdateRequest commit = new UpdateRequest();
commit.setAction(ACTION.COMMIT, true, true, softCommit);
commit.getParams().set(CommonParams.DISTRIB, false); // we are the aggregator and take care of distribution
commit.getParams().set(UpdateParams.UPDATE_CHAIN, "custom-force-commit-chain");

By bypassing the DistributingUpdateProcessor we achieve the desired behaviour: either a successful commit OR a detectable error in the client, on which we can retry. Nice thing: most errors are already detected by HttpSolrClient, which automatically retries on stale state.

All thrown together (not including the retryOnException(()->forceCommitAllReplicas(col,false)) part, which is sketched further below):

private void forceCommitAllReplicas(String collection, boolean softCommit) throws IOException {
  ZkStateReader zkStateReader = solrClientCache.getCloudSolrClient(zkHost).getZkStateReader();
  Set<String> replicaRoutes = Arrays.stream(CloudSolrStream.getSlices(collection, zkStateReader, true))
      .flatMap(s -> s.getReplicas().stream())
      .map(ZkCoreNodeProps::getCoreUrl)
      .collect(Collectors.toSet());

  UpdateRequest commit = new UpdateRequest();
  commit.setAction(ACTION.COMMIT, true /*waitFlush*/, true /*waitSearcher*/, softCommit);
  commit.getParams().set(CommonParams.DISTRIB, false); // we are the aggregator and take care of distribution
  commit.getParams().set(UpdateParams.UPDATE_CHAIN, "custom-force-commit-chain");

  int achievedRF = replicaRoutes.stream()
      .map(solrClientCache::getHttpSolrClient)
      .mapToInt(client -> {
        try {
          UpdateResponse resp = commit.process(client);
          return resp.getStatus() == 0 ? 1 : 0; // count towards the achieved replication factor
        } catch (Exception e) {
          return 0; // commit did not succeed on this replica
        }
      })
      .sum();

  if (achievedRF < replicaRoutes.size()) {
    throw new SolrException(ErrorCode.INVALID_STATE,
        "Cluster is in degraded state - not all replicas acknowledged the commit: "
            + achievedRF + "/" + replicaRoutes.size());
  }
}
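For reference, a rough sketch of the retryOnException part left out above. The helper itself,
the attempt count and the backoff are placeholders of our own, not Solr API:

@FunctionalInterface
interface CommitAction {
  void run() throws IOException;
}

// Hypothetical retry helper: re-run the commit until every replica acknowledged it or we
// give up, sleeping between attempts so a recovering replica gets a chance to become active.
private void retryOnException(CommitAction action) throws IOException, InterruptedException {
  final int maxAttempts = 5;      // arbitrary
  final long backoffMs = 2000;    // arbitrary
  for (int attempt = 1; ; attempt++) {
    try {
      action.run();
      return; // all replicas acknowledged the commit
    } catch (IOException | SolrException e) {
      if (attempt >= maxAttempts) {
        throw e; // give up and surface the degraded state to the caller
      }
      Thread.sleep(backoffMs); // give the cluster time to recover before the next try
    }
  }
}

// Usage:
retryOnException(() -> forceCommitAllReplicas(collection, false /*softCommit*/));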
Cheers
Michael

Am Do., 6. Feb. 2020 um 12:18 Uhr schrieb Michael Frank <
frank.michael.busin...@gmail.com>:

> Hi Chris,
> thank you for your detailed answer!
>
> We are aware that Solr Cloud is eventually consistent and in our
> application that's fine in most cases.
> However, what is really important for us is that we get "Read Your
> Writes" for a clear point in time - which in our understanding should be
> after hard commits with waitSearcher=true return successfully from all
> replicas. Is that correct?
> The client that indexes new documents performs a hard commit with
> waitSearcher=true and after that was successful, we expect the documents
> to be visible on all replicas.
> This seems to work as expected if the cluster is in a healthy state.
> If we shut down nodes while updating documents and committing, we observe
> that commits somehow get lost.
> The documents are neither visible on the leader nor on any replica! Even
> after all nodes and replicas are up again.
> And we don't get any error or exception from the SolrJ client.
> Is there any way to make sure that a commit is executed successfully on
> _every_ replica (and fail if a replica is currently down or recovering)?
> Or to get notified that the commit could not be executed because the
> cluster is in an unhealthy state?
> If we can confirm and verify this in our indexing client, we could detect
> failures and recover.
>
> I don't think the /get request handler is an option for us because it
> only accepts document IDs and no search queries, which we rely on
> heavily. Is that correct?
>
> : FYI: there is no need to send a softCommit after a hardCommit
> Agreed, that was just us experimenting and trying stuff.
>
> : So to be clear: 'rf=2' means a total of 2 replicas confirmed the update
> : -- that includes the leader replica. 'rf=1' means the leader accepted
> : the doc, but all other replicas are down.
> : if you want to be 100% certain that every replica received the update,
> : then you should be confirming rf=3
> Agreed, should have been more clear. We have multiple test scenarios.
> Some with 2 replicas (1 leader, 1 replica) and some with 3 (1 leader,
> 2 replicas). In the first mail I just picked the simplest test setup that
> failed, consisting of one leader and one replica - so technically we
> could reproduce the error in a two node cluster.
>
> Cheers,
> Michael
>
> Am Do., 6. Feb. 2020 um 01:42 Uhr schrieb Chris Hostetter <
> hossman_luc...@fucit.org>:
>
>> I may be misunderstanding something in your setup, and/or I may be
>> misremembering things about Solr, but I think the behavior you are
>> seeing is because *search* in solr is "eventually consistent" -- while
>> "RTG" (ie: using the "/get" handler) is (IIRC) "strongly consistent"
>>
>> ie: there's a reason it's called "Near Real Time Searching" and "NRT
>> Replica" ... not "RT Replica"
>>
>> When you kill a node hosting a replica, then send an update which a
>> leader accepts but can't send to that replica, that replica is now
>> "out of sync" and will continue to be out of sync when it comes back
>> online and starts responding to search requests as it recovers from the
>> leader/tlog -- eventually the search will have consistent results across
>> all replicas, but during the recovery period this isn't guaranteed.
>>
>> If however you use the /get request handler, then it (again, IIRC)
>> consults the tlog for the latest version of the doc even if it's
>> mid-recovery and the index itself isn't yet up to date.
>> So for the purposes of testing solr as a "strongly consistent" document
>> store, using /get?id=foo to check the "current" data in the document is
>> more appropriate than /select?q=id:foo
>>
>> Some more info here...
>>
>> https://lucene.apache.org/solr/guide/8_4/solrcloud-resilience.html
>> https://lucene.apache.org/solr/guide/8_4/realtime-get.html
>>
>> A few other things that jumped out at me in your email that seemed weird
>> or worthy of comment...
>>
>> : According to Solr's documentation, a commit with openSearcher=true and
>> : waitSearcher=true and waitFlush=true only returns once everything is
>> : persisted AND the new searcher is visible.
>> :
>> : To me this sounds like any subsequent request after a successful
>> : commit MUST hit the new searcher and is guaranteed to see the commit
>> : changes, regardless of node failures or restarts.
>>
>> that is true for *single* node solr, or a "healthy" cluster, but as i
>> mentioned, if a node is down when the "commit" happens it won't have the
>> document yet -- nor is it alive to process the commit. the document
>> update -- and the commit -- are in the tlog that still needs to replay
>> when the replica comes back online
>>
>> : - A test-collection with 1 Shard and 2 NRT Replicas.
>>
>> I'm guessing since you said you were using 3 nodes, that what you
>> mean here is a single shard with a total of 3 replicas which are all NRT
>> -- remember the "leader" is still itself an NRT replica.
>>
>> (i know, i know ... i hate the terminology)
>>
>> This is a really important point to clarify in your testing because of
>> how you are using 'rf' ... seeing exactly how you create your collection
>> is important to make sure we're talking about the same thing.
>>
>> : Each "transaction" adds, modifies and deletes documents and we ensure
>> : that each response has a "rf=2" (achieved replication factor=2)
>> : attribute.
>>
>> So to be clear: 'rf=2' means a total of 2 replicas confirmed the update
>> -- that includes the leader replica. 'rf=1' means the leader accepted
>> the doc, but all other replicas are down.
>>
>> if you want to be 100% certain that every replica received the update,
>> then you should be confirming rf=3
>>
>> : After a "transaction" was performed without errors we send first a
>> : hardCommit and then a softCommit, both with waitFlush=true,
>> : waitSearcher=true and ensure they both return without errors.
>>
>> FYI: there is no need to send a softCommit after a hardCommit -- a hard
>> commit with openSearcher=true (the default) is a super-set of a soft
>> commit.
>>
>>
>> -Hoss
>> http://www.lucidworks.com/
>>
>