Hello,

the background for my question is that one of the requirements for our injection tool is that it should report a new document as successfully enrolled in the cluster only if it is available on all replicas. The automated integration test for this feature submits a document to the cluster and afterwards checks whether it can be found with an appropriate query (that is why I have configured autoSoftCommit/maxDocs=1).
This raises the question: what happens if the update request returns rf=1 and I submit a query to a cluster with a replication factor of two directly after the update (the query may go to the replica due to load balancing)? Will the automated integration test fail intermittently? Will I have to wait artificially between the update and the query, and if so, for how long? And how can I implement the requirement that our injection tool should report success only if the document has been replicated to all replicas?

Best Regards,
Martin Mois

>bq: If a timeout between shard leader and replica can
>lead to a smaller rf value (because replication has
>timed out), is it possible to increase this timeout in the configuration?
>
>Why do you care? If it timed out, then the follower will
>no longer be active and will not serve queries. The Cloud view
>should show it in "down", "recovery" or the like. Before it
>goes back to the "active" state, it will synchronize from
>the leader automatically without you having to do anything and
>any docs that were indexed to the leader will be faithfully
>reflected on the follower _before_ the recovering
>follower serves any new queries. So practically it makes no
>difference whether there was an update timeout or not.
>
>This is feeling a lot like an "XY" problem. You're asking detailed
>questions about "X" (in this case timeouts, what rf means and the like)
>without telling us what the problem you're concerned about is ("Y").
>
>So please back up and tell us what your higher level concern is.
>Do you have any evidence of Bad Things Happening?
>
>And do, please, change your commit intervals to not happen after
>every doc. That's a Really Bad Practice in Solr.
>
>Best,
>Erick
>
>On Tue, Oct 13, 2015 at 11:58 PM, MOIS Martin (MORPHO)
><martin.m...@morpho.com> wrote:
>> Hello,
>>
>> thank you for the detailed answer.
>>
>> If a timeout between shard leader and replica can lead to a smaller rf value
>> (because replication has timed out), is it possible to increase this timeout
>> in the configuration?
>>
>> Best Regards,
>> Martin Mois
>>
>> Comments inline:
>>
>> On Mon, Oct 12, 2015 at 1:31 PM, MOIS Martin (MORPHO)
>> <martin.m...@morpho.com> wrote:
>>> Hello,
>>>
>>> I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have been
>>> created with replicationFactor=2, i.e. I have one replica for each shard.
>>> Beyond that I am using autoCommit/maxDocs=10000 and autoSoftCommit/maxDocs=1
>>> in order to achieve near realtime search behavior.
>>>
>>> As far as I understand from section "Write Side Fault Tolerance" in the
>>> documentation
>>> (https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance),
>>> I cannot enforce that an update gets replicated to all replicas, but I can
>>> only get the achieved replication factor by requesting the return value rf.
>>>
>>> My question is now, what exactly does rf=2 mean? Does it only mean that the
>>> replica has written the update to its transaction log? Or has the replica
>>> also performed the soft commit as configured with autoSoftCommit/maxDocs=1?
>>> The answer is important for me, because if the update only gets written to
>>> the transaction log, I could not search for it reliably, as the replica may
>>> not have added it to the searchable index.
>>
>> rf=2 means that the update was successfully replicated to and
>> acknowledged by two replicas (including the leader). The rf only deals
>> with the durability of the update and has no relation to visibility of
>> the update to searchers. The auto(soft)commit settings are applied
>> asynchronously and do not block an update request.
>>
>>>
>>> My second question is, does rf=1 mean that the update was definitely not
>>> successful on the replica or could it also represent a timeout of the
>>> replication request from the shard leader? If it could also represent a
>>> timeout, then there would be a small chance that the replication was
>>> successful despite the timeout.
>>
>> Well, rf=1 implies that the update was only applied on the leader's
>> index + tlog and either replicas weren't available or returned an
>> error or the request timed out. So yes, you are right that it can
>> represent a timeout and as such there is a chance that the replication
>> was indeed successful despite the timeout.
>>
>>>
>>> Is there a way to retrieve the replication factor for a specific document
>>> after the update in order to check if replication was successful in the
>>> meantime?
>>>
>>
>> No, there is no way to do that.
>>
>>> Thanks in advance.
>>>
>>> Best Regards,
>>> Martin Mois
>>> #
>>> " This e-mail and any attached documents may contain confidential or
>>> proprietary information. If you are not the intended recipient, you are
>>> notified that any dissemination, copying of this e-mail and any attachments
>>> thereto or use of their contents by any means whatsoever is strictly
>>> prohibited. If you have received this e-mail in error, please advise the
>>> sender immediately and delete this e-mail and all attached documents from
>>> your computer system."
>>> #
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
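
For illustration only, here is a rough Python sketch of how an injection tool might combine the two separate checks discussed in this thread: the achieved replication factor (durability) and a bounded wait until the document becomes searchable (visibility). It is not taken from Solr or SolrJ. It assumes the update was sent with the min_rf parameter and that the JSON response has already been parsed into a dict carrying the achieved value under responseHeader/rf. The names achieved_rf, replicated_to_all, wait_until and enrolment_succeeded are hypothetical, and the is_searchable callback (for example, a query against each replica) has to be supplied by the caller.

```python
import time
from typing import Any, Callable, Mapping


def achieved_rf(update_response: Mapping[str, Any]) -> int:
    """Read the achieved replication factor from a parsed update response.

    Assumption: the response looks like {"responseHeader": {"rf": 2, ...}}.
    Returns 0 when no rf value is present.
    """
    header = update_response.get("responseHeader", {})
    return int(header.get("rf", 0))


def replicated_to_all(update_response: Mapping[str, Any], expected_rf: int) -> bool:
    """True if the update was acknowledged by at least expected_rf replicas."""
    return achieved_rf(update_response) >= expected_rf


def wait_until(predicate: Callable[[], bool],
               timeout_s: float = 5.0,
               interval_s: float = 0.1) -> bool:
    """Poll predicate until it returns True or timeout_s has elapsed.

    The predicate is always evaluated at least once.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def enrolment_succeeded(update_response: Mapping[str, Any],
                        expected_rf: int,
                        is_searchable: Callable[[], bool],
                        timeout_s: float = 5.0) -> bool:
    """Report success only if rf is sufficient AND the document becomes searchable.

    rf covers durability only; visibility depends on the (soft) commit,
    so it is checked separately by polling instead of a fixed sleep.
    """
    if not replicated_to_all(update_response, expected_rf):
        return False
    return wait_until(is_searchable, timeout_s=timeout_s)
```

Polling with a deadline replaces an artificial fixed wait between update and query, so the integration test would fail only if the document stays invisible for the whole timeout. Note that an rf lower than expected may be a false negative, since, as explained above, a timed-out replication request can still have succeeded.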